Coarse-Guided Visual Generation via Weighted h-Transform Sampling
Abstract
A novel training-free visual generation method uses the h-transform to guide diffusion models, achieving improved quality and generalization compared to existing approaches.
Coarse-guided visual generation, which synthesizes fine visual samples from degraded or low-fidelity coarse references, is essential for various real-world applications. While training-based approaches are effective, they are inherently limited by high training costs and restricted generalization due to paired data collection. Accordingly, recent training-free works leverage pretrained diffusion models and incorporate guidance during the sampling process. However, these training-free methods either require knowledge of the forward (fine-to-coarse) transformation operator, e.g., bicubic downsampling, or struggle to balance guidance strength against synthesis quality. To address these challenges, we propose a novel guided method based on the h-transform, a tool that constrains stochastic processes (e.g., the sampling process) to satisfy desired conditions. Specifically, we modify the transition probability at each sampling timestep by adding a drift term to the original differential equation, which approximately steers the generation toward the ideal fine sample. To address unavoidable approximation errors, we introduce a noise-level-aware schedule that gradually down-weights the drift term as the error increases, ensuring both guidance adherence and high-quality synthesis. Extensive experiments across diverse image and video generation tasks demonstrate the effectiveness and generalization of our method.
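To make the mechanism concrete, below is a minimal sketch of weighted h-transform guidance inside a DDIM-style sampling loop. Everything here is an illustrative assumption rather than the paper's released implementation: the model API (`model(x, t)` predicting noise), the residual drift `coarse_ref - x0`, and the weight schedule `lam * alpha_bar_t` (which down-weights the drift at high noise levels, where the clean-sample estimate is least accurate) are all hypothetical stand-ins for the paper's exact forms.

```python
import torch

@torch.no_grad()
def guided_sampling(model, coarse_ref, alphas_cumprod, lam=1.0):
    """Sketch of coarse-guided sampling with a weighted h-transform drift.

    Assumptions (not from the paper's code): `model(x, t)` predicts the
    noise eps, `coarse_ref` lives in the same space as the samples, and
    the noise-level-aware weight is `lam * alpha_bar_t`.
    """
    T = len(alphas_cumprod)
    x = torch.randn_like(coarse_ref)  # start from pure noise
    for t in reversed(range(T)):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = model(x, t)
        # Tweedie estimate of the clean sample from the current noisy state.
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        # h-transform-style drift: steer the predicted clean sample toward
        # the coarse reference (a simple residual is assumed here).
        drift = coarse_ref - x0
        # Noise-level-aware weight: trust the drift less at high noise,
        # where the x0 estimate (and hence the drift) is less accurate.
        w = lam * a_t  # illustrative schedule; the paper's form may differ
        x0 = x0 + w * drift
        # Standard DDIM-style deterministic update to the next noise level.
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
    return x
```

In this sketch the guidance enters only through the weighted drift added to the predicted clean sample, so the pretrained model itself is untouched, which matches the training-free framing of the abstract.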
Community
Achieve various conditional visual generation tasks guided by a coarse sample with one line of code.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PropFly: Learning to Propagate via On-the-Fly Supervision from Pre-trained Video Diffusion Models (2026)
- UCM: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models (2026)
- SemanticNVS: Improving Semantic Scene Understanding in Generative Novel View Synthesis (2026)
- Joint Geometric and Trajectory Consistency Learning for One-Step Real-World Super-Resolution (2026)
- FlowFixer: Towards Detail-Preserving Subject-Driven Generation (2026)
- One step further with Monte-Carlo sampler to guide diffusion better (2026)
- When Test-Time Guidance Is Enough: Fast Image and Video Editing with Diffusion Guidance (2026)