arxiv:2605.26032

Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution

Published on May 25

Submitted by Zhuo Chen on May 28

Massachusetts Institute of Technology


Abstract

SKILD is a scale-invariant k-space image learning diffusion model that unifies image generation and continuous super-resolution in a single unconditional framework by exploiting the scale invariance of image content and physical systems.

AI-generated summary

Creating images from noise is image generation; reconstructing fine details from coarse inputs is super-resolution. Despite their practical differences, both can be understood as reversing information loss across scales. We introduce SKILD, a Scale-invariant K-Space Image Learning Diffusion model that unifies generation and continuous super-resolution within a single unconditional framework. Both natural images and critical physical systems exhibit scale invariance, and we leverage it to design a forward process that attenuates image content from fine to coarse scales while injecting spectrum-matched Gaussian noise, making scale an explicit coordinate of the diffusion dynamics. The same trained reverse process performs generation and continuous super-resolution by varying only the starting timestep: no task-specific architecture, no conditioning branch, no classifier-free guidance, no retraining per scale factor. Empirically, SKILD reaches FID 2.65 and Inception Score 9.63 on unconditional CIFAR-10, performs 2×–8× super-resolution on ImageNet from a single unconditional checkpoint while outperforming conditional models across perceptual metrics, and reconstructs critical Ising models whose connected four-point correlations closely track the ground truth.
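To make the forward process described in the abstract more concrete, here is a minimal NumPy sketch of one way such a process could look. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian low-pass gain, the log-uniform cutoff schedule, the power-law noise spectrum with exponent `alpha`, and every function and parameter name (`signal_gain`, `forward_sample`, `start_time_for_scale`, `noise_scale`) are hypothetical choices made for this example. The abstract only states that fine scales are attenuated first, that the injected Gaussian noise is spectrum-matched, and that the super-resolution factor is selected through the starting timestep.

```python
import numpy as np

def radial_frequency(h, w):
    """Return |k| on the 2D FFT grid, in cycles per pixel."""
    ky = np.fft.fftfreq(h)[:, None]
    kx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(kx**2 + ky**2)

def signal_gain(k, t):
    """Assumed Gaussian low-pass gain; the cutoff shrinks log-uniformly with t."""
    k_max = k.max()
    k_min = k[k > 0].min()
    k_cut = k_max * (k_min / k_max) ** t  # t=0: finest scale, t=1: coarsest
    return np.exp(-0.5 * (k / k_cut) ** 2)

def forward_sample(x0, t, alpha=2.0, noise_scale=1.0, rng=None):
    """Attenuate fine scales of x0 and add power-law ("spectrum-matched") noise.

    Assumption: the target power spectrum is |k|^(-alpha), so the noise
    amplitude per mode is |k|^(-alpha/2), weighted by sqrt(1 - gain^2).
    """
    rng = np.random.default_rng() if rng is None else rng
    k = radial_frequency(*x0.shape)
    gain = signal_gain(k, t)
    k_safe = np.where(k > 0, k, 1.0)  # avoid dividing by zero at the DC mode
    spectrum_amp = np.where(k > 0, k_safe ** (-alpha / 2.0), 0.0)
    noise_amp = noise_scale * np.sqrt(1.0 - gain**2) * spectrum_amp
    # Filtering real white noise with a real, radially symmetric amplitude
    # keeps the result real up to floating-point error.
    white_hat = np.fft.fft2(rng.standard_normal(x0.shape))
    x_t_hat = gain * np.fft.fft2(x0) + noise_amp * white_hat
    return np.fft.ifft2(x_t_hat).real

def start_time_for_scale(scale, k):
    """Under the assumed schedule, the timestep whose cutoff is k_max / scale."""
    k_max = k.max()
    k_min = k[k > 0].min()
    return np.log(scale) / np.log(k_max / k_min)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 16))  # stand-in for a grayscale image
k = radial_frequency(16, 16)
t2, t4, t8 = (start_time_for_scale(s, k) for s in (2, 4, 8))
x_t = forward_sample(x0, t4, rng=rng)
```

In this sketch, a larger super-resolution factor maps to a later starting timestep, because more fine-scale content has to be regenerated by the reverse process. Starting the reverse process at `t = 1` would correspond to unconditional generation.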



