# Learning to Refocus with Video Diffusion Models

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Use device_map="mps" instead of "cuda" on Apple devices.
# device_map already places the pipeline on the device, so no separate .to() call is needed.
pipe = DiffusionPipeline.from_pretrained(
    "tedlasai/learn2refocus", dtype=torch.bfloat16, device_map="cuda"
)

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

# The pipeline returns the generated focal stack as a list of video frames.
output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
```
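Besides exporting the whole focal stack as a video, individual frames can be saved as refocused stills, since `output` above is a list of PIL images. A minimal sketch, using synthetic PIL frames in place of real pipeline output (the frame size and count here are fabricated for illustration):

```python
from PIL import Image

# Stand-in for `output` from the pipeline: a focal stack as a list of PIL frames.
# Real frames would come from pipe(...).frames[0]; here we fabricate 5 gray frames.
frames = [Image.new("RGB", (64, 64), color=(i * 40, i * 40, i * 40)) for i in range(5)]

# Pick a focal plane by index and save that frame as a refocused still.
focus_index = 2
frames[focus_index].save("refocused_frame.png")
print(len(frames), frames[focus_index].size)
```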
This repository contains the model weights for the paper Learning to Refocus with Video Diffusion Models.
Project Page | GitHub Repository
## Summary
Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. This work introduces a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, the approach generates a perceptually accurate focal stack, represented as a video sequence, enabling interactive refocusing and unlocking a range of downstream applications.
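One downstream application a generated focal stack enables is software autofocus: scan the stack and pick the frame where the scene (or a region of interest) is sharpest. A minimal sketch of such frame selection using variance-of-Laplacian sharpness on synthetic NumPy frames (illustrative only; the blur simulation and frame shapes are assumptions, not the paper's method — real frames would come from the model):

```python
import numpy as np

def sharpness(img: np.ndarray) -> float:
    """Variance of a 4-neighbor Laplacian; higher means more in-focus detail."""
    lap = (-4 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

def blur(img: np.ndarray, times: int) -> np.ndarray:
    """Repeated 3x3 box blur to simulate increasing defocus."""
    n = img.shape[0]
    for _ in range(times):
        padded = np.pad(img, 1, mode="edge")
        img = sum(padded[dy:dy + n, dx:dx + n]
                  for dy in range(3) for dx in range(3)) / 9.0
    return img

rng = np.random.default_rng(0)
scene = rng.standard_normal((32, 32))  # high-frequency "scene" detail

# Synthetic focal stack: the middle frame (index 2) is least blurred.
stack = [blur(scene, t) for t in (4, 2, 0, 2, 4)]
best = max(range(len(stack)), key=lambda i: sharpness(stack[i]))
print(best)  # → 2, the sharpest focal plane
```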
## Usage

For detailed environment setup, training, and testing instructions, please refer to the official GitHub repository. The model uses fine-tuned Stable Video Diffusion (SVD) weights.
## Citation

If you use our dataset, code, or model in your research, please cite the following paper:

```bibtex
@inproceedings{Tedla2025Refocus,
  title={{Learning to Refocus with Video Diffusion Models}},
  author={Tedla, SaiKiran and Zhang, Zhoutong and Zhang, Xuaner and Xin, Shumian},
  booktitle={Proceedings of the ACM SIGGRAPH Asia Conference},
  year={2025}
}
```