AudioSR: Versatile Audio Super-resolution at Scale
Paper: arXiv 2309.07314
Pre-trained AudioSR (Versatile Audio Super Resolution) models for use with the ComfyUI-AudioSR custom node.
Model location: ComfyUI/models/AudioSR/
ComfyUI Workflow:
Load Audio → AudioSR → Preview/Save Audio
Recommended Settings:
Use audiosr_speech_fp32.safetensors for voice and audiosr_basic_fp32.safetensors for everything else.
AudioSR upscales low-quality audio to high-quality 48 kHz output using latent diffusion.
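As a minimal sketch of the file recommendation above, the helper below maps the kind of audio to the recommended model file and builds the path where the custom node would be expected to find it. The function name `pick_model_path` and the `comfy_root` argument are hypothetical and not part of ComfyUI or ComfyUI-AudioSR. The directory layout is assumed from the ComfyUI/models/AudioSR/ path given in this README.

```python
from pathlib import Path

# File names as listed in the recommended settings.
MODEL_FILES = {
    "speech": "audiosr_speech_fp32.safetensors",
    "basic": "audiosr_basic_fp32.safetensors",
}


def pick_model_path(comfy_root, is_voice):
    """Return the expected path of the recommended model file.

    comfy_root is assumed to be the directory that contains the
    ComfyUI folder; adjust it to match your own install.
    """
    # Speech model for voice recordings, basic model for everything else.
    name = MODEL_FILES["speech"] if is_voice else MODEL_FILES["basic"]
    return Path(comfy_root) / "ComfyUI" / "models" / "AudioSR" / name
```

Calling `pick_model_path(".", is_voice=True).is_file()` from the directory that holds ComfyUI is one quick way to confirm the speech model is in place before loading the workflow.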
Based on AudioSR: Versatile Audio Super-Resolution by Haohe Liu et al.
Original repository: https://github.com/haoheliu/versatile_audio_super_resolution
License: MIT