How to use from the Diffusers library
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline

# Switch device_map to "mps" on Apple devices
pipe = DiffusionPipeline.from_pretrained("saeed-5959/high_sync", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]


HighSync: High-Quality Lip Synchronization via Latent Diffusion Models

Saeed Firouzi

Abstract

We present HighSync, an end-to-end diffusion-based framework for high-fidelity lip synchronization that generates photorealistic talking-face videos aligned with arbitrary input audio. Existing approaches consistently struggle to reconcile image quality with synchronization accuracy, producing either visually degraded outputs or temporally inconsistent lip movements. HighSync addresses both challenges simultaneously and, to our knowledge, is the first lip-sync model to operate natively at 512×512 resolution, positioning it as a viable solution for professional production environments such as the film and broadcast industries. Central to our approach is the identification and systematic elimination of a data leakage phenomenon that has silently undermined temporal modeling in prior work, preventing models from developing a genuine dependence on the audio signal. Comprehensive evaluations across both perceptual quality and synchronization accuracy metrics confirm that HighSync achieves state-of-the-art performance on both fronts.
