LongCat-AudioDiT-1B-Diffusers
Diffusers format for Meituan's LongCat-AudioDiT-1B.
Model Description
A DiT-based (Diffusion Transformer) model for text-to-audio generation.
Usage
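```python
import soundfile as sf
import torch
from diffusers import LongCatAudioDiTPipeline

# Load the Diffusers-format weights in bfloat16 and move the pipeline to the GPU.
pipeline = LongCatAudioDiTPipeline.from_pretrained(
    "ruixiangma/LongCat-AudioDiT-1B-Diffusers", torch_dtype=torch.bfloat16
)
pipeline = pipeline.to("cuda")

prompt = "A calm ocean wave ambience with soft wind in the background."
# Generate 5 s of audio; [0, 0] selects a single waveform from the batched output.
audio = pipeline(
    prompt, audio_duration_s=5.0, num_inference_steps=20, guidance_scale=4.0, seed=42
).audios[0, 0]

# soundfile needs a float NumPy array; convert if a (bfloat16) tensor was returned.
if isinstance(audio, torch.Tensor):
    audio = audio.float().cpu().numpy()
sf.write("output.wav", audio, pipeline.sample_rate)
```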
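The `guidance_scale` argument in the usage example presumably controls classifier-free guidance, as it does in most diffusers pipelines. This card does not say so explicitly, so treat the following as an assumption. Under that reading, each denoising step blends a prompt-conditioned prediction with an unconditional one as `uncond + guidance_scale * (cond - uncond)`. The sketch below shows only that arithmetic on flat lists. `apply_cfg` is a hypothetical helper, not part of diffusers or this pipeline.

```python
from collections.abc import Sequence


def apply_cfg(cond: Sequence[float], uncond: Sequence[float], guidance_scale: float) -> list[float]:
    """Classifier-free guidance blend (assumed semantics of `guidance_scale`).

    Works element-wise on flat sequences; a real pipeline would apply the same
    formula to whole prediction tensors at every denoising step.
    """
    if len(cond) != len(uncond):
        raise ValueError("conditional and unconditional predictions must match in length")
    # Move from the unconditional prediction toward (and past) the conditional one.
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# guidance_scale=1.0 reproduces the conditional prediction; larger values
# push the result further away from the unconditional prediction.
guided = apply_cfg([0.5, -0.2], [0.1, 0.0], guidance_scale=4.0)
print(guided)
```

Higher values usually make the output follow the prompt more closely at some cost in naturalness. The value of 4.0 above is simply the one used in this card's example.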
License
MIT License, following the upstream license published with meituan-longcat/LongCat-AudioDiT-1B.