I improved the public demo for TADA — a generative framework for speech modeling via text–acoustic dual alignment.
TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.
The original demo already exposed these mechanisms, but its workflow made the pipeline hard to follow.
This updated demo makes the process clearer:
• load the model
• prepare a reference voice (optionally with transcript or Whisper auto-transcription)
• generate speech conditioned on that reference
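The three steps above can be sketched as follows. Note that every function name here is a hypothetical stub written for illustration only; it is not TADA's actual API, and the real demo wires these steps to the TADA checkpoint and (optionally) Whisper:

```python
# Sketch of the demo's three-step workflow. All functions below are
# hypothetical placeholders standing in for the real TADA / Whisper calls.

def load_model():
    """Step 1: load the TADA model (stubbed)."""
    return {"name": "tada"}

def auto_transcribe(audio_path):
    """Placeholder for Whisper auto-transcription (stubbed)."""
    return f"<transcript of {audio_path}>"

def prepare_reference(audio_path, transcript=None):
    """Step 2: prepare a reference voice.

    If no transcript is supplied, fall back to automatic
    transcription (Whisper in the real demo)."""
    if transcript is None:
        transcript = auto_transcribe(audio_path)
    return {"audio": audio_path, "text": transcript}

def generate(model, reference, text, language="en"):
    """Step 3: generate speech conditioned on the reference voice."""
    return {
        "model": model["name"],
        "voice": reference["audio"],
        "text": text,
        "language": language,
    }

model = load_model()
ref = prepare_reference("reference.wav")  # no transcript -> auto-transcribe
out = generate(model, ref, "Hello from the updated demo!", language="en")
```

The point of the stubs is the control flow: the reference transcript is optional, and the generation step is always conditioned on both the reference audio and a target language.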
It also adds multilingual support.
Presets are included for a few languages, but the model supports more:
English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese
Feel free to try different voices, accents, or languages and see how the alignment behaves.
👉 fffiloni/tada-dual-alignment-tts-demo
Paper
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)