Workflow - Talking Avatar (voice cloning with Qwen-TTS)

#44
by RuneXX - opened

LTX - Talking Avatar with Qwen TTS (optionally with a steady-camera LoRA if you want a "true" talking avatar)

A workflow with Qwen TTS connected directly to LTX-2, enabling voice cloning for a consistent voice across video generations.
Prompt the dialogue directly into the Qwen TTS node and use a reference audio clip for voice cloning.

Image to video (I2V) : https://huggingface.co/RuneXX/LTX-2-Workflows/blob/main/LTX-2%20-%20I2V%20Talking%20Avatar%20(voice%20clone%20Qwen-TTS).json
Text to video (T2V) : https://huggingface.co/RuneXX/LTX-2-Workflows/blob/main/LTX-2%20-%20T2V%20Talking%20Avatar%20(voice%20clone%20Qwen-TTS).json
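If you want to queue one of these workflows programmatically instead of through the ComfyUI web UI, you can POST it to ComfyUI's `/prompt` HTTP endpoint. This is a minimal sketch, assuming a local ComfyUI server on the default port; note that the endpoint expects the *API-format* export of a workflow (Workflow menu → "Export (API)"), not the regular UI-format JSON. `queue_workflow` is a hypothetical helper name, not part of either the workflow or ComfyUI itself.

```python
import json
import urllib.request

def queue_workflow(workflow: dict, server: str = "127.0.0.1:8188") -> urllib.request.Request:
    """Build a request that queues an API-format workflow on a ComfyUI server.

    The caller sends it with urllib.request.urlopen(); kept separate here so
    the request can be inspected before anything hits the network.
    """
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # Example usage with a downloaded workflow file (uncomment to run):
    # with open("LTX-2 - I2V Talking Avatar (voice clone Qwen-TTS).json") as f:
    #     wf = json.load(f)
    # urllib.request.urlopen(queue_workflow(wf))
    pass
```

Edit the dialogue text in the Qwen TTS node (and the reference-audio path) in the JSON before queuing if you script batches this way.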

Required nodes

Credit to @f0rkineye for the idea.

By the way, there are many Qwen TTS repos out now; they all work and are all similar or near-identical. The one I picked was somewhat arbitrary, chosen because it has a simple node view (as well as a complex one).
You can easily swap the Qwen TTS node I used for another one if you already have Qwen TTS installed. Some alternative Qwen TTS nodes:
https://github.com/HAIGC/Comfyui-HAIGC-QwenTTS
https://github.com/DarioFT/ComfyUI-Qwen3-TTS
https://github.com/flybirdxx/ComfyUI-Qwen-TTS

Qwen-TTS currently offers a small, fast 0.6B model that is light on resources, and a more accurate 1.7B model: https://qwen.ai/blog?id=qwen3tts-0115
(You can of course use other TTS models instead, such as Microsoft's VibeVoice, which is very good, IndexTTS, etc.)

DarioFT's node lets you set a local model path and includes nodes for training (fine-tuning), but has no seed option.

Your Image to video (I2V) workflow link seems to be broken πŸ€”

thanks for the notice ;-) fixed the link