Model reading the prompt out loud.
This is an excellent model, to put it mildly.
However, it often reads the prompt out loud in scenes where dialogue or music isn't specified. Even adding a token to the negative prompt doesn't seem to stop it unless I switch schedulers and crank up the CFG (which degrades quality).
Just thought I'd mention it to see if anyone else had the same issue or if it's just my prompting style. The latter is definitely possible. I'm using a 30b Qwen model for refinement so things get a bit verbose, lol.
The stock text encoder is to blame, as it's not properly tuned to suppress random dialogue. Using the hereticx encoder with a prompt enhancer works wonders. Look into using the Rune XX workflow, as the stock 2.3 T2V workflow in ComfyUI is garbage. Switching over to Kijai's distilled transformers helps a lot as well. It still doesn't handle three-way conversations in sequence correctly, but the results are much better.
Thanks for the heads up. I'm only just now switching from WAN to LTX so the built-in audio stuff is very new to me, lol.
I was already using a heretic Gemma but, per your suggestion, I swapped out the paired default encoder in the dual CLIP. I ended up using the one from Kijai/LTX2.3 (plus the split-out transformer and VAEs from there as well). That has largely fixed the issue.