Can anyone give me a hint on how the Google T5 model is involved in the generation process?

by junyaoren - opened Jan 10, 2025

Jan 10, 2025

Can anyone give me a hint on how the Google T5 model is involved in the generation process? I noticed it gets downloaded during inference. Is it used for prompt upsampling?

aayushjansari

Jan 18, 2025

T5-XXL is used as the text encoder: it turns the text prompt into embeddings that provide the linguistic context for text conditioning.
In the architecture, each transformer block applies a self-attention layer (over the spatiotemporal tokens), followed by a cross-attention layer (where the semantic context from the T5-XXL embeddings is integrated), followed by an FFN.
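
To make the block order above concrete, here is a minimal NumPy sketch of one such block. It is only an illustration, not the actual Cosmos code: the dimensions, parameter names, and random weights are all made up, a random array stands in for the T5-XXL encoder output, and it uses a single attention head with no normalization layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q_in, kv_in, w_q, w_k, w_v):
    # Single-head scaled dot-product attention.
    # Queries come from q_in; keys and values come from kv_in.
    q = q_in @ w_q
    k = kv_in @ w_k
    v = kv_in @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def block(video_tokens, text_emb, p):
    x = video_tokens
    # 1) Self-attention: spatiotemporal tokens attend to each other.
    x = x + attention(x, x, p["sa_q"], p["sa_k"], p["sa_v"])
    # 2) Cross-attention: tokens (queries) attend to the text
    #    embeddings (keys/values). This is where the prompt enters.
    x = x + attention(x, text_emb, p["ca_q"], p["ca_k"], p["ca_v"])
    # 3) Feed-forward network with a ReLU (assumed activation).
    x = x + np.maximum(x @ p["ff_1"], 0.0) @ p["ff_2"]
    return x

def init_params(d_model, d_text, d_ff, rng):
    # Hypothetical random weights, scaled by 1/sqrt(fan_in).
    def w(m, n):
        return rng.standard_normal((m, n)) / np.sqrt(m)
    return {
        "sa_q": w(d_model, d_model), "sa_k": w(d_model, d_model),
        "sa_v": w(d_model, d_model),
        "ca_q": w(d_model, d_model), "ca_k": w(d_text, d_model),
        "ca_v": w(d_text, d_model),
        "ff_1": w(d_model, d_ff), "ff_2": w(d_ff, d_model),
    }

rng = np.random.default_rng(0)
d_model, d_text, d_ff = 16, 8, 32           # illustrative sizes only
params = init_params(d_model, d_text, d_ff, rng)
video_tokens = rng.standard_normal((12, d_model))  # 12 spatiotemporal tokens
text_emb = rng.standard_normal((5, d_text))        # stand-in for T5-XXL output
out = block(video_tokens, text_emb, params)
print(out.shape)  # (12, 16)
```

Note that the text embeddings only appear as keys and values in step 2, so the output keeps the shape of the video tokens while its content depends on the prompt.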

For more detail, you can refer to the "Cross-attention for text conditioning" part of the architecture section of the Cosmos paper:

https://research.nvidia.com/publication/2025-01_cosmos-world-foundation-model-platform-physical-ai

Hope this helps. 😃
