Can anyone give me a hint on how the Google T5 model is involved in the generation process?
Can anyone give me a hint on how the Google T5 model is involved in the generation process? I ask because this model was downloaded during inference. Is it used for prompt upsampling?
T5-XXL is used as the text encoder: it provides the linguistic context and the text conditioning for the text inputs.
In the architecture, each transformer block applies a self-attention layer (over the spatiotemporal tokens), followed by a cross-attention layer (where the semantic context from T5-XXL is integrated), followed by an FFN.
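To make the block order above concrete, here is a minimal NumPy sketch. It is not the Cosmos implementation: the toy dimensions, random weights, single attention head, plain layer norm, residual connections, and ReLU FFN are all assumptions for illustration, and `transformer_block` and `params` are hypothetical names. It only shows where the text embeddings enter: as keys and values of the cross-attention step.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, D_TEXT, D_FF = 16, 24, 32  # toy sizes (assumed), not the real model's


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def attention(q, k, v):
    # Single-head scaled dot-product attention.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def init(*shape):
    return rng.normal(scale=0.1, size=shape)


# Hypothetical random weights standing in for trained parameters.
params = {
    "self_q": init(D_MODEL, D_MODEL), "self_k": init(D_MODEL, D_MODEL),
    "self_v": init(D_MODEL, D_MODEL),
    "cross_q": init(D_MODEL, D_MODEL), "cross_k": init(D_TEXT, D_MODEL),
    "cross_v": init(D_TEXT, D_MODEL),
    "ffn_in": init(D_MODEL, D_FF), "ffn_out": init(D_FF, D_MODEL),
}


def transformer_block(video_tokens, text_embeddings, p):
    # 1) Self-attention: spatiotemporal tokens attend to each other.
    h = layer_norm(video_tokens)
    x = video_tokens + attention(h @ p["self_q"], h @ p["self_k"], h @ p["self_v"])
    # 2) Cross-attention: queries come from the video tokens,
    #    keys and values come from the text encoder output.
    h = layer_norm(x)
    x = x + attention(h @ p["cross_q"],
                      text_embeddings @ p["cross_k"],
                      text_embeddings @ p["cross_v"])
    # 3) Feed-forward network applied to each token independently.
    h = layer_norm(x)
    return x + np.maximum(h @ p["ffn_in"], 0.0) @ p["ffn_out"]


video_tokens = rng.normal(size=(10, D_MODEL))    # 10 toy spatiotemporal tokens
text_embeddings = rng.normal(size=(5, D_TEXT))   # 5 toy prompt-token embeddings
out = transformer_block(video_tokens, text_embeddings, params)
print(out.shape)  # (10, 16)
```

In this sketch the text encoder's output is the only path by which the prompt influences the video tokens, which matches the role described for T5-XXL above.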
If you like, you can refer to "Cross-attention for text conditioning" in the architecture section of the Cosmos paper:
https://research.nvidia.com/publication/2025-01_cosmos-world-foundation-model-platform-physical-ai
Hope this helped.