DreamDojo-Diffusers
DreamDojo-Diffusers is a collection of HuggingFace-compatible checkpoints converted from https://huggingface.co/nvidia/DreamDojo.
Base model trained on diverse video data. Part of the DreamDojo model family.
| | |
|---|---|
| Size | 2B |
| Stage | Pre-training |
| Architecture | DiT (Diffusion Transformer) with AdaLN-LoRA |
| Base | Cosmos Predict 2.5 |
DreamDojo-Pretrain-2B-Diffusers/
├── transformer/ # DiT backbone (sharded safetensors)
├── crossattn_adapter/ # Text-to-DiT projection (100352 → 1024)
├── vae/ # AutoencoderKLWan (standard diffusers)
├── lam/ # Latent Action Model (710M params)
├── text_encoder/ # Cosmos-Reason1-7B
├── scheduler/ # FlowMatchEulerDiscreteScheduler
├── action_processor/ # DreamDojo-specific config
└── config.json
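Because the checkpoint is split across several component folders, it can help to confirm that a local copy is complete before trying to load it. The following is a minimal sketch using only the Python standard library; the component names are copied from the listing above, while the function name `missing_components` and the example path are illustrative assumptions, not part of any DreamDojo or diffusers API.

```python
from pathlib import Path

# Component names copied from the directory listing above.
EXPECTED_DIRS = (
    "transformer",
    "crossattn_adapter",
    "vae",
    "lam",
    "text_encoder",
    "scheduler",
    "action_processor",
)
EXPECTED_FILES = ("config.json",)


def missing_components(checkpoint_root):
    """Return the expected entries that are absent under checkpoint_root.

    Hypothetical helper: it only checks that the folders and files exist,
    not that their contents are valid.
    """
    root = Path(checkpoint_root)
    missing = [f"{name}/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    # Assumed local path; point this at wherever the checkpoint was downloaded.
    print(missing_components("DreamDojo-Pretrain-2B-Diffusers"))
```

An empty list means every entry from the listing is present; anything else names the folders or files that still need to be downloaded.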
| | 2B |
|---|---|
| Model channels | 2048 |
| Transformer blocks | 28 |
| Attention heads | 16 |
| Patch size (spatial / temporal) | 2 / 1 |
| Action dim | 384 (unified) |
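Two quantities follow from the table by simple arithmetic, and the sketch below works them out. It assumes the usual DiT conventions, which the model card does not state explicitly: the model channels are split evenly across attention heads, and a latent video is cut into non-overlapping patches of the listed size. The function names and the example latent shape are hypothetical.

```python
# Values copied from the 2B architecture table.
MODEL_CHANNELS = 2048
NUM_HEADS = 16
PATCH_SPATIAL = 2
PATCH_TEMPORAL = 1


def head_dim(model_channels=MODEL_CHANNELS, num_heads=NUM_HEADS):
    """Per-head width, assuming channels are divided evenly across heads."""
    if model_channels % num_heads != 0:
        raise ValueError("model_channels must be divisible by num_heads")
    return model_channels // num_heads


def token_count(frames, height, width,
                patch_spatial=PATCH_SPATIAL, patch_temporal=PATCH_TEMPORAL):
    """Sequence length after patchifying a latent video of frames x height x width.

    Assumes non-overlapping patches, so each dimension must be divisible
    by its patch size.
    """
    if frames % patch_temporal or height % patch_spatial or width % patch_spatial:
        raise ValueError("latent shape must be divisible by the patch size")
    return (frames // patch_temporal) * (height // patch_spatial) * (width // patch_spatial)


if __name__ == "__main__":
    print(head_dim())              # 2048 / 16 = 128
    # Hypothetical latent shape, chosen only to illustrate the arithmetic.
    print(token_count(4, 32, 32))  # 4 * 16 * 16 = 1024
```

With a temporal patch size of 1, every latent frame contributes its own set of tokens, so sequence length grows linearly with the number of latent frames.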
@article{dreamdojo2025,
title={DreamDojo: Advancing Real-World Robot Policies Through Generated Interactive Environments},
author={NVIDIA},
year={2025}
}
Please refer to the NVIDIA DreamDojo repository for license terms.