DreamDojo-Pretrain-2B-Diffusers

Base model trained on diverse video data. Part of the DreamDojo model family.


Size	2B
Stage	Pre-training
Architecture	DiT (Diffusion Transformer) with AdaLN-LoRA
Base	Cosmos Predict 2.5
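AdaLN-LoRA replaces each block's dense AdaLN modulation projection with a low-rank factorization. The sketch below counts the weights both ways for the 2B configuration, using the 2048 channels and 28 blocks from the Architecture table. Several details are assumptions that this card does not confirm. It assumes three modulated sublayers per block (self-attention, cross-attention, MLP), each emitting shift, scale, and gate vectors. It also assumes a hypothetical LoRA rank of 256, and it ignores biases and any shared timestep-embedding layers. Treat the numbers as a rough illustration only.

```python
# Rough weight count: dense AdaLN modulation vs. AdaLN-LoRA.
# ASSUMPTIONS (not stated in this model card): 3 modulated sublayers per block,
# each producing shift/scale/gate (3 * D outputs); LoRA rank 256; biases ignored.

D_MODEL = 2048      # "Model channels" from the Architecture table
NUM_BLOCKS = 28     # "Transformer blocks" from the Architecture table
LORA_RANK = 256     # hypothetical rank, chosen for illustration
N_SUBLAYERS = 3     # assumed: self-attention, cross-attention, MLP


def dense_adaln_params(d_model: int, n_sublayers: int = N_SUBLAYERS) -> int:
    """Weights per block when each sublayer uses a dense D -> 3D projection."""
    return n_sublayers * d_model * 3 * d_model


def lora_adaln_params(d_model: int, rank: int, n_sublayers: int = N_SUBLAYERS) -> int:
    """Weights per block when each projection is factored as D -> r -> 3D."""
    return n_sublayers * (d_model * rank + rank * 3 * d_model)


dense = NUM_BLOCKS * dense_adaln_params(D_MODEL)
lora = NUM_BLOCKS * lora_adaln_params(D_MODEL, LORA_RANK)
print(f"dense AdaLN: {dense:,} weights")
print(f"AdaLN-LoRA:  {lora:,} weights ({dense / lora:.1f}x fewer)")
```

Under these assumptions each projection shrinks by a factor of 3D / 4r = 6144 / 1024 = 6. The real saving depends on the rank and layer layout actually used in the checkpoint.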

Checkpoint Structure

DreamDojo-Pretrain-2B-Diffusers/
├── transformer/            # DiT backbone (sharded safetensors)
├── crossattn_adapter/      # Text-to-DiT projection (100352 → 1024)
├── vae/                    # AutoencoderKLWan (standard diffusers)
├── lam/                    # Latent Action Model (710M params)
├── text_encoder/           # Cosmos-Reason1-7B
├── scheduler/              # FlowMatchEulerDiscreteScheduler
├── action_processor/       # DreamDojo-specific config
└── config.json
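The adapter's input width, 100352, factors exactly as 28 × 3584. Cosmos-Reason1-7B is built on Qwen2.5-VL-7B, whose language model has 28 layers with hidden size 3584. One plausible reading is therefore that the adapter takes every layer's hidden state, concatenated per token. This is an assumption that the card does not confirm. The sketch checks that arithmetic and counts the adapter's weights. The constant names are illustrative, and the bias term is also assumed.

```python
# Shape bookkeeping for crossattn_adapter (100352 -> 1024).
# ASSUMPTION: the 100352-wide input is the per-layer hidden states of the
# text encoder concatenated along the feature axis. Cosmos-Reason1-7B builds
# on Qwen2.5-VL-7B, whose language model has 28 layers of width 3584.

ENCODER_HIDDEN = 3584   # Qwen2.5-VL-7B language-model hidden size
ENCODER_LAYERS = 28     # Qwen2.5-VL-7B language-model layer count
ADAPTER_OUT = 1024      # adapter output width, from the checkpoint tree


def concat_width(hidden: int, layers: int) -> int:
    """Feature width after concatenating one hidden state per layer."""
    return hidden * layers


def linear_weight_count(d_in: int, d_out: int, bias: bool = True) -> int:
    """Parameters of a dense d_in -> d_out projection (bias is an assumption)."""
    return d_in * d_out + (d_out if bias else 0)


d_in = concat_width(ENCODER_HIDDEN, ENCODER_LAYERS)
print(f"adapter input width: {d_in}")
print(f"adapter weights: {linear_weight_count(d_in, ADAPTER_OUT):,}")
```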
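The scheduler/ folder stores a FlowMatchEulerDiscreteScheduler config. That diffusers scheduler integrates a flow-matching ODE with Euler steps. At each step it moves the sample by (sigma_next - sigma) times the predicted velocity, as sigma falls from 1 to 0 along a schedule that its `shift` parameter can warp. The toy below reproduces only that update rule, in plain Python, and is not the diffusers implementation. A scalar stands in for the video latent, and an oracle velocity computed from a known clean value stands in for the DiT. The sigma spacing is simplified relative to the real scheduler, and the shift value of 3.0 is arbitrary.

```python
# Toy flow-matching Euler sampler (simplified; not the diffusers implementation).
# Convention: x_sigma = (1 - sigma) * x0 + sigma * noise, velocity = noise - x0.


def shifted_sigmas(num_steps: int, shift: float = 1.0) -> list[float]:
    """Linearly spaced sigmas from 1 down to 0, warped by a time shift."""
    base = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in base]


def euler_sample(noise: float, velocity_fn, num_steps: int = 20, shift: float = 1.0) -> float:
    """Integrate from pure noise (sigma = 1) to a clean sample (sigma = 0)."""
    sigmas = shifted_sigmas(num_steps, shift)
    x = noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)          # the DiT would predict this
        x = x + (sigma_next - sigma) * v   # Euler step; sigma_next < sigma
    return x


X0, NOISE = 0.75, -1.3  # hypothetical clean value and noise draw


def oracle_velocity(x: float, sigma: float) -> float:
    """Exact velocity of the straight path through X0 (sigma > 0 inside the loop)."""
    return (x - X0) / sigma


print(euler_sample(NOISE, oracle_velocity, num_steps=20, shift=3.0))
```

Because the toy path is a straight line, Euler lands on X0 up to floating-point error for any step count. With a learned, imperfect velocity, the number of steps and the shift both affect the result.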

Architecture

Setting	Value (2B model)
Model channels	2048
Transformer blocks	28
Attention heads	16
Patch size (spatial / temporal)	2 / 1
Action dim	384 (unified)
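The Architecture table fixes the attention head width (2048 / 16 = 128). Together with the patch size, it also determines how many DiT tokens a clip produces. The sketch below works through that arithmetic. It assumes the VAE downsamples 8× in height and width and 4× in time while keeping the first frame, so T_latent = 1 + (T - 1) / 4. Those factors are the usual Wan 2.1 VAE settings and are not stated on this card. The 93-frame, 480 × 640 clip is only an example input.

```python
# DiT token-count arithmetic for the 2B configuration.
# ASSUMPTION: VAE compression of 4x in time (first frame kept) and 8x in space,
# as in the Wan 2.1 VAE; verify against the vae/ config before relying on it.

MODEL_CHANNELS = 2048
NUM_HEADS = 16
PATCH_SPATIAL, PATCH_TEMPORAL = 2, 1


def head_dim(channels: int = MODEL_CHANNELS, heads: int = NUM_HEADS) -> int:
    """Per-head width; channels must split evenly across heads."""
    if channels % heads:
        raise ValueError("channels must be divisible by heads")
    return channels // heads


def latent_shape(frames: int, height: int, width: int,
                 t_down: int = 4, s_down: int = 8) -> tuple[int, int, int]:
    """(T, H, W) of the VAE latent for a clip of the given size."""
    return 1 + (frames - 1) // t_down, height // s_down, width // s_down


def num_tokens(frames: int, height: int, width: int) -> int:
    """Tokens after patchifying the latent with the table's patch sizes."""
    t, h, w = latent_shape(frames, height, width)
    return (t // PATCH_TEMPORAL) * (h // PATCH_SPATIAL) * (w // PATCH_SPATIAL)


print("head dim:", head_dim())
print("latent (T, H, W):", latent_shape(93, 480, 640))
print("DiT tokens:", num_tokens(93, 480, 640))
```

If the blocks use full self-attention, that term scales quadratically in the token count. Halving both spatial dimensions (4× fewer tokens) would therefore cut it roughly 16×.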

Citation

@article{dreamdojo2025,
  title={DreamDojo: Advancing Real-World Robot Policies Through Generated Interactive Environments},
  author={NVIDIA},
  year={2025}
}

License

Please refer to the NVIDIA DreamDojo repository for license terms.

Collection

Physis-AI/DreamDojo-Pretrain-2B-Diffusers is part of the DreamDojo-Diffusers collection (10 items, updated Apr 21): HuggingFace-compatible checkpoints converted from https://huggingface.co/nvidia/DreamDojo.