HiDreamImageTransformer2DModel

A Transformer model for image-like data from HiDream-I1.

The model can be loaded with the following code snippet.

import torch
from diffusers import HiDreamImageTransformer2DModel

transformer = HiDreamImageTransformer2DModel.from_pretrained("HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16)

Loading GGUF quantized checkpoints for HiDream-I1

GGUF checkpoints for the HiDreamImageTransformer2DModel can be loaded using FromOriginalModelMixin.from_single_file:

import torch
from diffusers import GGUFQuantizationConfig, HiDreamImageTransformer2DModel

ckpt_path = "https://huggingface.co/city96/HiDream-I1-Dev-gguf/blob/main/hidream-i1-dev-Q2_K.gguf"
transformer = HiDreamImageTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16
)
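The `ckpt_path` above is a Hub `blob` URL. As a rough illustration of what that URL encodes, the sketch below splits it into a repository id, a revision, and a file name using only the standard library. The helper name `split_hub_blob_url` is hypothetical and is not part of diffusers; it assumes the `https://huggingface.co/<owner>/<repo>/blob/<revision>/<file>` layout seen in the example and a revision without slashes.

```python
from urllib.parse import urlparse


def split_hub_blob_url(url: str) -> tuple[str, str, str]:
    """Split a Hub blob URL into (repo_id, revision, filename).

    Hypothetical helper for illustration only; not a diffusers API.
    """
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed layout: <owner>/<repo>/blob/<revision>/<path to file...>
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a Hub blob URL: {url}")
    repo_id = "/".join(parts[:2])
    revision = parts[3]
    filename = "/".join(parts[4:])
    return repo_id, revision, filename


ckpt_path = "https://huggingface.co/city96/HiDream-I1-Dev-gguf/blob/main/hidream-i1-dev-Q2_K.gguf"
repo_id, revision, filename = split_hub_blob_url(ckpt_path)
```

If a local copy is preferred, the same three pieces could be passed to `huggingface_hub.hf_hub_download(repo_id=..., filename=..., revision=...)`, and the returned local path given to `from_single_file` in place of the URL.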

HiDreamImageTransformer2DModel

class diffusers.HiDreamImageTransformer2DModel

( patch_size: int | None = None, in_channels: int = 64, out_channels: int | None = None, num_layers: int = 16, num_single_layers: int = 32, attention_head_dim: int = 128, num_attention_heads: int = 20, caption_channels: list = None, text_emb_dim: int = 2048, num_routed_experts: int = 4, num_activated_experts: int = 2, axes_dims_rope: tuple = (32, 32), max_resolution: tuple = (128, 128), llama_layers: list = None, force_inference_output: bool = False )
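To get a feel for how the constructor defaults relate to each other, the following sketch derives a few quantities from them. `HiDreamDefaults` is a hypothetical stand-in that copies values from the signature above; it is not the real config class, and the derived formulas are assumptions based on common transformer and mixture-of-experts conventions rather than statements about the library's internals. Released checkpoints may also ship configs that differ from these defaults.

```python
from dataclasses import dataclass


@dataclass
class HiDreamDefaults:
    """Stand-in holding a few defaults copied from the signature (not the real config)."""

    attention_head_dim: int = 128
    num_attention_heads: int = 20
    num_layers: int = 16
    num_single_layers: int = 32
    num_routed_experts: int = 4
    num_activated_experts: int = 2


cfg = HiDreamDefaults()

# Assumption: hidden width is heads * head_dim, as in most multi-head attention designs.
inner_dim = cfg.num_attention_heads * cfg.attention_head_dim

# Assumption: the two layer counts describe blocks that are simply stacked.
total_blocks = cfg.num_layers + cfg.num_single_layers

# Fraction of routed experts active per token under top-k routing (assumed).
active_fraction = cfg.num_activated_experts / cfg.num_routed_experts
```

With the defaults shown, this gives a hidden width of 2560, 48 blocks in total, and half of the routed experts active per token.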

forward

( hidden_states: Tensor, timesteps: LongTensor = None, encoder_hidden_states_t5: Tensor = None, encoder_hidden_states_llama3: Tensor = None, pooled_embeds: Tensor = None, img_ids: torch.Tensor | None = None, img_sizes: list[tuple[int, int]] | None = None, hidden_states_masks: torch.Tensor | None = None, attention_kwargs: dict[str, typing.Any] | None = None, return_dict: bool = True, **kwargs )

Parameters

  • hidden_states (torch.Tensor of shape (batch_size, in_channels, height, width) or (batch_size, patch_height * patch_width, patch_size * patch_size * channels)) — Input hidden_states.
  • timesteps (torch.LongTensor) — Indicates the current denoising step.
  • encoder_hidden_states_t5 (torch.Tensor) — Conditional embeddings computed from the T5 text encoder.
  • encoder_hidden_states_llama3 (torch.Tensor) — Conditional embeddings computed from the Llama3 text encoder.
  • pooled_embeds (torch.Tensor) — Pooled text embeddings used for additional conditioning.
  • img_ids (torch.Tensor, optional) — Image position ids for the patched hidden states.
  • img_sizes (list of tuple of int, optional) — Per-sample patch grid sizes used to unpatchify the output.
  • hidden_states_masks (torch.Tensor, optional) — Mask over patched hidden_states.
  • attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.models.attention_processor.
  • return_dict (bool, optional, defaults to True) — Whether or not to return a Transformer2DModelOutput instead of a plain tuple.

The HiDreamImageTransformer2DModel forward method.
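The two accepted shapes for `hidden_states` are related by simple arithmetic, which the sketch below spells out. The function `patched_shape` is a hypothetical helper, and the concrete numbers (a 16-channel latent of 128x128 with 2x2 patches) are illustrative assumptions, not values taken from this page.

```python
def patched_shape(batch_size, channels, height, width, patch_size):
    """Return (batch, num_patches, patch_dim) for a latent of shape (B, C, H, W)."""
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    patch_height = height // patch_size
    patch_width = width // patch_size
    return (batch_size, patch_height * patch_width, patch_size * patch_size * channels)


# Illustrative numbers (assumptions): 16 latent channels, 128x128 latent, 2x2 patches.
shape = patched_shape(batch_size=1, channels=16, height=128, width=128, patch_size=2)

# One (patch_height, patch_width) entry per sample, matching the img_sizes description.
img_sizes = [(128 // 2, 128 // 2)]
```

Under these assumed numbers the patched tensor has shape `(1, 4096, 64)`, so the last dimension happens to equal the default `in_channels` of 64.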

Transformer2DModelOutput

class diffusers.models.modeling_outputs.Transformer2DModelOutput

( sample: torch.Tensor )

Parameters

  • sample (torch.Tensor of shape (batch_size, num_channels, height, width) or (batch_size, num_vector_embeds - 1, num_latent_pixels) if Transformer2DModel is discrete) — The hidden states output conditioned on the encoder_hidden_states input. If discrete, returns probability distributions for the unnoised latent pixels.

The output of Transformer2DModel.
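Because `return_dict` switches the forward method between an output object and a plain tuple, calling code sometimes needs to handle both. The sketch below shows one way to do that. `FakeOutput` is a stand-in for `Transformer2DModelOutput` so the example runs without loading a model, and `get_sample` is a hypothetical helper; it assumes the sample is the first element of the tuple.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeOutput:
    """Stand-in for Transformer2DModelOutput, which exposes a `sample` field."""

    sample: Any


def get_sample(output):
    """Return the sample from either an output object or a plain tuple."""
    if isinstance(output, tuple):
        # Assumption: the sample is the first tuple element.
        return output[0]
    return output.sample


as_object = get_sample(FakeOutput(sample="latents"))
as_tuple = get_sample(("latents",))
```

Both calls yield the same value, so downstream code does not need to know which `return_dict` setting was used.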
