K-Diffusion
k-diffusion is a popular library created by Katherine Crowson. We provide StableDiffusionKDiffusionPipeline and StableDiffusionXLKDiffusionPipeline that allow you to run Stable Diffusion with samplers from k-diffusion.
Note that most of the samplers from k-diffusion are implemented in Diffusers, and we recommend using the existing schedulers instead. You can find a mapping between k-diffusion samplers and schedulers in Diffusers here.
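For reference, here is a minimal sketch of running a k-diffusion sampler through the pipeline. It assumes k-diffusion is installed (`pip install k-diffusion`) and uses `CompVis/stable-diffusion-v1-4` as an example checkpoint; `set_scheduler` selects the sampler by its k-diffusion name (e.g. `sample_heun`).

```python
import torch
from diffusers import StableDiffusionKDiffusionPipeline

# Load a Stable Diffusion checkpoint into the k-diffusion pipeline.
pipe = StableDiffusionKDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Pick a sampler by its function name in k_diffusion.sampling.
pipe.set_scheduler("sample_heun")

image = pipe("an astronaut riding a horse on mars", num_inference_steps=25).images[0]
image.save("astronaut.png")
```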
StableDiffusionKDiffusionPipeline[[diffusers.StableDiffusionKDiffusionPipeline]]
class diffusers.StableDiffusionKDiffusionPipeline
- vae (AutoencoderKL) -- Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
- text_encoder (CLIPTextModel) -- Frozen text-encoder. Stable Diffusion uses the text portion of CLIP, specifically the clip-vit-large-patch14 variant.
- tokenizer (CLIPTokenizer) -- Tokenizer of class CLIPTokenizer.
- unet (UNet2DConditionModel) -- Conditional U-Net architecture to denoise the encoded image latents.
- scheduler (SchedulerMixin) -- A scheduler to be used in combination with unet to denoise the encoded image latents. Can be one of DDIMScheduler, LMSDiscreteScheduler, or PNDMScheduler.
- safety_checker (StableDiffusionSafetyChecker) -- Classification module that estimates whether generated images could be considered offensive or harmful. Please refer to the model card for details.
- feature_extractor (CLIPImageProcessor) -- Model that extracts features from generated images to be used as inputs for the safety_checker.
Pipeline for text-to-image generation using Stable Diffusion.
This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)
The pipeline also inherits the following loading methods:
- load_textual_inversion() for loading textual inversion embeddings
- load_lora_weights() for loading LoRA weights
- save_lora_weights() for saving LoRA weights
> This is an experimental pipeline and is likely to change in the future.
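Continuing from the example above, a hedged sketch of the inherited loaders; the textual inversion repo id is the one used elsewhere in the Diffusers docs, and the LoRA path is a placeholder.

```python
# Load a textual inversion embedding (repo id from the Diffusers docs examples).
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# Load LoRA weights; replace the placeholder with a real SD 1.5-compatible LoRA.
pipe.load_lora_weights("path/to/your_lora")

image = pipe("a <cat-toy> on a shelf").images[0]
```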
encode_prompt[[diffusers.StableDiffusionKDiffusionPipeline.encode_prompt]]
- prompt (str or List[str], optional) -- prompt to be encoded
- device (torch.device) -- torch device
- num_images_per_prompt (int) -- number of images that should be generated per prompt
- do_classifier_free_guidance (bool) -- whether to use classifier-free guidance or not
- negative_prompt (str or List[str], optional) -- The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
- prompt_embeds (torch.Tensor, optional) -- Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
- negative_prompt_embeds (torch.Tensor, optional) -- Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
- lora_scale (float, optional) -- A LoRA scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded.
- clip_skip (int, optional) -- Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
Encodes the prompt into text encoder hidden states.
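A hedged usage sketch, assuming `encode_prompt` follows the standard Stable Diffusion convention of returning a `(prompt_embeds, negative_prompt_embeds)` tuple that can be passed back into the pipeline call:

```python
# Continuing from the pipeline created above; all argument values are illustrative.
prompt_embeds, negative_prompt_embeds = pipe.encode_prompt(
    prompt="an astronaut riding a horse on mars",
    device=pipe.device,
    num_images_per_prompt=1,
    do_classifier_free_guidance=True,
    negative_prompt="blurry, low quality",
)

# Pre-computed embeddings replace the plain-text prompt arguments.
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
).images[0]
```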
StableDiffusionXLKDiffusionPipeline[[diffusers.StableDiffusionXLKDiffusionPipeline]]
class diffusers.StableDiffusionXLKDiffusionPipeline
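A minimal sketch of the SDXL variant, under the same assumptions as the example above (k-diffusion installed, `set_scheduler` taking a k-diffusion sampler name):

```python
import torch
from diffusers import StableDiffusionXLKDiffusionPipeline

pipe = StableDiffusionXLKDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Any sampler name from k_diffusion.sampling should work here.
pipe.set_scheduler("sample_dpmpp_2m")

image = pipe("an astronaut riding a horse on mars").images[0]
```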
encode_prompt[[diffusers.StableDiffusionXLKDiffusionPipeline.encode_prompt]]
- prompt (str or List[str], optional) -- prompt to be encoded
- prompt_2 (str or List[str], optional) -- The prompt or prompts to be sent to tokenizer_2 and text_encoder_2. If not defined, prompt is used in both text-encoders
- device (torch.device) -- torch device
- num_images_per_prompt (int) -- number of images that should be generated per prompt
- do_classifier_free_guidance (bool) -- whether to use classifier-free guidance or not
- negative_prompt (str or List[str], optional) -- The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
- negative_prompt_2 (str or List[str], optional) -- The prompt or prompts not to guide the image generation to be sent to tokenizer_2 and text_encoder_2. If not defined, negative_prompt is used in both text-encoders
- prompt_embeds (torch.Tensor, optional) -- Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.
- negative_prompt_embeds (torch.Tensor, optional) -- Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.
- pooled_prompt_embeds (torch.Tensor, optional) -- Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled text embeddings will be generated from the prompt input argument.
- negative_pooled_prompt_embeds (torch.Tensor, optional) -- Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, pooled negative_prompt_embeds will be generated from the negative_prompt input argument.
- lora_scale (float, optional) -- A LoRA scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded.
- clip_skip (int, optional) -- Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings.
Encodes the prompt into text encoder hidden states.
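A hedged sketch for the SDXL variant, assuming the usual SDXL convention of a four-tuple return whose pooled embeddings are passed back into the pipeline call:

```python
# Continuing from the SDXL pipeline above; assumes the standard SDXL return order.
(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = pipe.encode_prompt(
    prompt="an astronaut riding a horse on mars",
    device=pipe.device,
    num_images_per_prompt=1,
    do_classifier_free_guidance=True,
    negative_prompt="blurry, low quality",
)

image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
).images[0]
```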