# VAE Image Processor
The `VaeImageProcessor` provides a unified API for [StableDiffusionPipeline](/docs/diffusers/pr_12595/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline)s to prepare image inputs for VAE encoding and to post-process outputs once they're decoded. This includes transformations such as resizing, normalization, and conversion between PIL images, PyTorch tensors, and NumPy arrays.
All pipelines with `VaeImageProcessor` accept PIL images, PyTorch tensors, or NumPy arrays as image inputs and return outputs based on the `output_type` argument specified by the user. You can pass encoded image latents directly to a pipeline and have the pipeline return latents with the `output_type` argument (for example, `output_type="latent"`). This allows you to take the generated latents from one pipeline and pass them to another pipeline as input without leaving the latent space. It also makes it much easier to use multiple pipelines together by passing PyTorch tensors directly between them.
## VaeImageProcessor[[diffusers.image_processor.VaeImageProcessor]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.image_processor.VaeImageProcessor</name><anchor>diffusers.image_processor.VaeImageProcessor</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L88</source><parameters>[{"name": "do_resize", "val": ": bool = True"}, {"name": "vae_scale_factor", "val": ": int = 8"}, {"name": "vae_latent_channels", "val": ": int = 4"}, {"name": "resample", "val": ": str = 'lanczos'"}, {"name": "reducing_gap", "val": ": int = None"}, {"name": "do_normalize", "val": ": bool = True"}, {"name": "do_binarize", "val": ": bool = False"}, {"name": "do_convert_rgb", "val": ": bool = False"}, {"name": "do_convert_grayscale", "val": ": bool = False"}]</parameters><paramsdesc>- **do_resize** (`bool`, *optional*, defaults to `True`) --
Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. Can accept
`height` and `width` arguments from the [image_processor.VaeImageProcessor.preprocess()](/docs/diffusers/pr_12595/en/api/image_processor#diffusers.image_processor.VaeImageProcessor.preprocess) method.
- **vae_scale_factor** (`int`, *optional*, defaults to `8`) --
VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor.
- **resample** (`str`, *optional*, defaults to `lanczos`) --
Resampling filter to use when resizing the image.
- **do_normalize** (`bool`, *optional*, defaults to `True`) --
Whether to normalize the image to [-1,1].
- **do_binarize** (`bool`, *optional*, defaults to `False`) --
Whether to binarize the image to 0/1.
- **do_convert_rgb** (`bool`, *optional*, defaults to `False`) --
Whether to convert the images to RGB format.
- **do_convert_grayscale** (`bool`, *optional*, defaults to `False`) --
Whether to convert the images to grayscale format.</paramsdesc><paramgroups>0</paramgroups></docstring>
Image processor for VAE.
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>apply_overlay</name><anchor>diffusers.image_processor.VaeImageProcessor.apply_overlay</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L794</source><parameters>[{"name": "mask", "val": ": Image"}, {"name": "init_image", "val": ": Image"}, {"name": "image", "val": ": Image"}, {"name": "crop_coords", "val": ": typing.Optional[typing.Tuple[int, int, int, int]] = None"}]</parameters><paramsdesc>- **mask** (`PIL.Image.Image`) --
The mask image that highlights regions to overlay.
- **init_image** (`PIL.Image.Image`) --
The original image to which the overlay is applied.
- **image** (`PIL.Image.Image`) --
The image to overlay onto the original.
- **crop_coords** (`Tuple[int, int, int, int]`, *optional*) --
Coordinates to crop the image. If provided, the image will be cropped accordingly.</paramsdesc><paramgroups>0</paramgroups><rettype>`PIL.Image.Image`</rettype><retdesc>The final image with the overlay applied.</retdesc></docstring>
Applies an overlay of the mask and the inpainted image on the original image.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>binarize</name><anchor>diffusers.image_processor.VaeImageProcessor.binarize</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L529</source><parameters>[{"name": "image", "val": ": Image"}]</parameters><paramsdesc>- **image** (`PIL.Image.Image`) --
The image input, should be a PIL image.</paramsdesc><paramgroups>0</paramgroups><rettype>`PIL.Image.Image`</rettype><retdesc>The binarized image. Values less than 0.5 are set to 0, values of 0.5 or greater are set to 1.</retdesc></docstring>
Create a binary mask from the input image.
</div>
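The thresholding rule can be sketched in NumPy (a hypothetical `binarize_array` helper for illustration, not the diffusers method itself, which operates on PIL images):

```python
import numpy as np

def binarize_array(mask: np.ndarray) -> np.ndarray:
    """Threshold a float mask in [0, 1]: values below 0.5 become 0, the rest become 1."""
    out = mask.copy()
    out[out < 0.5] = 0
    out[out >= 0.5] = 1
    return out

mask = np.array([0.1, 0.49, 0.5, 0.9])
print(binarize_array(mask))  # [0. 0. 1. 1.]
```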
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>blur</name><anchor>diffusers.image_processor.VaeImageProcessor.blur</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L276</source><parameters>[{"name": "image", "val": ": Image"}, {"name": "blur_factor", "val": ": int = 4"}]</parameters><paramsdesc>- **image** (`PIL.Image.Image`) --
The PIL image to blur.
- **blur_factor** (`int`, *optional*, defaults to `4`) --
The radius of the Gaussian blur.</paramsdesc><paramgroups>0</paramgroups><rettype>`PIL.Image.Image`</rettype><retdesc>The blurred PIL image.</retdesc></docstring>
Applies Gaussian blur to an image.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>convert_to_grayscale</name><anchor>diffusers.image_processor.VaeImageProcessor.convert_to_grayscale</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L259</source><parameters>[{"name": "image", "val": ": Image"}]</parameters><paramsdesc>- **image** (`PIL.Image.Image`) --
The input image to convert.</paramsdesc><paramgroups>0</paramgroups><rettype>`PIL.Image.Image`</rettype><retdesc>The image converted to grayscale.</retdesc></docstring>
Converts a given PIL image to grayscale.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>convert_to_rgb</name><anchor>diffusers.image_processor.VaeImageProcessor.convert_to_rgb</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L242</source><parameters>[{"name": "image", "val": ": Image"}]</parameters><paramsdesc>- **image** (`PIL.Image.Image`) --
The PIL image to convert to RGB.</paramsdesc><paramgroups>0</paramgroups><rettype>`PIL.Image.Image`</rettype><retdesc>The RGB-converted PIL image.</retdesc></docstring>
Converts a PIL image to RGB format.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>denormalize</name><anchor>diffusers.image_processor.VaeImageProcessor.denormalize</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L227</source><parameters>[{"name": "images", "val": ": typing.Union[numpy.ndarray, torch.Tensor]"}]</parameters><paramsdesc>- **images** (`np.ndarray` or `torch.Tensor`) --
The image array to denormalize.</paramsdesc><paramgroups>0</paramgroups><rettype>`np.ndarray` or `torch.Tensor`</rettype><retdesc>The denormalized image array.</retdesc></docstring>
Denormalize an image array to [0,1].
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>get_crop_region</name><anchor>diffusers.image_processor.VaeImageProcessor.get_crop_region</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L293</source><parameters>[{"name": "mask_image", "val": ": Image"}, {"name": "width", "val": ": int"}, {"name": "height", "val": ": int"}, {"name": "pad", "val": " = 0"}]</parameters><paramsdesc>- **mask_image** (PIL.Image.Image) -- Mask image.
- **width** (int) -- Width of the image to be processed.
- **height** (int) -- Height of the image to be processed.
- **pad** (int, optional) -- Padding to be added to the crop region. Defaults to 0.</paramsdesc><paramgroups>0</paramgroups><rettype>tuple</rettype><retdesc>A tuple (x1, y1, x2, y2) representing a rectangular region that contains all masked areas in the image and
matches the original aspect ratio.</retdesc></docstring>
Finds a rectangular region that contains all masked areas in an image, and expands the region to match the aspect
ratio of the original image; for example, if the user drew a mask in a 128x32 region, and the dimensions for
processing are 512x512, the region will be expanded to 128x128.
</div>
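The aspect-ratio expansion in the 128x32 example above can be sketched as follows (a hypothetical `expand_to_aspect` helper; the real method also clamps the result to the image bounds and applies padding, both omitted here for brevity):

```python
def expand_to_aspect(x1, y1, x2, y2, width, height):
    """Expand a bounding box so its aspect ratio matches width/height,
    growing the shorter side symmetrically around the box center."""
    box_w, box_h = x2 - x1, y2 - y1
    target = width / height
    if box_w / box_h > target:
        # box is too wide -> grow its height
        new_h = int(box_w / target)
        pad = (new_h - box_h) // 2
        y1, y2 = y1 - pad, y2 + (new_h - box_h - pad)
    else:
        # box is too tall (or already matches) -> grow its width
        new_w = int(box_h * target)
        pad = (new_w - box_w) // 2
        x1, x2 = x1 - pad, x2 + (new_w - box_w - pad)
    return x1, y1, x2, y2

# a 128x32 mask region, processed at 512x512, grows to a 128x128 region
print(expand_to_aspect(0, 0, 128, 32, 512, 512))  # (0, -48, 128, 80)
```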
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>get_default_height_width</name><anchor>diffusers.image_processor.VaeImageProcessor.get_default_height_width</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L566</source><parameters>[{"name": "image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor]"}, {"name": "height", "val": ": typing.Optional[int] = None"}, {"name": "width", "val": ": typing.Optional[int] = None"}]</parameters><paramsdesc>- **image** (`Union[PIL.Image.Image, np.ndarray, torch.Tensor]`) --
The image input, which can be a PIL image, NumPy array, or PyTorch tensor. If it is a NumPy array, it
should have shape `[batch, height, width]` or `[batch, height, width, channels]`. If it is a PyTorch
tensor, it should have shape `[batch, channels, height, width]`.
- **height** (`Optional[int]`, *optional*, defaults to `None`) --
The height of the preprocessed image. If `None`, the height of the `image` input will be used.
- **width** (`Optional[int]`, *optional*, defaults to `None`) --
The width of the preprocessed image. If `None`, the width of the `image` input will be used.</paramsdesc><paramgroups>0</paramgroups><rettype>`Tuple[int, int]`</rettype><retdesc>A tuple containing the height and width, both rounded down to the nearest integer multiple of
`vae_scale_factor`.</retdesc></docstring>
Returns the height and width of the image, downscaled to the nearest integer multiple of `vae_scale_factor`.
</div>
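The rounding rule amounts to subtracting the remainder modulo `vae_scale_factor` (a minimal sketch; the hypothetical `default_height_width` helper below mirrors the documented behavior, not the method itself):

```python
def default_height_width(height: int, width: int, vae_scale_factor: int = 8):
    """Round height and width down to the nearest multiple of vae_scale_factor."""
    return height - height % vae_scale_factor, width - width % vae_scale_factor

print(default_height_width(75, 100))  # (72, 96)
```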
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>normalize</name><anchor>diffusers.image_processor.VaeImageProcessor.normalize</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L212</source><parameters>[{"name": "images", "val": ": typing.Union[numpy.ndarray, torch.Tensor]"}]</parameters><paramsdesc>- **images** (`np.ndarray` or `torch.Tensor`) --
The image array to normalize.</paramsdesc><paramgroups>0</paramgroups><rettype>`np.ndarray` or `torch.Tensor`</rettype><retdesc>The normalized image array.</retdesc></docstring>
Normalize an image array to [-1,1].
</div>
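The `normalize`/`denormalize` pair is a simple affine mapping between the [0, 1] and [-1, 1] ranges. A NumPy sketch of the math (illustrative standalone functions, not the diffusers methods):

```python
import numpy as np

def normalize(images: np.ndarray) -> np.ndarray:
    """Map pixel values from [0, 1] to [-1, 1]."""
    return 2.0 * images - 1.0

def denormalize(images: np.ndarray) -> np.ndarray:
    """Map pixel values from [-1, 1] back to [0, 1], clipping out-of-range values."""
    return np.clip(images / 2.0 + 0.5, 0.0, 1.0)

x = np.array([0.0, 0.5, 1.0])
print(normalize(x))               # [-1.  0.  1.]
print(denormalize(normalize(x)))  # [0.  0.5 1. ]
```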
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>numpy_to_pil</name><anchor>diffusers.image_processor.VaeImageProcessor.numpy_to_pil</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L133</source><parameters>[{"name": "images", "val": ": ndarray"}]</parameters><paramsdesc>- **images** (`np.ndarray`) --
The image array to convert to PIL format.</paramsdesc><paramgroups>0</paramgroups><rettype>`List[PIL.Image.Image]`</rettype><retdesc>A list of PIL images.</retdesc></docstring>
Convert a NumPy image or a batch of images to a list of PIL images.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>numpy_to_pt</name><anchor>diffusers.image_processor.VaeImageProcessor.numpy_to_pt</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L177</source><parameters>[{"name": "images", "val": ": ndarray"}]</parameters><paramsdesc>- **images** (`np.ndarray`) --
The NumPy image array to convert to PyTorch format.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>A PyTorch tensor representation of the images.</retdesc></docstring>
Convert a NumPy image to a PyTorch tensor.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>pil_to_numpy</name><anchor>diffusers.image_processor.VaeImageProcessor.pil_to_numpy</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L157</source><parameters>[{"name": "images", "val": ": typing.Union[typing.List[PIL.Image.Image], PIL.Image.Image]"}]</parameters><paramsdesc>- **images** (`PIL.Image.Image` or `List[PIL.Image.Image]`) --
The PIL image or list of images to convert to NumPy format.</paramsdesc><paramgroups>0</paramgroups><rettype>`np.ndarray`</rettype><retdesc>A NumPy array representation of the images.</retdesc></docstring>
Convert a PIL image or a list of PIL images to NumPy arrays.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>postprocess</name><anchor>diffusers.image_processor.VaeImageProcessor.postprocess</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L744</source><parameters>[{"name": "image", "val": ": Tensor"}, {"name": "output_type", "val": ": str = 'pil'"}, {"name": "do_denormalize", "val": ": typing.Optional[typing.List[bool]] = None"}]</parameters><paramsdesc>- **image** (`torch.Tensor`) --
The image input, should be a PyTorch tensor with shape `B x C x H x W`.
- **output_type** (`str`, *optional*, defaults to `pil`) --
The output type of the image, can be one of `pil`, `np`, `pt`, `latent`.
- **do_denormalize** (`List[bool]`, *optional*, defaults to `None`) --
Whether to denormalize the image to [0,1]. If `None`, will use the value of `do_normalize` in the
`VaeImageProcessor` config.</paramsdesc><paramgroups>0</paramgroups><rettype>`PIL.Image.Image`, `np.ndarray` or `torch.Tensor`</rettype><retdesc>The postprocessed image.</retdesc></docstring>
Postprocess the image output from tensor to `output_type`.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>preprocess</name><anchor>diffusers.image_processor.VaeImageProcessor.preprocess</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L613</source><parameters>[{"name": "image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]]"}, {"name": "height", "val": ": typing.Optional[int] = None"}, {"name": "width", "val": ": typing.Optional[int] = None"}, {"name": "resize_mode", "val": ": str = 'default'"}, {"name": "crops_coords", "val": ": typing.Optional[typing.Tuple[int, int, int, int]] = None"}]</parameters><paramsdesc>- **image** (`PipelineImageInput`) --
The image input; accepted formats are PIL images, NumPy arrays, and PyTorch tensors, as well as lists of
these formats.
- **height** (`int`, *optional*) --
The height of the preprocessed image. If `None`, `get_default_height_width()` is used to get the default
height.
- **width** (`int`, *optional*) --
The width of the preprocessed image. If `None`, `get_default_height_width()` is used to get the default width.
- **resize_mode** (`str`, *optional*, defaults to `default`) --
The resize mode, can be one of `default`, `fill`, or `crop`. If `default`, will resize the image to fit
within the specified width and height, and it may not maintain the original aspect ratio. If `fill`, will
resize the image to fit within the specified width and height, maintaining the aspect ratio, and then
center the image within the dimensions, filling the empty space with data from the image. If `crop`, will
resize the image to fit within the specified width and height, maintaining the aspect ratio, and then
center the image within the dimensions, cropping the excess. Note that resize_mode `fill` and `crop` are
only supported for PIL image input.
- **crops_coords** (`List[Tuple[int, int, int, int]]`, *optional*, defaults to `None`) --
The crop coordinates for each image in the batch. If `None`, the images are not cropped.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>The preprocessed image.</retdesc></docstring>
Preprocess the image input.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>pt_to_numpy</name><anchor>diffusers.image_processor.VaeImageProcessor.pt_to_numpy</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L196</source><parameters>[{"name": "images", "val": ": Tensor"}]</parameters><paramsdesc>- **images** (`torch.Tensor`) --
The PyTorch tensor to convert to NumPy format.</paramsdesc><paramgroups>0</paramgroups><rettype>`np.ndarray`</rettype><retdesc>A NumPy array representation of the images.</retdesc></docstring>
Convert a PyTorch tensor to a NumPy image.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>resize</name><anchor>diffusers.image_processor.VaeImageProcessor.resize</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L468</source><parameters>[{"name": "image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor]"}, {"name": "height", "val": ": int"}, {"name": "width", "val": ": int"}, {"name": "resize_mode", "val": ": str = 'default'"}]</parameters><paramsdesc>- **image** (`PIL.Image.Image`, `np.ndarray` or `torch.Tensor`) --
The image input, can be a PIL image, NumPy array, or PyTorch tensor.
- **height** (`int`) --
The height to resize to.
- **width** (`int`) --
The width to resize to.
- **resize_mode** (`str`, *optional*, defaults to `default`) --
The resize mode to use, can be one of `default`, `fill`, or `crop`. If `default`, will resize the image to
fit within the specified width and height, and it may not maintain the original aspect ratio. If `fill`,
will resize the image to fit within the specified width and height, maintaining the aspect ratio, and
then center the image within the dimensions, filling the empty space with data from the image. If `crop`,
will resize the image to fit within the specified width and height, maintaining the aspect ratio, and then
center the image within the dimensions, cropping the excess. Note that resize_mode `fill` and `crop` are
only supported for PIL image input.</paramsdesc><paramgroups>0</paramgroups><rettype>`PIL.Image.Image`, `np.ndarray` or `torch.Tensor`</rettype><retdesc>The resized image.</retdesc></docstring>
Resize an image.
</div></div>
## InpaintProcessor[[diffusers.image_processor.InpaintProcessor]]
The `InpaintProcessor` accepts `mask` and `image` inputs and processes them together. Optionally, it can accept `padding_mask_crop` and apply a mask overlay.
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.image_processor.InpaintProcessor</name><anchor>diffusers.image_processor.InpaintProcessor</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L842</source><parameters>[{"name": "do_resize", "val": ": bool = True"}, {"name": "vae_scale_factor", "val": ": int = 8"}, {"name": "vae_latent_channels", "val": ": int = 4"}, {"name": "resample", "val": ": str = 'lanczos'"}, {"name": "reducing_gap", "val": ": int = None"}, {"name": "do_normalize", "val": ": bool = True"}, {"name": "do_binarize", "val": ": bool = False"}, {"name": "do_convert_grayscale", "val": ": bool = False"}, {"name": "mask_do_normalize", "val": ": bool = False"}, {"name": "mask_do_binarize", "val": ": bool = True"}, {"name": "mask_do_convert_grayscale", "val": ": bool = True"}]</parameters></docstring>
Image processor for inpainting image and mask.
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>postprocess</name><anchor>diffusers.image_processor.InpaintProcessor.postprocess</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L943</source><parameters>[{"name": "image", "val": ": Tensor"}, {"name": "output_type", "val": ": str = 'pil'"}, {"name": "original_image", "val": ": typing.Optional[PIL.Image.Image] = None"}, {"name": "original_mask", "val": ": typing.Optional[PIL.Image.Image] = None"}, {"name": "crops_coords", "val": ": typing.Optional[typing.Tuple[int, int, int, int]] = None"}]</parameters></docstring>
Postprocess the image, optionally applying the mask overlay.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>preprocess</name><anchor>diffusers.image_processor.InpaintProcessor.preprocess</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L887</source><parameters>[{"name": "image", "val": ": Image"}, {"name": "mask", "val": ": Image = None"}, {"name": "height", "val": ": int = None"}, {"name": "width", "val": ": int = None"}, {"name": "padding_mask_crop", "val": ": typing.Optional[int] = None"}]</parameters></docstring>
Preprocess the image and mask.
</div></div>
## VaeImageProcessorLDM3D[[diffusers.image_processor.VaeImageProcessorLDM3D]]
The `VaeImageProcessorLDM3D` accepts RGB and depth inputs and returns RGB and depth outputs.
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.image_processor.VaeImageProcessorLDM3D</name><anchor>diffusers.image_processor.VaeImageProcessorLDM3D</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L973</source><parameters>[{"name": "do_resize", "val": ": bool = True"}, {"name": "vae_scale_factor", "val": ": int = 8"}, {"name": "resample", "val": ": str = 'lanczos'"}, {"name": "do_normalize", "val": ": bool = True"}]</parameters><paramsdesc>- **do_resize** (`bool`, *optional*, defaults to `True`) --
Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`.
- **vae_scale_factor** (`int`, *optional*, defaults to `8`) --
VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor.
- **resample** (`str`, *optional*, defaults to `lanczos`) --
Resampling filter to use when resizing the image.
- **do_normalize** (`bool`, *optional*, defaults to `True`) --
Whether to normalize the image to [-1,1].</paramsdesc><paramgroups>0</paramgroups></docstring>
Image processor for VAE LDM3D.
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>depth_pil_to_numpy</name><anchor>diffusers.image_processor.VaeImageProcessorLDM3D.depth_pil_to_numpy</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L1024</source><parameters>[{"name": "images", "val": ": typing.Union[typing.List[PIL.Image.Image], PIL.Image.Image]"}]</parameters><paramsdesc>- **images** (`Union[List[PIL.Image.Image], PIL.Image.Image]`) --
The input image or list of images to be converted.</paramsdesc><paramgroups>0</paramgroups><rettype>`np.ndarray`</rettype><retdesc>A NumPy array of the converted images.</retdesc></docstring>
Convert a PIL image or a list of PIL images to NumPy arrays.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>numpy_to_depth</name><anchor>diffusers.image_processor.VaeImageProcessorLDM3D.numpy_to_depth</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L1082</source><parameters>[{"name": "images", "val": ": ndarray"}]</parameters><paramsdesc>- **images** (`np.ndarray`) --
The input NumPy array of depth images, which can be a single image or a batch.</paramsdesc><paramgroups>0</paramgroups><rettype>`List[PIL.Image.Image]`</rettype><retdesc>A list of PIL images converted from the input NumPy depth images.</retdesc></docstring>
Convert a NumPy depth image or a batch of images to a list of PIL images.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>numpy_to_pil</name><anchor>diffusers.image_processor.VaeImageProcessorLDM3D.numpy_to_pil</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L1000</source><parameters>[{"name": "images", "val": ": ndarray"}]</parameters><paramsdesc>- **images** (`np.ndarray`) --
The input NumPy array of images, which can be a single image or a batch.</paramsdesc><paramgroups>0</paramgroups><rettype>`List[PIL.Image.Image]`</rettype><retdesc>A list of PIL images converted from the input NumPy array.</retdesc></docstring>
Convert a NumPy image or a batch of images to a list of PIL images.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>preprocess</name><anchor>diffusers.image_processor.VaeImageProcessorLDM3D.preprocess</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L1160</source><parameters>[{"name": "rgb", "val": ": typing.Union[torch.Tensor, PIL.Image.Image, numpy.ndarray]"}, {"name": "depth", "val": ": typing.Union[torch.Tensor, PIL.Image.Image, numpy.ndarray]"}, {"name": "height", "val": ": typing.Optional[int] = None"}, {"name": "width", "val": ": typing.Optional[int] = None"}, {"name": "target_res", "val": ": typing.Optional[int] = None"}]</parameters><paramsdesc>- **rgb** (`Union[torch.Tensor, PIL.Image.Image, np.ndarray]`) --
The RGB input image, which can be a single image or a batch.
- **depth** (`Union[torch.Tensor, PIL.Image.Image, np.ndarray]`) --
The depth input image, which can be a single image or a batch.
- **height** (`Optional[int]`, *optional*, defaults to `None`) --
The desired height of the processed image. If `None`, defaults to the height of the input image.
- **width** (`Optional[int]`, *optional*, defaults to `None`) --
The desired width of the processed image. If `None`, defaults to the width of the input image.
- **target_res** (`Optional[int]`, *optional*, defaults to `None`) --
Target resolution for resizing the images. If specified, overrides height and width.</paramsdesc><paramgroups>0</paramgroups><rettype>`Tuple[torch.Tensor, torch.Tensor]`</rettype><retdesc>A tuple containing the processed RGB and depth images as PyTorch tensors.</retdesc></docstring>
Preprocess the image input. Accepted formats are PIL images, NumPy arrays, or PyTorch tensors.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>rgblike_to_depthmap</name><anchor>diffusers.image_processor.VaeImageProcessorLDM3D.rgblike_to_depthmap</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L1044</source><parameters>[{"name": "image", "val": ": typing.Union[numpy.ndarray, torch.Tensor]"}]</parameters></docstring>
Convert an RGB-like depth image to a depth map.
</div></div>
## PixArtImageProcessor[[diffusers.image_processor.PixArtImageProcessor]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.image_processor.PixArtImageProcessor</name><anchor>diffusers.image_processor.PixArtImageProcessor</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L1380</source><parameters>[{"name": "do_resize", "val": ": bool = True"}, {"name": "vae_scale_factor", "val": ": int = 8"}, {"name": "resample", "val": ": str = 'lanczos'"}, {"name": "do_normalize", "val": ": bool = True"}, {"name": "do_binarize", "val": ": bool = False"}, {"name": "do_convert_grayscale", "val": ": bool = False"}]</parameters><paramsdesc>- **do_resize** (`bool`, *optional*, defaults to `True`) --
Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. Can accept
`height` and `width` arguments from the [image_processor.VaeImageProcessor.preprocess()](/docs/diffusers/pr_12595/en/api/image_processor#diffusers.image_processor.VaeImageProcessor.preprocess) method.
- **vae_scale_factor** (`int`, *optional*, defaults to `8`) --
VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor.
- **resample** (`str`, *optional*, defaults to `lanczos`) --
Resampling filter to use when resizing the image.
- **do_normalize** (`bool`, *optional*, defaults to `True`) --
Whether to normalize the image to [-1,1].
- **do_binarize** (`bool`, *optional*, defaults to `False`) --
Whether to binarize the image to 0/1.
- **do_convert_rgb** (`bool`, *optional*, defaults to `False`) --
Whether to convert the images to RGB format.
- **do_convert_grayscale** (`bool`, *optional*, defaults to `False`) --
Whether to convert the images to grayscale format.</paramsdesc><paramgroups>0</paramgroups></docstring>
Image processor for PixArt image resize and crop.
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>classify_height_width_bin</name><anchor>diffusers.image_processor.PixArtImageProcessor.classify_height_width_bin</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L1421</source><parameters>[{"name": "height", "val": ": int"}, {"name": "width", "val": ": int"}, {"name": "ratios", "val": ": dict"}]</parameters><paramsdesc>- **height** (`int`) -- The height of the image.
- **width** (`int`) -- The width of the image.
- **ratios** (`dict`) -- A dictionary where keys are aspect ratios and values are tuples of (height, width).</paramsdesc><paramgroups>0</paramgroups><rettype>`Tuple[int, int]`</rettype><retdesc>The closest binned height and width.</retdesc></docstring>
Returns the binned height and width based on the aspect ratio.
</div>
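The binning logic can be sketched as follows (an illustrative standalone function with a hypothetical three-entry `bins` dictionary; real aspect-ratio tables are much larger):

```python
def classify_height_width_bin(height, width, ratios):
    """Pick the bin whose aspect ratio (stored as a string key) is closest to height/width."""
    ar = height / width
    closest = min(ratios, key=lambda k: abs(float(k) - ar))
    h, w = ratios[closest]
    return int(h), int(w)

# hypothetical 1024-resolution bins
bins = {"0.5": (704, 1408), "1.0": (1024, 1024), "2.0": (1408, 704)}
print(classify_height_width_bin(900, 1000, bins))  # (1024, 1024)
```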
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>resize_and_crop_tensor</name><anchor>diffusers.image_processor.PixArtImageProcessor.resize_and_crop_tensor</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L1439</source><parameters>[{"name": "samples", "val": ": Tensor"}, {"name": "new_width", "val": ": int"}, {"name": "new_height", "val": ": int"}]</parameters><paramsdesc>- **samples** (`torch.Tensor`) -- | |
| A tensor of shape (N, C, H, W) where N is the batch size, C is the number of channels, H is the height, | |
| and W is the width. | |
| - **new_width** (`int`) -- The desired width of the output images. | |
| - **new_height** (`int`) -- The desired height of the output images.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>A tensor containing the resized and cropped images.</retdesc></docstring> | |
| Resizes and crops a tensor of images to the specified dimensions. | |
| </div></div> | |
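The resize-and-crop step follows an aspect-fill strategy: scale the batch just enough to cover the target size, then center-crop. A plain-Python sketch of the geometry (illustrative; the actual method resizes tensors, e.g. via interpolation):

```python
def resize_and_crop_geometry(orig_height, orig_width, new_height, new_width):
    """Compute the intermediate resize size and the center-crop box (y0, x0, y1, x1)."""
    # Scale so both dimensions cover the target (aspect-fill).
    ratio = max(new_height / orig_height, new_width / orig_width)
    resized_height = int(orig_height * ratio)
    resized_width = int(orig_width * ratio)
    # Center-crop box inside the resized image.
    start_y = (resized_height - new_height) // 2
    start_x = (resized_width - new_width) // 2
    return (resized_height, resized_width), (start_y, start_x, start_y + new_height, start_x + new_width)

size, box = resize_and_crop_geometry(512, 768, 1024, 1024)
print(size)  # (1024, 1536)
print(box)   # (0, 256, 1024, 1280)
```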
| ## IPAdapterMaskProcessor[[diffusers.image_processor.IPAdapterMaskProcessor]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.image_processor.IPAdapterMaskProcessor</name><anchor>diffusers.image_processor.IPAdapterMaskProcessor</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L1276</source><parameters>[{"name": "do_resize", "val": ": bool = True"}, {"name": "vae_scale_factor", "val": ": int = 8"}, {"name": "resample", "val": ": str = 'lanczos'"}, {"name": "do_normalize", "val": ": bool = False"}, {"name": "do_binarize", "val": ": bool = True"}, {"name": "do_convert_grayscale", "val": ": bool = True"}]</parameters><paramsdesc>- **do_resize** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. | |
| - **vae_scale_factor** (`int`, *optional*, defaults to `8`) -- | |
| VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor. | |
| - **resample** (`str`, *optional*, defaults to `lanczos`) -- | |
| Resampling filter to use when resizing the image. | |
| - **do_normalize** (`bool`, *optional*, defaults to `False`) -- | |
| Whether to normalize the image to [-1,1]. | |
| - **do_binarize** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to binarize the image to 0/1. | |
| - **do_convert_grayscale** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to convert the images to grayscale format.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Image processor for IP Adapter image masks. | |
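Note the defaults differ from `VaeImageProcessor`: masks are converted to grayscale and binarized rather than normalized. The binarization step amounts to a 0.5 threshold, sketched here in plain Python (illustrative; the library applies it to arrays and tensors):

```python
def binarize(mask_values, threshold=0.5):
    """Threshold mask values in [0, 1] to exactly 0.0 or 1.0."""
    return [1.0 if v >= threshold else 0.0 for v in mask_values]

print(binarize([0.1, 0.5, 0.9]))  # [0.0, 1.0, 1.0]
```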
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>downsample</name><anchor>diffusers.image_processor.IPAdapterMaskProcessor.downsample</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/image_processor.py#L1317</source><parameters>[{"name": "mask", "val": ": Tensor"}, {"name": "batch_size", "val": ": int"}, {"name": "num_queries", "val": ": int"}, {"name": "value_embed_dim", "val": ": int"}]</parameters><paramsdesc>- **mask** (`torch.Tensor`) -- | |
| The input mask tensor generated with `IPAdapterMaskProcessor.preprocess()`. | |
| - **batch_size** (`int`) -- | |
| The batch size. | |
| - **num_queries** (`int`) -- | |
| The number of queries. | |
| - **value_embed_dim** (`int`) -- | |
| The dimensionality of the value embeddings.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>The downsampled mask tensor.</retdesc></docstring> | |
| Downsamples the provided mask tensor to match the expected dimensions for scaled dot-product attention. If the | |
| aspect ratio of the mask does not match the aspect ratio of the output image, a warning is issued. | |
| </div></div> | |
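Conceptually, `downsample` must produce a flattened mask of length `num_queries` whose height/width ratio tracks the original mask's aspect ratio. The shape computation can be sketched as follows (a plain-Python sketch under that assumption; the helper name is illustrative):

```python
import math

def downsampled_mask_shape(mask_height, mask_width, num_queries):
    """Pick (h, w) with h * w covering num_queries and w / h close to the mask's aspect ratio."""
    aspect = mask_width / mask_height
    h = int(math.sqrt(num_queries / aspect))
    if num_queries % h != 0:  # round up so h * w can cover all queries
        h += 1
    w = num_queries // h
    return h, w

# A square 512x512 mask downsampled for 4096 attention queries (a 64x64 grid).
print(downsampled_mask_shape(512, 512, 4096))  # (64, 64)
```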