# Tiny AutoEncoder

Tiny AutoEncoder for Stable Diffusion (TAESD) was introduced in [madebyollin/taesd](https://github.com/madebyollin/taesd) by Ollin Boer Bohan. It is a tiny distilled version of Stable Diffusion's VAE that can quickly decode the latents in a [StableDiffusionPipeline](/docs/diffusers/pr_12229/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline) or [StableDiffusionXLPipeline](/docs/diffusers/pr_12229/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLPipeline) almost instantly.

To use with Stable Diffusion v2.1:

```python
import torch
from diffusers import DiffusionPipeline, AutoencoderTiny

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image
```

To use with Stable Diffusion XL 1.0:

```python
import torch
from diffusers import DiffusionPipeline, AutoencoderTiny

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesdxl", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image
```
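`AutoencoderTiny` can also be used on its own, outside a pipeline, to encode images into latents and decode them back. Below is a minimal round-trip sketch; the random input tensor (standing in for a preprocessed image batch in `[-1, 1]`) and the shape comments are illustrative assumptions:

```python
import torch
from diffusers import AutoencoderTiny

vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16).to("cuda")

# illustrative stand-in for a preprocessed image batch in [-1, 1]
x = torch.rand(1, 3, 512, 512, dtype=torch.float16, device="cuda") * 2 - 1

with torch.no_grad():
    latents = vae.encode(x).latents              # (1, 4, 64, 64): 8x spatial compression
    reconstruction = vae.decode(latents).sample  # back to (1, 3, 512, 512)
```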
## AutoencoderTiny[[diffusers.AutoencoderTiny]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.AutoencoderTiny</name><anchor>diffusers.AutoencoderTiny</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_tiny.py#L41</source><parameters>[{"name": "in_channels", "val": ": int = 3"}, {"name": "out_channels", "val": ": int = 3"}, {"name": "encoder_block_out_channels", "val": ": typing.Tuple[int, ...] = (64, 64, 64, 64)"}, {"name": "decoder_block_out_channels", "val": ": typing.Tuple[int, ...] = (64, 64, 64, 64)"}, {"name": "act_fn", "val": ": str = 'relu'"}, {"name": "upsample_fn", "val": ": str = 'nearest'"}, {"name": "latent_channels", "val": ": int = 4"}, {"name": "upsampling_scaling_factor", "val": ": int = 2"}, {"name": "num_encoder_blocks", "val": ": typing.Tuple[int, ...] = (1, 3, 3, 3)"}, {"name": "num_decoder_blocks", "val": ": typing.Tuple[int, ...] = (3, 3, 3, 1)"}, {"name": "latent_magnitude", "val": ": int = 3"}, {"name": "latent_shift", "val": ": float = 0.5"}, {"name": "force_upcast", "val": ": bool = False"}, {"name": "scaling_factor", "val": ": float = 1.0"}, {"name": "shift_factor", "val": ": float = 0.0"}]</parameters><paramsdesc>- **in_channels** (`int`, *optional*, defaults to 3) -- Number of channels in the input image.
- **out_channels** (`int`, *optional*, defaults to 3) -- Number of channels in the output.
- **encoder_block_out_channels** (`Tuple[int]`, *optional*, defaults to `(64, 64, 64, 64)`) --
Tuple of integers representing the number of output channels for each encoder block. The length of the
tuple should be equal to the number of encoder blocks.
- **decoder_block_out_channels** (`Tuple[int]`, *optional*, defaults to `(64, 64, 64, 64)`) --
Tuple of integers representing the number of output channels for each decoder block. The length of the
tuple should be equal to the number of decoder blocks.
- **act_fn** (`str`, *optional*, defaults to `"relu"`) --
Activation function to be used throughout the model.
- **latent_channels** (`int`, *optional*, defaults to 4) --
Number of channels in the latent representation. The latent space acts as a compressed representation of
the input image.
- **upsampling_scaling_factor** (`int`, *optional*, defaults to 2) --
Scaling factor for upsampling in the decoder. It determines the size of the output image during the
upsampling process.
- **num_encoder_blocks** (`Tuple[int]`, *optional*, defaults to `(1, 3, 3, 3)`) --
Tuple of integers representing the number of encoder blocks at each stage of the encoding process. The
length of the tuple should be equal to the number of stages in the encoder. Each stage has a different
number of encoder blocks.
- **num_decoder_blocks** (`Tuple[int]`, *optional*, defaults to `(3, 3, 3, 1)`) --
Tuple of integers representing the number of decoder blocks at each stage of the decoding process. The
length of the tuple should be equal to the number of stages in the decoder. Each stage has a different
number of decoder blocks.
- **latent_magnitude** (`int`, *optional*, defaults to 3) --
Magnitude of the latent representation. This parameter scales the latent representation values to control
the extent of information preservation.
- **latent_shift** (`float`, *optional*, defaults to 0.5) --
Shift applied to the latent representation. This parameter controls the center of the latent space.
- **scaling_factor** (`float`, *optional*, defaults to 1.0) --
The component-wise standard deviation of the trained latent space computed using the first batch of the
training set. This is used to scale the latent space to have unit variance when training the diffusion
model. The latents are scaled with the formula `z = z * scaling_factor` before being passed to the
diffusion model. When decoding, the latents are scaled back to the original scale with the formula: `z = 1
/ scaling_factor * z`. For more details, refer to sections 4.3.2 and D.1 of the [High-Resolution Image
Synthesis with Latent Diffusion Models](https://huggingface.co/papers/2112.10752) paper. For this
autoencoder, however, no such scaling factor was used, hence the default value of 1.0.
- **force_upcast** (`bool`, *optional*, defaults to `False`) --
If enabled, it will force the VAE to run in float32 for high image resolution pipelines, such as SD-XL. The
VAE can be fine-tuned / trained to a lower range without losing too much precision, in which case
`force_upcast` can be set to `False` (see this fp16-friendly
[AutoEncoder](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)).</paramsdesc><paramgroups>0</paramgroups></docstring>

A tiny distilled VAE model for encoding images into latents and decoding latent representations into images.

[AutoencoderTiny](/docs/diffusers/pr_12229/en/api/models/autoencoder_tiny#diffusers.AutoencoderTiny) is a wrapper around the original implementation of `TAESD`.

This model inherits from [ModelMixin](/docs/diffusers/pr_12229/en/api/models/overview#diffusers.ModelMixin). Check the superclass documentation for its generic methods implemented for
all models (such as downloading or saving).
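For example, the generic saving and loading methods inherited from `ModelMixin` work as usual; the local directory name below is just an illustrative placeholder:

```python
from diffusers import AutoencoderTiny

vae = AutoencoderTiny.from_pretrained("madebyollin/taesd")
vae.save_pretrained("./taesd-local")  # illustrative local path
vae = AutoencoderTiny.from_pretrained("./taesd-local")
```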
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>disable_slicing</name><anchor>diffusers.AutoencoderTiny.disable_slicing</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_tiny.py#L172</source><parameters>[]</parameters></docstring>

Disable sliced VAE decoding. If `enable_slicing` was previously enabled, this method will go back to computing
decoding in one step.

</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>disable_tiling</name><anchor>diffusers.AutoencoderTiny.disable_tiling</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_tiny.py#L187</source><parameters>[]</parameters></docstring>

Disable tiled VAE decoding. If `enable_tiling` was previously enabled, this method will go back to computing
decoding in one step.

</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>enable_slicing</name><anchor>diffusers.AutoencoderTiny.enable_slicing</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_tiny.py#L165</source><parameters>[]</parameters></docstring>

Enable sliced VAE decoding. When this option is enabled, the VAE will split the input tensor in slices to
compute decoding in several steps. This is useful to save some memory and allow larger batch sizes.

</div>
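A short sketch of how slicing might be used with one of the pipelines set up earlier (`pipe` and `prompt` are assumed from those snippets):

```python
pipe.vae.enable_slicing()   # decode the batch one image at a time to save memory

images = pipe([prompt] * 4, num_inference_steps=25).images

pipe.vae.disable_slicing()  # restore single-step decoding
```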
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>enable_tiling</name><anchor>diffusers.AutoencoderTiny.enable_tiling</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_tiny.py#L179</source><parameters>[{"name": "use_tiling", "val": ": bool = True"}]</parameters></docstring>

Enable tiled VAE decoding. When this option is enabled, the VAE will split the input tensor into tiles to
compute decoding and encoding in several steps. This is useful for saving a large amount of memory and to allow
processing larger images.

</div>
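Tiling is toggled the same way; the resolution below is an illustrative example of a larger image where tiling helps (again assuming `pipe` and `prompt` from the earlier snippets):

```python
pipe.vae.enable_tiling()    # decode and encode tile by tile to bound peak memory

image = pipe(prompt, num_inference_steps=25, height=1024, width=1024).images[0]

pipe.vae.disable_tiling()
```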
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>forward</name><anchor>diffusers.AutoencoderTiny.forward</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_tiny.py#L321</source><parameters>[{"name": "sample", "val": ": Tensor"}, {"name": "return_dict", "val": ": bool = True"}]</parameters><paramsdesc>- **sample** (`torch.Tensor`) -- Input sample.
- **return_dict** (`bool`, *optional*, defaults to `True`) --
Whether or not to return a `DecoderOutput` instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups></docstring>

</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>scale_latents</name><anchor>diffusers.AutoencoderTiny.scale_latents</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_tiny.py#L157</source><parameters>[{"name": "x", "val": ": Tensor"}]</parameters></docstring>

raw latents -> [0, 1]

</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>unscale_latents</name><anchor>diffusers.AutoencoderTiny.unscale_latents</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_tiny.py#L161</source><parameters>[{"name": "x", "val": ": Tensor"}]</parameters></docstring>

[0, 1] -> raw latents

</div></div>
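Both helpers apply an affine map determined by `latent_magnitude` and `latent_shift`. The sketch below assumes the default values (3 and 0.5) documented above; it illustrates the mapping rather than reproducing the implementation:

```python
import torch

latent_magnitude = 3   # default from the config above
latent_shift = 0.5

def scale_latents(x: torch.Tensor) -> torch.Tensor:
    # raw latents -> [0, 1]
    return x.div(2 * latent_magnitude).add(latent_shift).clamp(0, 1)

def unscale_latents(x: torch.Tensor) -> torch.Tensor:
    # [0, 1] -> raw latents
    return x.sub(latent_shift).mul(2 * latent_magnitude)
```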
## AutoencoderTinyOutput[[diffusers.models.autoencoders.autoencoder_tiny.AutoencoderTinyOutput]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.models.autoencoders.autoencoder_tiny.AutoencoderTinyOutput</name><anchor>diffusers.models.autoencoders.autoencoder_tiny.AutoencoderTinyOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_tiny.py#L29</source><parameters>[{"name": "latents", "val": ": Tensor"}]</parameters><paramsdesc>- **latents** (`torch.Tensor`) -- Encoded outputs of the `Encoder`.</paramsdesc><paramgroups>0</paramgroups></docstring>

Output of AutoencoderTiny encoding method.

</div>
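A quick sketch of consuming this output, assuming the standalone `vae` and input tensor `x` from the round-trip example above (the `return_dict=False` behavior is assumed to mirror `forward`):

```python
output = vae.encode(x)    # AutoencoderTinyOutput
latents = output.latents  # encoded latents as a torch.Tensor

# with return_dict=False, a plain tuple is returned instead
(latents,) = vae.encode(x, return_dict=False)
```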