# AutoencoderKLWan

The 3D variational autoencoder (VAE) model with KL loss used in [Wan 2.1](https://github.com/Wan-Video/Wan2.1) by the Alibaba Wan Team.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLWan

vae = AutoencoderKLWan.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", subfolder="vae", torch_dtype=torch.float32)
```
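In practice the VAE is usually kept in `float32` and passed to a video pipeline running at lower precision. A minimal sketch (the pipeline class and dtypes follow the Wan 2.1 Diffusers checkpoints; adjust to your setup):

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline

# The VAE is typically loaded in float32 for numerical stability,
# while the rest of the pipeline can run in bfloat16.
vae = AutoencoderKLWan.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", subfolder="vae", torch_dtype=torch.float32
)
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", vae=vae, torch_dtype=torch.bfloat16
)
```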
## AutoencoderKLWan[[diffusers.AutoencoderKLWan]]

#### diffusers.AutoencoderKLWan[[diffusers.AutoencoderKLWan]]

[Source](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/autoencoders/autoencoder_kl_wan.py#L960)

A VAE model with KL loss for encoding videos into latents and decoding latent representations into videos. Introduced in [Wan 2.1](https://github.com/Wan-Video/Wan2.1).

This model inherits from [ModelMixin](/docs/diffusers/main/en/api/models/overview#diffusers.ModelMixin). Check the superclass documentation for its generic methods implemented for all models (such as downloading or saving).
#### decode[[diffusers.AutoencoderKLWan.decode]]

[Source](https://github.com/huggingface/diffusers/blob/main/src/diffusers/utils/accelerate_utils.py#L43)

**Parameters:**

*args, **kwargs : Arguments forwarded to the wrapped decode implementation.
#### enable_tiling[[diffusers.AutoencoderKLWan.enable_tiling]]

[Source](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/autoencoders/autoencoder_kl_wan.py#L1093)

Enable tiled VAE decoding. When this option is enabled, the VAE will split the input tensor into tiles to compute decoding and encoding in several steps. This is useful for saving a large amount of memory and for processing larger videos. A short usage sketch follows the parameter list below.

**Parameters:**

tile_sample_min_height (`int`, *optional*) : The minimum height required for a sample to be separated into tiles across the height dimension.

tile_sample_min_width (`int`, *optional*) : The minimum width required for a sample to be separated into tiles across the width dimension.

tile_sample_stride_height (`int`, *optional*) : The minimum amount of overlap between two consecutive vertical tiles. This is to ensure that there are no tiling artifacts produced across the height dimension.

tile_sample_stride_width (`int`, *optional*) : The stride between two consecutive horizontal tiles. This is to ensure that there are no tiling artifacts produced across the width dimension.
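As a sketch, tiling can be enabled with the defaults or with an explicit tile geometry (the sizes below are illustrative, not the model's defaults):

```python
# Enable tiled decoding with the default tile configuration.
vae.enable_tiling()

# Or set the tile geometry explicitly (illustrative values):
vae.enable_tiling(
    tile_sample_min_height=256,
    tile_sample_min_width=256,
    tile_sample_stride_height=192,
    tile_sample_stride_width=192,
)
```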
#### forward[[diffusers.AutoencoderKLWan.forward]]

[Source](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/autoencoders/autoencoder_kl_wan.py#L1409)

**Parameters:**

sample (`torch.Tensor`) : Input sample.

return_dict (`bool`, *optional*, defaults to `True`) : Whether or not to return a `DecoderOutput` instead of a plain tuple.
#### tiled_decode[[diffusers.AutoencoderKLWan.tiled_decode]]

[Source](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/autoencoders/autoencoder_kl_wan.py#L1331)

Decode a batch of videos using a tiled decoder.

**Parameters:**

z (`torch.Tensor`) : Input batch of latent vectors.

return_dict (`bool`, *optional*, defaults to `True`) : Whether or not to return a `~models.vae.DecoderOutput` instead of a plain tuple.

**Returns:**

`~models.vae.DecoderOutput` or `tuple`

If `return_dict` is `True`, a `~models.vae.DecoderOutput` is returned, otherwise a plain `tuple` is returned.
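A rough usage sketch; the latent shape here is illustrative (Wan 2.1's VAE uses 16 latent channels, and the actual temporal/spatial sizes depend on the input video):

```python
import torch

# Illustrative latent batch: (batch, latent_channels, frames, height, width).
z = torch.randn(1, 16, 13, 60, 104)

out = vae.tiled_decode(z, return_dict=True)
video = out.sample  # decoded video tensor
```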
#### tiled_encode[[diffusers.AutoencoderKLWan.tiled_encode]]

[Source](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/autoencoders/autoencoder_kl_wan.py#L1259)

Encode a batch of videos using a tiled encoder.

**Parameters:**

x (`torch.Tensor`) : Input batch of videos.

**Returns:**

`torch.Tensor`

The latent representation of the encoded videos.
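A matching encode sketch, again with an illustrative input shape of `(batch, channels, frames, height, width)`:

```python
import torch

# Illustrative video batch: 3 RGB channels, 49 frames, 480x832 resolution.
x = torch.randn(1, 3, 49, 480, 832)

latents = vae.tiled_encode(x)  # returns the latent representation as a tensor
```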
## DecoderOutput[[diffusers.models.autoencoders.vae.DecoderOutput]]

#### diffusers.models.autoencoders.vae.DecoderOutput[[diffusers.models.autoencoders.vae.DecoderOutput]]

[Source](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/autoencoders/vae.py#L46)

Output of decoding method.

**Parameters:**

sample (`torch.Tensor` of shape `(batch_size, num_channels, height, width)`) : The decoded output sample from the last layer of the model.