# AutoencoderOobleck

The Oobleck variational autoencoder (VAE) model with KL loss was introduced in [Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools) and [Stable Audio Open](https://huggingface.co/papers/2407.14358) by Stability AI. The model is used in 🤗 Diffusers to encode audio waveforms into latents and to decode latent representations into audio waveforms.

The abstract from the paper is:

*Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.*
## AutoencoderOobleck[[diffusers.AutoencoderOobleck]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>class diffusers.AutoencoderOobleck</name><anchor>diffusers.AutoencoderOobleck</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12509/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L294</source><parameters>[{"name": "encoder_hidden_size", "val": " = 128"}, {"name": "downsampling_ratios", "val": " = [2, 4, 4, 8, 8]"}, {"name": "channel_multiples", "val": " = [1, 2, 4, 8, 16]"}, {"name": "decoder_channels", "val": " = 128"}, {"name": "decoder_input_channels", "val": " = 64"}, {"name": "audio_channels", "val": " = 2"}, {"name": "sampling_rate", "val": " = 44100"}]</parameters><paramsdesc>- **encoder_hidden_size** (`int`, *optional*, defaults to 128) --
Intermediate representation dimension for the encoder.
- **downsampling_ratios** (`List[int]`, *optional*, defaults to `[2, 4, 4, 8, 8]`) --
Ratios for downsampling in the encoder. These are used in reverse order for upsampling in the decoder.
- **channel_multiples** (`List[int]`, *optional*, defaults to `[1, 2, 4, 8, 16]`) --
Multiples used to determine the hidden sizes of the intermediate layers.
- **decoder_channels** (`int`, *optional*, defaults to 128) --
Intermediate representation dimension for the decoder.
- **decoder_input_channels** (`int`, *optional*, defaults to 64) --
Input dimension for the decoder. Corresponds to the latent dimension.
- **audio_channels** (`int`, *optional*, defaults to 2) --
Number of channels in the audio data. Either 1 for mono or 2 for stereo.
- **sampling_rate** (`int`, *optional*, defaults to 44100) --
The sampling rate, expressed in hertz (Hz), at which the audio waveform should be digitized.</paramsdesc><paramgroups>0</paramgroups></docstring>

An autoencoder for encoding waveforms into latents and decoding latent representations into waveforms. First
introduced in Stable Audio.

This model inherits from [ModelMixin](/docs/diffusers/pr_12509/en/api/models/overview#diffusers.ModelMixin). Check the superclass documentation for its generic methods implemented
for all models (such as downloading or saving).
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>wrapper</name><anchor>diffusers.AutoencoderOobleck.decode</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12509/src/diffusers/utils/accelerate_utils.py#L43</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring>

</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>wrapper</name><anchor>diffusers.AutoencoderOobleck.encode</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12509/src/diffusers/utils/accelerate_utils.py#L43</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring>

</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>disable_slicing</name><anchor>diffusers.AutoencoderOobleck.disable_slicing</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12509/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L366</source><parameters>[]</parameters></docstring>

Disable sliced VAE decoding. If `enable_slicing` was previously enabled, this method will go back to computing
decoding in one step.

</div>

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>enable_slicing</name><anchor>diffusers.AutoencoderOobleck.enable_slicing</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12509/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L359</source><parameters>[]</parameters></docstring>

Enable sliced VAE decoding. When this option is enabled, the VAE will split the input tensor in slices to
compute decoding in several steps. This is useful to save some memory and allow larger batch sizes.

</div>
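Slicing trades speed for memory: the batch is split into single-sample slices that are decoded one at a time and re-joined, which lowers peak memory without changing the result. A pure-Python sketch of the idea (the `decode_one` stand-in is hypothetical, not the real decoder):

```python
from typing import Callable, List

def sliced_decode(latent_batch: List[list], decode_one: Callable[[list], list]) -> List[list]:
    """Decode one sample at a time, mimicking what enable_slicing does.

    Peak memory scales with a single sample instead of the whole batch,
    and the result matches decoding everything in one call.
    """
    return [decode_one(sample) for sample in latent_batch]

# Hypothetical stand-in for the real decoder: scales every value by 2.
decode_one = lambda sample: [2 * v for v in sample]

batch = [[1, 2], [3, 4], [5, 6]]
print(sliced_decode(batch, decode_one))  # -> [[2, 4], [6, 8], [10, 12]]
```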
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>forward</name><anchor>diffusers.AutoencoderOobleck.forward</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12509/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L439</source><parameters>[{"name": "sample", "val": ": Tensor"}, {"name": "sample_posterior", "val": ": bool = False"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}]</parameters><paramsdesc>- **sample** (`torch.Tensor`) -- Input sample.
- **sample_posterior** (`bool`, *optional*, defaults to `False`) --
Whether to sample from the posterior.
- **return_dict** (`bool`, *optional*, defaults to `True`) --
Whether or not to return a `OobleckDecoderOutput` instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups></docstring>

</div></div>
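The latent shape the model operates on follows directly from the config: the time axis shrinks by the product of `downsampling_ratios` and the channel axis equals `decoder_input_channels`. A small sketch with the default values (the `latent_shape` helper is ours, not part of the library):

```python
from math import prod

def latent_shape(batch_size: int, num_samples: int,
                 downsampling_ratios=(2, 4, 4, 8, 8),
                 decoder_input_channels: int = 64) -> tuple:
    """Shape of the latents for a waveform of `num_samples` samples.

    Assumes num_samples is divisible by the total downsampling factor
    (2 * 4 * 4 * 8 * 8 = 2048 with the defaults).
    """
    factor = prod(downsampling_ratios)
    assert num_samples % factor == 0, "pad the waveform to a multiple of the factor"
    return (batch_size, decoder_input_channels, num_samples // factor)

# One second of stereo audio at 44.1 kHz, padded up to 45056 samples:
print(latent_shape(1, 45056))  # -> (1, 64, 22)
```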
## OobleckDecoderOutput[[diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>class diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput</name><anchor>diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12509/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L202</source><parameters>[{"name": "sample", "val": ": Tensor"}]</parameters><paramsdesc>- **sample** (`torch.Tensor` of shape `(batch_size, audio_channels, sequence_length)`) --
The decoded output sample from the last layer of the model.</paramsdesc><paramgroups>0</paramgroups></docstring>

Output of the decoding method.

</div>
## AutoencoderOobleckOutput[[diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">

<docstring><name>class diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput</name><anchor>diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12509/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L187</source><parameters>[{"name": "latent_dist", "val": ": OobleckDiagonalGaussianDistribution"}]</parameters><paramsdesc>- **latent_dist** (`OobleckDiagonalGaussianDistribution`) --
Encoded outputs of `Encoder` represented as the mean and standard deviation of
`OobleckDiagonalGaussianDistribution`. `OobleckDiagonalGaussianDistribution` allows for sampling latents
from the distribution.</paramsdesc><paramgroups>0</paramgroups></docstring>

Output of the AutoencoderOobleck encoding method.

</div>