# AutoencoderOobleck
The Oobleck variational autoencoder (VAE) model with KL loss was introduced in [Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools) and [Stable Audio Open](https://huggingface.co/papers/2407.14358) by Stability AI. The model is used in 🤗 Diffusers to encode audio waveforms into latents and to decode latent representations into audio waveforms.
The abstract from the paper is:
*Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.*
## AutoencoderOobleck[[diffusers.AutoencoderOobleck]]
#### diffusers.AutoencoderOobleck[[diffusers.AutoencoderOobleck]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L294)
An autoencoder for encoding waveforms into latents and decoding latent representations into waveforms. First
introduced in Stable Audio.
This model inherits from [ModelMixin](/docs/diffusers/pr_12652/en/api/models/overview#diffusers.ModelMixin). Check the superclass documentation for its generic methods implemented
for all models (such as downloading or saving).
**Parameters:**
encoder_hidden_size (`int`, *optional*, defaults to 128) : Intermediate representation dimension for the encoder.
downsampling_ratios (`list[int]`, *optional*, defaults to `[2, 4, 4, 8, 8]`) : Ratios for downsampling in the encoder. These are used in reverse order for upsampling in the decoder.
channel_multiples (`list[int]`, *optional*, defaults to `[1, 2, 4, 8, 16]`) : Multiples used to determine the hidden sizes of the hidden layers.
decoder_channels (`int`, *optional*, defaults to 128) : Intermediate representation dimension for the decoder.
decoder_input_channels (`int`, *optional*, defaults to 64) : Input dimension for the decoder. Corresponds to the latent dimension.
audio_channels (`int`, *optional*, defaults to 2) : Number of channels in the audio data. Either 1 for mono or 2 for stereo.
sampling_rate (`int`, *optional*, defaults to 44100) : The sampling rate, expressed in hertz (Hz), at which the audio waveform is digitized.
#### decode[[diffusers.AutoencoderOobleck.decode]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/utils/accelerate_utils.py#L43)
#### encode[[diffusers.AutoencoderOobleck.encode]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/utils/accelerate_utils.py#L43)
#### forward[[diffusers.AutoencoderOobleck.forward]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L425)
**Parameters:**
sample (`torch.Tensor`) : Input sample.
sample_posterior (`bool`, *optional*, defaults to `False`) : Whether to sample from the posterior.
return_dict (`bool`, *optional*, defaults to `True`) : Whether or not to return a `OobleckDecoderOutput` instead of a plain tuple.
## OobleckDecoderOutput[[diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput]]
#### diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput[[diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L202)
Output of decoding method.
**Parameters:**
sample (`torch.Tensor` of shape `(batch_size, audio_channels, sequence_length)`) : The decoded output sample from the last layer of the model.
## AutoencoderOobleckOutput[[diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput]]
#### diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput[[diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L187)
Output of AutoencoderOobleck encoding method.
**Parameters:**
latent_dist (`OobleckDiagonalGaussianDistribution`) : Encoded outputs of `Encoder` represented as the mean and standard deviation of `OobleckDiagonalGaussianDistribution`. `OobleckDiagonalGaussianDistribution` allows for sampling latents from the distribution.
