# AutoencoderOobleck
The Oobleck variational autoencoder (VAE) model with KL loss was introduced in [Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools) and [Stable Audio Open](https://huggingface.co/papers/2407.14358) by Stability AI. The model is used in 🤗 Diffusers to encode audio waveforms into latents and to decode latent representations into audio waveforms.
The abstract from the paper is:
*Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.*
## AutoencoderOobleck[[diffusers.AutoencoderOobleck]]
#### diffusers.AutoencoderOobleck[[diffusers.AutoencoderOobleck]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L294)
An autoencoder for encoding waveforms into latents and decoding latent representations into waveforms. First
introduced in Stable Audio.
This model inherits from [ModelMixin](/docs/diffusers/pr_12652/en/api/models/overview#diffusers.ModelMixin). Check the superclass documentation for its generic methods implemented
for all models (such as downloading or saving).
**Parameters:**
encoder_hidden_size (`int`, *optional*, defaults to 128) : Intermediate representation dimension for the encoder.
downsampling_ratios (`list[int]`, *optional*, defaults to `[2, 4, 4, 8, 8]`) : Ratios for downsampling in the encoder. These are used in reverse order for upsampling in the decoder.
channel_multiples (`list[int]`, *optional*, defaults to `[1, 2, 4, 8, 16]`) : Multiples used to determine the hidden sizes of the hidden layers.
decoder_channels (`int`, *optional*, defaults to 128) : Intermediate representation dimension for the decoder.
decoder_input_channels (`int`, *optional*, defaults to 64) : Input dimension for the decoder. Corresponds to the latent dimension.
audio_channels (`int`, *optional*, defaults to 2) : Number of channels in the audio data. Either 1 for mono or 2 for stereo.
sampling_rate (`int`, *optional*, defaults to 44100) : The sampling rate, expressed in hertz (Hz), at which the audio waveform is digitized.
#### decode[[diffusers.AutoencoderOobleck.decode]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/utils/accelerate_utils.py#L43)
#### encode[[diffusers.AutoencoderOobleck.encode]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/utils/accelerate_utils.py#L43)
#### forward[[diffusers.AutoencoderOobleck.forward]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L425)
**Parameters:**
sample (`torch.Tensor`) : Input sample.
sample_posterior (`bool`, *optional*, defaults to `False`) : Whether to sample from the posterior.
return_dict (`bool`, *optional*, defaults to `True`) : Whether or not to return a `OobleckDecoderOutput` instead of a plain tuple.
## OobleckDecoderOutput[[diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput]]
#### diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput[[diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L202)
Output of decoding method.
**Parameters:**
sample (`torch.Tensor` of shape `(batch_size, audio_channels, sequence_length)`) : The decoded output sample from the last layer of the model.
## AutoencoderOobleckOutput[[diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput]]
#### diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput[[diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput]]
[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L187)
Output of AutoencoderOobleck encoding method.
**Parameters:**
latent_dist (`OobleckDiagonalGaussianDistribution`) : Encoded outputs of `Encoder` represented as the mean and standard deviation of `OobleckDiagonalGaussianDistribution`. `OobleckDiagonalGaussianDistribution` allows for sampling latents from the distribution.
