# AutoencoderOobleck
The Oobleck variational autoencoder (VAE) model with KL loss was introduced in [Stability-AI/stable-audio-tools](https://github.com/Stability-AI/stable-audio-tools) and [Stable Audio Open](https://huggingface.co/papers/2407.14358) by Stability AI. The model is used in 🤗 Diffusers to encode audio waveforms into latents and to decode latent representations into audio waveforms.
The abstract from the paper is:
*Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.*
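The autoencoder's overall temporal compression follows directly from its configuration: the encoder downsamples by the product of `downsampling_ratios`. A quick sketch using the documented default values (pure arithmetic, no model needed):

```python
import math

# Default AutoencoderOobleck configuration values.
downsampling_ratios = [2, 4, 4, 8, 8]
sampling_rate = 44100   # Hz
latent_channels = 64    # decoder_input_channels, i.e. the latent dimension

# The encoder's total temporal compression is the product of the
# per-stage downsampling ratios.
total_downsampling = math.prod(downsampling_ratios)
print(total_downsampling)  # 2048

# One second of 44.1 kHz audio maps to ~21.5 latent frames, each a
# 64-dimensional vector.
latent_frames_per_second = sampling_rate / total_downsampling
print(round(latent_frames_per_second, 2))  # 21.53
```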
## AutoencoderOobleck[[diffusers.AutoencoderOobleck]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.AutoencoderOobleck</name><anchor>diffusers.AutoencoderOobleck</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L295</source><parameters>[{"name": "encoder_hidden_size", "val": " = 128"}, {"name": "downsampling_ratios", "val": " = [2, 4, 4, 8, 8]"}, {"name": "channel_multiples", "val": " = [1, 2, 4, 8, 16]"}, {"name": "decoder_channels", "val": " = 128"}, {"name": "decoder_input_channels", "val": " = 64"}, {"name": "audio_channels", "val": " = 2"}, {"name": "sampling_rate", "val": " = 44100"}]</parameters><paramsdesc>- **encoder_hidden_size** (`int`, *optional*, defaults to 128) --
Intermediate representation dimension for the encoder.
- **downsampling_ratios** (`List[int]`, *optional*, defaults to `[2, 4, 4, 8, 8]`) --
Ratios for downsampling in the encoder. These are used in reverse order for upsampling in the decoder.
- **channel_multiples** (`List[int]`, *optional*, defaults to `[1, 2, 4, 8, 16]`) --
Multiples used to determine the hidden sizes of the hidden layers.
- **decoder_channels** (`int`, *optional*, defaults to 128) --
Intermediate representation dimension for the decoder.
- **decoder_input_channels** (`int`, *optional*, defaults to 64) --
Input dimension for the decoder. Corresponds to the latent dimension.
- **audio_channels** (`int`, *optional*, defaults to 2) --
Number of channels in the audio data. Either 1 for mono or 2 for stereo.
- **sampling_rate** (`int`, *optional*, defaults to 44100) --
The sampling rate at which the audio waveform should be digitized, expressed in hertz (Hz).</paramsdesc><paramgroups>0</paramgroups></docstring>
An autoencoder for encoding waveforms into latents and decoding latent representations into waveforms. First
introduced in Stable Audio.
This model inherits from [ModelMixin](/docs/diffusers/pr_12595/en/api/models/overview#diffusers.ModelMixin). Check the superclass documentation for its generic methods implemented
for all models (such as downloading or saving).
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>decode</name><anchor>diffusers.AutoencoderOobleck.decode</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/utils/accelerate_utils.py#L43</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>encode</name><anchor>diffusers.AutoencoderOobleck.encode</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/utils/accelerate_utils.py#L43</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>forward</name><anchor>diffusers.AutoencoderOobleck.forward</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L426</source><parameters>[{"name": "sample", "val": ": Tensor"}, {"name": "sample_posterior", "val": ": bool = False"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}]</parameters><paramsdesc>- **sample** (`torch.Tensor`) -- Input sample.
- **sample_posterior** (`bool`, *optional*, defaults to `False`) --
Whether to sample from the posterior.
- **return_dict** (`bool`, *optional*, defaults to `True`) --
Whether or not to return an `OobleckDecoderOutput` instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups></docstring>
</div></div>
## OobleckDecoderOutput[[diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput</name><anchor>diffusers.models.autoencoders.autoencoder_oobleck.OobleckDecoderOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L203</source><parameters>[{"name": "sample", "val": ": Tensor"}]</parameters><paramsdesc>- **sample** (`torch.Tensor` of shape `(batch_size, audio_channels, sequence_length)`) --
The decoded output sample from the last layer of the model.</paramsdesc><paramgroups>0</paramgroups></docstring>
Output of the decoding method.
</div>
## AutoencoderOobleckOutput[[diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput</name><anchor>diffusers.models.autoencoders.autoencoder_oobleck.AutoencoderOobleckOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/autoencoders/autoencoder_oobleck.py#L188</source><parameters>[{"name": "latent_dist", "val": ": OobleckDiagonalGaussianDistribution"}]</parameters><paramsdesc>- **latent_dist** (`OobleckDiagonalGaussianDistribution`) --
Encoded outputs of `Encoder` represented as the mean and standard deviation of
`OobleckDiagonalGaussianDistribution`. `OobleckDiagonalGaussianDistribution` allows for sampling latents
from the distribution.</paramsdesc><paramgroups>0</paramgroups></docstring>
Output of the AutoencoderOobleck encoding method.
</div>
<EditOnGithub source="https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/models/autoencoder_oobleck.md" />
