Robust Speech Quantizer (HuBERT / DinoSR / SpidR)

GitHub Repository

MLP-based robust speech quantizers trained with a CTC loss and iterative pseudo-labeling on augmented audio, following Algayres et al. (Interspeech 2023). Evaluated with vocabulary sizes K ∈ {100, 200, 500}.
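As a toy illustration of the iterative pseudo-labeling idea (this is not the repo's code: the real quantizer is an MLP trained with CTC loss, replaced here by a nearest-centroid quantizer on 1-D data so the loop is runnable):

```python
import random

random.seed(0)

# Toy 1-D "features": clean frames clustered around 0.0 and 1.0.
clean = [random.gauss(0.0, 0.05) for _ in range(50)] + \
        [random.gauss(1.0, 0.05) for _ in range(50)]

def quantize(x, codebook):
    """Assign x to its nearest codebook entry (the pseudo-label)."""
    return min(range(len(codebook)), key=lambda k: abs(x - codebook[k]))

def augment(x):
    """Stand-in for the audio augmentations: additive noise."""
    return x + random.gauss(0.0, 0.2)

codebook = [0.3, 0.6]  # deliberately poor initial quantizer

for round_idx in range(3):
    # 1) Pseudo-label the *clean* data with the current quantizer.
    labels = [quantize(x, codebook) for x in clean]
    # 2) Re-fit the quantizer so that *augmented* views of each frame
    #    still map to the same pseudo-label (the robustness objective;
    #    the real model trains a new MLP with CTC loss at each round).
    for k in range(len(codebook)):
        members = [augment(x) for x, lab in zip(clean, labels) if lab == k]
        if members:
            codebook[k] = sum(members) / len(members)
```

After a few rounds the codebook settles near the true cluster centers even though the refit only ever sees augmented (noisy) inputs.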

Encoders

Encoder     | Checkpoint                  | Layer | Pre-training data
HuBERT Base | hubert-base-ls960           | 6     | LibriSpeech 960h
DinoSR      | original + SpidR-reproduced | 5     | LibriSpeech 960h
SpidR       | spidr-base                  | 6     | LibriSpeech 960h

Quick Start

from huggingface_hub import hf_hub_download

# Best round-1 quantizer checkpoint for the 500-unit vocabulary
model_path = hf_hub_download(
    repo_id="iliasslasri/robust_speech_quantizer",
    filename="500_vocab_size/round_1/E1_best.pt"
)
# Matching training configuration
config_path = hf_hub_download(
    repo_id="iliasslasri/robust_speech_quantizer",
    filename="500_vocab_size/config.yaml"
)
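The model class itself lives in the GitHub repository. As a rough sketch of what the quantizer computes — an MLP mapping frame-level encoder features to one of K discrete units — here is a numpy version; the dimensions (768-d features, K = 500) are illustrative assumptions, the real values come from config.yaml:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions; the real ones are in config.yaml).
FEAT_DIM, HIDDEN, VOCAB = 768, 512, 500

# Random weights stand in for the trained checkpoint.
w1 = rng.normal(0.0, 0.02, (FEAT_DIM, HIDDEN))
w2 = rng.normal(0.0, 0.02, (HIDDEN, VOCAB))

def quantize(features):
    """Map frame-level encoder features (frames, FEAT_DIM) to unit IDs."""
    hidden = np.maximum(features @ w1, 0.0)  # Linear + ReLU
    logits = hidden @ w2                     # (frames, VOCAB)
    return logits.argmax(axis=-1)            # one discrete unit per frame

# 50 frames of fake encoder features (e.g. HuBERT layer-6 output).
units = quantize(rng.normal(0.0, 1.0, (50, FEAT_DIM)))
```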

Augmentations

The following augmentations are used (the model card includes an audio example for each):

  • Clean (reference)
  • Time Stretch
  • Pitch Shift
  • Reverberation
  • Noise
  • Echo
  • Random Noise
  • Pink Noise
  • Lowpass Filter
  • Highpass Filter
  • Bandpass Filter
  • Smooth
  • Boost Audio
  • Duck Audio
  • Up-Down Resample
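A few of the augmentations above can be sketched in plain numpy. These are illustrative re-implementations with assumed parameter values, not the repo's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(wav, snr_db=10.0):
    """Random-noise augmentation: mix in white noise at a target SNR."""
    noise = rng.standard_normal(len(wav))
    scale = np.sqrt((wav ** 2).mean() /
                    ((noise ** 2).mean() * 10 ** (snr_db / 10)))
    return wav + scale * noise

def smooth(wav, k=5):
    """Smoothing as a moving-average lowpass filter."""
    return np.convolve(wav, np.ones(k) / k, mode="same")

def boost(wav, gain_db=6.0):
    """Boost audio by a fixed gain; a negative gain_db ducks it."""
    return wav * 10 ** (gain_db / 20)

wav = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of 440 Hz
aug = smooth(boost(add_noise(wav)))
```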

Experiments

We trained quantizers across different encoders, codebook sizes, and augmentation strategies. The augmentation configurations are:

  • All augmentations, chained — all augmentations from the table above are enabled, and multiple augmentations are applied sequentially to each sample. The number of chained augmentations is sampled from a uniform distribution between 0 and 4.
  • All augmentations, single — all augmentations are enabled, but only one randomly chosen augmentation is applied per sample.
  • No extra augmentations, single — only the baseline augmentations (from the original paper) are used, with one applied per sample.
Encoder             | Layer | Codebook | Augmentation Strategy
HuBERT              | 6     | 500      | All augmentations, chained
HuBERT              | 6     | 500      | All augmentations, single
HuBERT              | 6     | 500      | No extra augmentations, single
SpidR               | 6     | 256      | No extra augmentations, single
SpidR               | 6     | 256      | All augmentations, chained
DinoSR (original)   | 5     | 256      | All augmentations, chained
DinoSR (reproduced) | 5     | 256      | All augmentations, chained
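The difference between the "chained" and "single" strategies can be sketched as follows. The augmentations are stood in by identity placeholders here; the chain length is sampled uniformly from 0 to 4 as described above:

```python
import random

random.seed(0)

# Stand-ins for the augmentation table; each maps waveform -> waveform.
AUGMENTATIONS = {
    "noise": lambda w: w,        # placeholders for the real transforms
    "pitch_shift": lambda w: w,
    "reverb": lambda w: w,
    "lowpass": lambda w: w,
}

def augment_chained(wav):
    """All augmentations, chained: apply 0-4 randomly chosen
    augmentations sequentially to one sample."""
    n = random.randint(0, 4)
    for name in random.sample(list(AUGMENTATIONS), n):
        wav = AUGMENTATIONS[name](wav)
    return wav

def augment_single(wav):
    """All augmentations, single: apply exactly one random augmentation."""
    name = random.choice(list(AUGMENTATIONS))
    return AUGMENTATIONS[name](wav)
```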
