---
datasets:
- Elyadata/Ara-Best-RQ_dataset
language:
- ar
library_name: speechbrain
tags:
- speech
- ssl
- arabic
- dialect
---

# Ara-BEST-RQ-600M-6k

**Ara-BEST-RQ-600M-6k** is a 600M-parameter self-supervised speech representation model for Arabic and Arabic dialects. It is part of the Ara-BEST-RQ family introduced in **[Ara-Best-RQ: Multi Dialectal Arabic SSL](https://arxiv.org/abs/2603.21900)**.

This model was pretrained on the **crawled Ara-BEST-RQ dataset**: 5,639h 04m 27s of Creative Commons Arabic speech collected from publicly available YouTube videos and segmented for self-supervised speech learning.

- **Paper:** [Ara-Best-RQ: Multi Dialectal Arabic SSL](https://arxiv.org/abs/2603.21900)
- **Dataset:** [Elyadata/Ara-Best-RQ_dataset](https://huggingface.co/datasets/Elyadata/Ara-Best-RQ_dataset)
- **Implementation:** [elyadata/AraBEST-RQ](https://github.com/elyadata/AraBEST-RQ)

## Model Details

### Model Description

Ara-BEST-RQ is a family of Arabic-focused self-supervised learning (SSL) speech models based on the BEST-RQ framework. The models are designed to learn speech representations that transfer well to Arabic speech processing tasks, including automatic speech recognition (ASR) and dialect identification (DID).

This checkpoint corresponds to the **600M** variant pretrained on the **crawled 6k-hour dataset**.

- **Model type:** Self-supervised speech representation model
- **Architecture:** Conformer-based BEST-RQ encoder
- **Parameters:** ~600M (611.3M)
- **Training data:** crawled Ara-BEST-RQ dataset
- **Languages:** Arabic, including multiple dialects
- **Primary use:** Speech representation learning / downstream fine-tuning

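
As a rough sanity check on the ~600M figure, the sketch below counts only the large weight matrices of a Conformer encoder, using the layer sizes listed in the Architecture section (24 layers, model dimension 1024, feed-forward dimension 4096). The per-layer breakdown is an assumption about a typical Conformer block and is not taken from the paper. Biases, normalization layers, the depthwise convolution, and the convolutional front-end are ignored, so the result is only expected to land near the reported 611.3M.

```python
def estimate_conformer_weights(num_layers=24, d_model=1024, d_ffn=4096):
    """Rough count of the large weight matrices in a Conformer encoder.

    The per-layer layout below is an assumption; small terms are ignored.
    """
    ffn = 2 * (2 * d_model * d_ffn)      # two feed-forward modules, two matrices each
    attention = 4 * d_model * d_model    # query, key, value, and output projections
    rel_pos = d_model * d_model          # relative-position projection (assumed)
    conv = 3 * d_model * d_model         # pointwise convolutions: d -> 2d and d -> d
    return num_layers * (ffn + attention + rel_pos + conv)


total = estimate_conformer_weights()
print(f"{total / 1e6:.1f}M")  # prints 604.0M
```

The gap to 611.3M would be consistent with the components left out of this estimate, although the exact breakdown is not given here.
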
### Architecture

The 600M Ara-BEST-RQ model uses:

- 24 Conformer encoder layers
- Model dimension: 1024
- 8 attention heads
- Feed-forward dimension: 4096
- GELU activations
- Layer normalization before attention
- Relative position multi-head attention
- Convolutional front-end with two blocks
- Random projection quantizer with 4096 codebook entries of dimension 16

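
To illustrate the last item, here is a minimal pure-Python sketch of a BEST-RQ-style random projection quantizer: a frozen random matrix projects each input frame to the codebook dimension, and the training target is the index of the nearest entry in a frozen random codebook. The class name, the Gaussian initialization, the seed, and the input feature size of 80 are assumptions for illustration only; the official implementation may differ in all of these.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class RandomProjectionQuantizer:
    """Frozen random projection plus frozen random codebook (illustrative)."""

    def __init__(self, input_dim, codebook_size=4096, codebook_dim=16, seed=0):
        rng = random.Random(seed)
        # Both the projection and the codebook are random and never trained.
        self.projection = [
            [rng.gauss(0.0, 1.0) for _ in range(input_dim)]
            for _ in range(codebook_dim)
        ]
        self.codebook = [
            l2_normalize([rng.gauss(0.0, 1.0) for _ in range(codebook_dim)])
            for _ in range(codebook_size)
        ]

    def quantize(self, frame):
        """Return the index of the codebook entry nearest to the projected frame."""
        projected = l2_normalize(
            [sum(w * x for w, x in zip(row, frame)) for row in self.projection]
        )
        # For unit vectors, the nearest neighbour maximises the dot product.
        scores = (
            sum(p * c for p, c in zip(projected, code)) for code in self.codebook
        )
        return max(enumerate(scores), key=lambda pair: pair[1])[0]


quantizer = RandomProjectionQuantizer(input_dim=80)  # 80 is a hypothetical feature size
frame_rng = random.Random(1)
frames = [[frame_rng.gauss(0.0, 1.0) for _ in range(80)] for _ in range(3)]
targets = [quantizer.quantize(frame) for frame in frames]
print(targets)  # three integer labels in the range 0..4095
```

During pretraining, labels of this kind serve as classification targets for masked frames, which is why the quantizer itself needs no training.
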
## Training Data

The model was pretrained on the crawled Ara-BEST-RQ dataset: **5,639h 04m 27s** of Creative Commons speech data.

The released dataset on Hugging Face provides **metadata only**: YouTube video identifiers and audio segment boundaries. No audio or video files are distributed as part of the dataset.

Dataset link: [Elyadata/Ara-Best-RQ_dataset](https://huggingface.co/datasets/Elyadata/Ara-Best-RQ_dataset)

## Pretraining

The paper reports the following pretraining losses after 300k updates for this model:

| Training set | Train loss | Validation loss |
|---|---:|---:|
| Crawled | 3.53 | 3.70 |

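
One way to read these numbers, assuming the loss is a natural-log cross-entropy over the 4096 quantizer labels (an assumption, since the loss definition is not restated here), is to compare them with the loss of a uniform guess and to convert them to perplexities:

```python
import math

codebook_size = 4096                 # from the Architecture section
train_loss, valid_loss = 3.53, 3.70  # from the table above

# Cross-entropy of a uniform guess over the codebook (assumes natural log).
chance_loss = math.log(codebook_size)

for name, loss in [("train", train_loss), ("validation", valid_loss)]:
    perplexity = math.exp(loss)
    print(f"{name}: loss={loss:.2f}, perplexity={perplexity:.1f}, chance={chance_loss:.2f}")
```

Under that assumption, a uniform guess would score about 8.32, and the reported losses correspond to an effective choice among a few dozen codebook entries rather than all 4096.
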
## Evaluation

The paper evaluates Ara-BEST-RQ models on automatic speech recognition and dialect identification tasks. The following results are reported for the **Ara-BEST-RQ-600M-6k** model.

### Automatic Speech Recognition

WER scores on ASR benchmarks:

| Dataset | WER |
|---|---:|
| Common Voice 19.0 Arabic | 19.50 |
| MGB-3 | 30.83 |
| MGB-5 | 55.78 |
| TARIC-SLU | 22.41 |
| Average | 32.13 |

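
For readers less familiar with the metric, the sketch below shows the standard definition of word error rate (word-level edit distance divided by the number of reference words) and checks that the "Average" row is the unweighted mean of the four benchmark scores. The function name and the toy sentences are illustrative; the paper's scoring pipeline, including any text normalization, is not reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(substitution, previous[j] + 1, current[j - 1] + 1))
        previous = current
    return previous[-1] / len(ref)  # the reference must not be empty


# Toy English strings, purely illustrative: one substitution in four words.
print(word_error_rate("the cat sat down", "the cat sat up"))  # 0.25

# The "Average" row is the unweighted mean of the four benchmark WERs.
scores = [19.50, 30.83, 55.78, 22.41]
print(round(sum(scores) / len(scores), 2))  # 32.13
```

The table reports WER as a percentage, so the 0.25 above would appear as 25.00.
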
### Dialect Identification

Results on ADI-20:

| Split | Accuracy | Weighted F1 |
|---|---:|---:|
| Validation | 92.86 | 92.87 |
| Test | 91.05 | 91.04 |

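
The two metrics in this table can be sketched as follows, assuming "Weighted F1" means the per-class F1 averaged with weights proportional to each class's number of reference examples (the usual definition; the paper's exact evaluation script is not shown here). The dialect labels below are hypothetical and are not taken from ADI-20.

```python
from collections import Counter


def accuracy_and_weighted_f1(y_true, y_pred):
    """Accuracy and support-weighted F1 for multi-class labels."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    weighted_f1 = 0.0
    for label, n in support.items():
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / n
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weighted_f1 += f1 * n / len(y_true)  # weight by the class's share of examples
    accuracy = sum(true_pos.values()) / len(y_true)
    return accuracy, weighted_f1


# Hypothetical dialect labels for six utterances.
y_true = ["EGY", "EGY", "TUN", "TUN", "TUN", "MSA"]
y_pred = ["EGY", "TUN", "TUN", "TUN", "TUN", "MSA"]
acc, f1 = accuracy_and_weighted_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, weighted_f1={f1:.3f}")  # accuracy=0.833, weighted_f1=0.817
```
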
## Usage

This is a self-supervised pretrained model intended to be used as a speech encoder or as an initialization checkpoint for downstream fine-tuning.

For training and fine-tuning recipes, please refer to the official implementation:

```bash
git clone https://github.com/elyadata/AraBEST-RQ
cd AraBEST-RQ
```

You can download the checkpoint from Hugging Face using:

```python
from huggingface_hub import snapshot_download

# Download the repository files and print the local directory that holds them.
model_dir = snapshot_download("Elyadata/AraBEST-RQ-600M-6k")
print(model_dir)
```

Please refer to the repository configuration and SpeechBrain recipes for the correct model-loading interface.

### Fine-tuning with SpeechBrain

To fine-tune this pretrained Ara-BEST-RQ checkpoint in a SpeechBrain recipe, adapt the `pretrainer` section of your YAML configuration so that it loads both the pretrained model checkpoint and the corresponding normalizer.

Example:

```yaml
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        pt_model: !ref <pt_model>
        normalize: !ref <normalize>
    paths:
        pt_model: !ref <pt_model_path>/model.ckpt
        normalize: !ref <pt_model_path>/normalizer.ckpt
```

In your downstream recipe, make sure that:

- `<pt_model>` points to the Ara-BEST-RQ pretrained model object used in your training graph.
- `<normalize>` points to the normalization module used by the recipe.
- `<pt_model_path>` points to the local directory containing `model.ckpt` and `normalizer.ckpt`.
- `<save_folder>` is the experiment directory where SpeechBrain should collect and manage pretrained components.

This setup allows SpeechBrain to initialize the downstream model from the Ara-BEST-RQ SSL checkpoint before fine-tuning on task-specific data.

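
Before launching a recipe, it can help to confirm that the directory used for `<pt_model_path>` actually contains the two files named in the YAML above. The helper below is a small standard-library sketch; the function name and the example path are hypothetical and are not part of SpeechBrain or the official repository.

```python
from pathlib import Path

REQUIRED_FILES = ("model.ckpt", "normalizer.ckpt")  # names used in the YAML above


def missing_checkpoint_files(pt_model_path):
    """Return the required checkpoint files that are absent from pt_model_path."""
    folder = Path(pt_model_path)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


# Hypothetical local path; replace it with your own checkpoint directory.
missing = missing_checkpoint_files("pretrained/AraBEST-RQ-600M-6k")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Checkpoint directory looks complete.")
```
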
## Citation

If you use this model, please cite the Ara-BEST-RQ paper:

```bibtex
@misc{elleuch2026arabestrqmultidialectalarabic,
      title={Ara-Best-RQ: Multi Dialectal Arabic SSL},
      author={Haroun Elleuch and Ryan Whetten and Salima Mdhaffar and Yannick Estève and Fethi Bougares},
      year={2026},
      eprint={2603.21900},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.21900},
}
```