---
language:
- en
tags:
- audio
- automatic-speech-recognition
- gqa
- rope
- pytorch
- safetensors
pipeline_tag: automatic-speech-recognition
license: other
license_name: gradient-ai-license-v1.0
license_link: https://huggingface.co/gradient-research/license
gated: auto
extra_gated_heading: License Agreement Required
extra_gated_prompt: >-
  By registering for access to this model, you agree to the terms and
  conditions of the Gradient-AI License. Use of this model for deception,
  weaponization, or illegal acts is strictly prohibited.
extra_gated_button_content: Acknowledge License and Request Access
extra_gated_fields:
  I have read and agree to be bound by the Gradient-AI License: checkbox
  Name / Organization: text
  Intended Use Case:
    type: select
    options:
    - Research
    - Education
    - label: Commercial (Requires Permission)
      value: commercial
    - label: Other
      value: other
library_name: transformers
---

# Gradient-Transcribe1 (125M)

Gradient-Transcribe1 is a high-efficiency transformer-based model for automatic speech recognition (ASR). It incorporates modern architectural advances such as **Grouped Query Attention (GQA)** and **Rotary Positional Embeddings (RoPE)** for fast inference and stable long-context behavior.

**Access to this model is gated.** Users must agree to the Gradient-AI License and state their intended use case before downloading the weights.

## Model Details

Gradient-Transcribe1 is a sequence-to-sequence encoder-decoder model optimized for 16 kHz audio. Key architectural features include:

* **Grouped Query Attention (GQA):** Faster decoding and a reduced KV-cache memory footprint.
* **Rotary Positional Embeddings (RoPE):** Relative position encoding for better generalization to longer sequences.
* **Modern Activation & Norm:** RMSNorm and SwiGLU for improved training stability.
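The KV-sharing idea behind GQA can be sketched in a few lines of PyTorch using this card's head counts (8 query heads, 4 key/value heads, hidden size 768). This is an illustrative sketch, not the model's actual attention implementation:

```python
import torch

# Illustrative GQA sketch (not the model's actual module): 8 query heads
# share 4 key/value heads, matching the configuration on this card.
hidden, n_q, n_kv, seq = 768, 8, 4, 10
head_dim = hidden // n_q      # 96
group = n_q // n_kv           # 2 query heads per shared KV head

q = torch.randn(1, n_q, seq, head_dim)
k = torch.randn(1, n_kv, seq, head_dim)   # only 4 KV heads need caching
v = torch.randn(1, n_kv, seq, head_dim)

# Expand the KV heads so each group of query heads reads its shared KV head
k = k.repeat_interleave(group, dim=1)     # (1, 8, seq, 96)
v = v.repeat_interleave(group, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
out = (attn @ v).transpose(1, 2).reshape(1, seq, hidden)
```

Only the 4 KV heads enter the decoding cache; the expansion happens on the fly, which is where the memory saving comes from.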

### Specifications

| Component | Configuration |
|---|---|
| **Parameters** | 138,044,928 |
| **Hidden Size** | 768 |
| **Encoder Layers** | 8 |
| **Decoder Layers** | 10 |
| **Attention Heads** | 8 (Q), 4 (KV) |
| **Vocab Size** | 1,024 |
| **Mel Bins** | 80 |
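As a back-of-the-envelope illustration of the KV-cache saving, the per-token cache footprint scales with the number of KV heads. The arithmetic below uses the figures above (10 decoder layers, head dim 768 / 8 = 96) and assumes fp16 storage; it is an estimate, not a measured number:

```python
# Rough KV-cache size per decoded token, assuming fp16 (2 bytes per element)
# and this card's configuration: 10 decoder layers, head_dim = 768 // 8 = 96.
bytes_per_elem = 2
layers, head_dim = 10, 768 // 8

def kv_bytes_per_token(n_kv_heads):
    # K and V each store (n_kv_heads * head_dim) values per layer
    return 2 * layers * n_kv_heads * head_dim * bytes_per_elem

mha = kv_bytes_per_token(8)   # standard multi-head attention baseline
gqa = kv_bytes_per_token(4)   # this model's 4 KV heads
print(mha, gqa, mha // gqa)   # → 30720 15360 2
```

With 4 KV heads instead of 8, the cache per token is halved, which compounds over long transcriptions.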

## Usage

Due to the custom architecture, you must set `trust_remote_code=True` when loading the model.

### Loading the Model

```python
from transformers import AutoModel, AutoTokenizer

# Load the model (requires approved access to the gated repository)
model = AutoModel.from_pretrained(
    "your-username/gradient-transcribe1-125m",
    trust_remote_code=True,
    token=True,  # authenticate with your Hugging Face token (use_auth_token is deprecated)
)

# Load the tokenizer (also needs trust_remote_code for the custom architecture)
tokenizer = AutoTokenizer.from_pretrained(
    "your-username/gradient-transcribe1-125m",
    trust_remote_code=True,
)
```

### Transcription Example

```python
import librosa

# Load 16 kHz audio to match the model's expected sampling rate
audio, _ = librosa.load("sample_audio.wav", sr=16000)

# Note: pre-processing to a Mel spectrogram must match the model's 80-bin configuration.
# transcription = model.generate(input_features)
```

## Training Data

Gradient-Transcribe1 was trained on a combination of curated speech datasets and synthetic data to validate the performance of GQA in ASR tasks. It is currently optimized for English speech.

## Limitations and Biases

* **Intended Use:** This model is designed for research and educational purposes. Use for deceptive, weaponized, or illegal acts is strictly prohibited.
* **Hallucinations:** As a sequence-to-sequence model, it may generate text that is not present in the audio, particularly in high-noise environments.
* **Domain Specificity:** Performance may vary across accents, dialects, and technical terminology.

## License

This model is licensed under the Gradient-AI License v1.0. By requesting access, you agree to abide by the terms specified at https://huggingface.co/gradient-research/license.