Gradient-Transcribe1 (125M)

Gradient-Transcribe1 is a high-efficiency transformer-based model for automatic speech recognition (ASR). It incorporates modern architectural advancements such as Grouped Query Attention (GQA) and Rotary Positional Embeddings (RoPE) to deliver superior inference performance and long-context stability.

Access to this model is gated. Users must agree to the Gradient-AI License and provide their intended use case before downloading the weights.

Model Details

Gradient-Transcribe1 is a sequence-to-sequence encoder-decoder model optimized for 16kHz audio. Key architectural features include:

  • Grouped Query Attention (GQA): Optimized for faster decoding and reduced KV cache memory footprint.
  • Rotary Positional Embeddings (RoPE): Enhanced relative position encoding for better sequence length generalization.
  • Modern Activation & Norm: Utilizing RMSNorm and SwiGLU for improved training stability.
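Since the released implementation is gated, the GQA mechanism above can be illustrated with a minimal NumPy sketch. The head configuration (8 query heads sharing 4 key/value heads) matches the card's specifications; the sequence length and the 96-dim head size (768 / 8) are illustrative assumptions, not taken from the model code.

```python
import numpy as np

# Grouped Query Attention: 8 query heads, 4 shared key/value heads,
# i.e. 2 query heads per KV head. The KV cache holds half as many
# heads as standard multi-head attention would.
n_q_heads, n_kv_heads, head_dim, seq = 8, 4, 96, 10
group = n_q_heads // n_kv_heads  # query heads per KV head

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq, head_dim))
k = rng.standard_normal((n_kv_heads, seq, head_dim))
v = rng.standard_normal((n_kv_heads, seq, head_dim))

# Expand each KV head so its group of query heads can attend over it.
k_exp = np.repeat(k, group, axis=0)  # (8, seq, head_dim)
v_exp = np.repeat(v, group, axis=0)

# Standard scaled dot-product attention per head.
scores = q @ k_exp.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = weights @ v_exp  # (8, seq, head_dim)
```

During autoregressive decoding only `k` and `v` need to be cached, which is where the memory saving over full multi-head attention comes from.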

Specifications

| Component       | Configuration   |
|-----------------|-----------------|
| Parameters      | 138,044,928     |
| Hidden Size     | 768             |
| Encoder Layers  | 8               |
| Decoder Layers  | 10              |
| Attention Heads | 8 (Q), 4 (KV)   |
| Vocab Size      | 1024            |
| Mel Bins        | 80              |
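The Rotary Positional Embeddings mentioned above can be sketched from the table's dimensions: with a hidden size of 768 and 8 query heads, each head is 96-dimensional. The rotary base of 10000 is the common default, not a confirmed detail of this model's implementation.

```python
import numpy as np

# Hedged RoPE sketch for one attention head (head_dim = 768 / 8 = 96).
head_dim, seq_len, base = 96, 4, 10000.0

# One rotation frequency per pair of channels.
inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
angles = np.outer(np.arange(seq_len), inv_freq)  # (seq, head_dim/2)
cos, sin = np.cos(angles), np.sin(angles)

def apply_rope(x):
    # Rotate each (even, odd) channel pair by a position-dependent angle,
    # encoding absolute position in a way that makes attention scores
    # depend only on relative position.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((seq_len, head_dim))
q_rot = apply_rope(q)
```

Because RoPE is a pure rotation, it preserves vector norms, and position 0 (angle 0) is left unchanged; this relative encoding is what helps the model generalize to sequence lengths beyond those seen in training.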

Usage

Due to the custom nature of this architecture, you must set trust_remote_code=True when loading the model.

Loading the Model

from transformers import AutoModel, AutoTokenizer

# Load the model (requires approved access to the gated repository)
model = AutoModel.from_pretrained(
    "your-username/gradient-transcribe1-125m",
    trust_remote_code=True,
    token=True,  # use_auth_token is deprecated in recent transformers releases
)

# Load the tokenizer (same gated repo; custom code also needs trust_remote_code)
tokenizer = AutoTokenizer.from_pretrained(
    "your-username/gradient-transcribe1-125m",
    trust_remote_code=True,
    token=True,
)

Transcription Example

import torch
import librosa

# Load mono audio resampled to the model's expected 16 kHz rate
audio, _ = librosa.load("sample_audio.wav", sr=16000)

# Pre-processing to a log-Mel spectrogram must match the model's 80-bin
# configuration; the hop/window defaults below are placeholders, not the
# repository's exact feature-extractor settings.
mel = librosa.feature.melspectrogram(y=audio, sr=16000, n_mels=80)
log_mel = torch.from_numpy(librosa.power_to_db(mel)).unsqueeze(0)  # (1, 80, frames)

# token_ids = model.generate(log_mel)
# transcription = tokenizer.batch_decode(token_ids, skip_special_tokens=True)

Training Data

Gradient-Transcribe1 was trained on a combination of curated speech datasets and synthetic data to validate the performance of GQA in ASR tasks. It is currently optimized for English speech.

Limitations and Biases

Intended Use: This model is designed for research and educational purposes. Usage for deceptive, weaponized, or illegal acts is strictly prohibited.

Hallucinations: As a sequence-to-sequence model, it may generate text that does not exist in the audio, particularly in high-noise environments.

Domain Specificity: Performance may vary across different accents, dialects, and technical terminologies.

License

This model is licensed under the Gradient-AI License v1.0. By requesting access, you agree to abide by the terms specified at gradient-research/license.
