---
language:
  - en
tags:
  - audio
  - automatic-speech-recognition
  - gqa
  - rope
  - pytorch
  - safetensors
pipeline_tag: automatic-speech-recognition
license: other
license_name: gradient-ai-license-v1.0
license_link: https://huggingface.co/gradient-research/license
gated: auto
extra_gated_heading: License Agreement Required
extra_gated_prompt: >-
  By registering for access to this model, you agree to the strict terms and
  conditions of the Gradient-AI License. This model is strictly prohibited from
  being used for deception, weaponization, or illegal acts.
extra_gated_button_content: Acknowledge License and Request Access
extra_gated_fields:
  I have read and agree to be bound by the Gradient-AI License: checkbox
  Name / Organization: text
  Intended Use Case:
    type: select
    options:
      - Research
      - Education
      - label: Commercial (Requires Permission)
        value: commercial
      - label: Other
        value: other
library_name: transformers
---

# Gradient-Transcribe1 (125M)

Gradient-Transcribe1 is a high-efficiency transformer-based model for automatic speech recognition (ASR). It incorporates modern architectural advancements such as Grouped Query Attention (GQA) and Rotary Positional Embeddings (RoPE) to deliver superior inference performance and long-context stability.

Access to this model is gated. Users must agree to the Gradient-AI License and provide their intended use case before downloading the weights.

## Model Details

Gradient-Transcribe1 is a sequence-to-sequence encoder-decoder model optimized for 16kHz audio. Key architectural features include:

- **Grouped Query Attention (GQA):** optimized for faster decoding and a reduced KV-cache memory footprint.
- **Rotary Positional Embeddings (RoPE):** enhanced relative position encoding for better sequence-length generalization.
- **Modern Activation & Norm:** RMSNorm and SwiGLU for improved training stability.
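The rotary embedding idea can be sketched in plain Python: each adjacent pair of dimensions in a query or key vector is rotated by a position-dependent angle, so dot products end up depending only on relative position. This is an illustrative, unoptimized sketch; the model's actual implementation (base frequency, dimension layout) is not specified by this card and may differ.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Apply a RoPE-style rotation to one head vector (illustrative sketch)."""
    dim = len(vec)
    out = [0.0] * dim
    for i in range(0, dim, 2):
        # Lower dimensions rotate fastest; frequency decays geometrically.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * cos_t - y * sin_t
        out[i + 1] = x * sin_t + y * cos_t
    return out
```

Because each pair is a pure rotation, vector norms are preserved, and the inner product between a rotated query at position m and a rotated key at position n depends only on m − n, which is what gives RoPE its relative-position behavior.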

## Specifications

| Component | Configuration |
|---|---|
| Parameters | 138,044,928 |
| Hidden Size | 768 |
| Encoder Layers | 8 |
| Decoder Layers | 10 |
| Attention Heads | 8 (Q), 4 (KV) |
| Vocab Size | 1024 |
| Mel Bins | 80 |
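To make the GQA benefit from the table concrete, here is a back-of-the-envelope KV-cache estimate. It assumes fp16 caching, a hypothetical 1500-step decoded sequence, and caching only the 10 decoder layers; none of these assumptions are confirmed by the card.

```python
def kv_cache_bytes(seq_len, layers, kv_heads, head_dim, dtype_bytes=2):
    """Total K+V cache size: K and V each store layers*kv_heads*seq_len*head_dim values."""
    return 2 * layers * kv_heads * seq_len * head_dim * dtype_bytes

HEAD_DIM = 768 // 8  # 96, from Hidden Size / 8 query heads
SEQ_LEN = 1500       # hypothetical decoded sequence length

mha_baseline = kv_cache_bytes(SEQ_LEN, 10, 8, HEAD_DIM)  # if KV heads matched Q heads
gqa_actual = kv_cache_bytes(SEQ_LEN, 10, 4, HEAD_DIM)    # the card's 4 KV heads
# 46,080,000 bytes vs 23,040,000 bytes: GQA halves the KV cache here
```

Halving the number of KV heads halves the cache linearly, which is where the "reduced KV cache memory footprint" claim comes from.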

## Usage

Due to the custom nature of this architecture, you must set `trust_remote_code=True` when loading the model.

### Loading the Model

```python
from transformers import AutoModel, AutoTokenizer

# Load the model (requires approved access)
model = AutoModel.from_pretrained(
    "your-username/gradient-transcribe1-125m",
    trust_remote_code=True,
    token=True,  # use your stored Hugging Face access token
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "your-username/gradient-transcribe1-125m",
    trust_remote_code=True,
)
```
### Transcription Example

```python
import torch
import librosa

# Load 16kHz audio
audio, _ = librosa.load("sample_audio.wav", sr=16000)

# Note: pre-processing to a Mel-spectrogram must match the model's 80-bin configuration.
# transcription = model.generate(input_features)
```
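As a rough guide to the expected feature shape, the sketch below estimates the frame count for a given clip length. It assumes a Whisper-style 10 ms hop (160 samples at 16 kHz) and librosa's default centered STFT; the hop length is an assumption, not something this card specifies.

```python
SAMPLE_RATE = 16_000
N_MELS = 80        # from the model card
HOP_LENGTH = 160   # assumed: 10 ms hop (Whisper-style); not stated by the card

def num_frames(n_samples, hop=HOP_LENGTH):
    # With librosa's default centered STFT, frame count is 1 + n_samples // hop
    return 1 + n_samples // hop

# Under these assumptions, a 5-second clip yields an (80, 501) feature matrix
frames = num_frames(5 * SAMPLE_RATE)
```

If your spectrogram has a different number of frames per second, check the hop length against whatever the model's remote preprocessing code actually uses.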

## Training Data

Gradient-Transcribe1 was trained on a combination of curated speech datasets and synthetic data to validate the performance of GQA in ASR tasks. It is currently optimized for English speech.

## Limitations and Biases

- **Intended Use:** This model is designed for research and educational purposes. Usage for deceptive, weaponized, or illegal acts is strictly prohibited.
- **Hallucinations:** As a sequence-to-sequence model, it may generate text that does not exist in the audio, particularly in high-noise environments.
- **Domain Specificity:** Performance may vary across different accents, dialects, and technical terminologies.

## License

This model is licensed under the Gradient-AI License v1.0. By requesting access, you agree to abide by the terms specified at [gradient-research/license](https://huggingface.co/gradient-research/license).