---
language:
- en
tags:
- audio
- automatic-speech-recognition
- gqa
- rope
- pytorch
- safetensors
pipeline_tag: automatic-speech-recognition
license: other
license_name: gradient-ai-license-v1.0
license_link: https://huggingface.co/gradient-research/license
gated: auto
extra_gated_heading: License Agreement Required
extra_gated_prompt: >-
  By registering for access to this model, you agree to the terms and
  conditions of the Gradient-AI License. Use of this model for deception,
  weaponization, or illegal acts is strictly prohibited.
extra_gated_button_content: Acknowledge License and Request Access
extra_gated_fields:
  I have read and agree to be bound by the Gradient-AI License: checkbox
  Name / Organization: text
  Intended Use Case:
    type: select
    options:
    - Research
    - Education
    - label: Commercial (Requires Permission)
      value: commercial
    - label: Other
      value: other
library_name: transformers
---

# Gradient-Transcribe1 (125M)

Gradient-Transcribe1 is a high-efficiency transformer-based model for automatic speech recognition (ASR). It incorporates modern architectural advances such as **Grouped Query Attention (GQA)** and **Rotary Positional Embeddings (RoPE)** for fast inference and stable long-context behavior.

**Access to this model is gated.** Users must agree to the Gradient-AI License and state their intended use case before downloading the weights.

## Model Details

Gradient-Transcribe1 is a sequence-to-sequence encoder-decoder model optimized for 16 kHz audio. Key architectural features include:

* **Grouped Query Attention (GQA):** Faster decoding and a reduced KV-cache memory footprint.
* **Rotary Positional Embeddings (RoPE):** Relative position encoding for better generalization to longer sequences.
* **Modern Activation & Norm:** RMSNorm and SwiGLU for improved training stability.
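The KV-sharing idea behind GQA can be sketched in a few lines of PyTorch using this card's head counts (8 query heads, 4 key/value heads, hidden size 768). This is an illustrative sketch, not the model's actual attention implementation:

```python
import torch

# Illustrative GQA sketch (not the model's actual module): 8 query heads
# share 4 key/value heads, matching the configuration on this card.
hidden, n_q, n_kv, seq = 768, 8, 4, 10
head_dim = hidden // n_q      # 96
group = n_q // n_kv           # 2 query heads per shared KV head

q = torch.randn(1, n_q, seq, head_dim)
k = torch.randn(1, n_kv, seq, head_dim)   # only 4 KV heads need caching
v = torch.randn(1, n_kv, seq, head_dim)

# Expand the KV heads so each group of query heads reads its shared KV head
k = k.repeat_interleave(group, dim=1)     # (1, 8, seq, 96)
v = v.repeat_interleave(group, dim=1)

attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1)
out = (attn @ v).transpose(1, 2).reshape(1, seq, hidden)
```

Only the 4 KV heads enter the decoding cache; the expansion happens on the fly, which is where the memory saving comes from.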

### Specifications

| Component | Configuration |
|---|---|
| **Parameters** | 138,044,928 |
| **Hidden Size** | 768 |
| **Encoder Layers** | 8 |
| **Decoder Layers** | 10 |
| **Attention Heads** | 8 (Q), 4 (KV) |
| **Vocab Size** | 1,024 |
| **Mel Bins** | 80 |
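As a back-of-the-envelope illustration of the KV-cache saving, the per-token cache footprint scales with the number of KV heads. The arithmetic below uses the figures above (10 decoder layers, head dim 768 / 8 = 96) and assumes fp16 storage; it is an estimate, not a measured number:

```python
# Rough KV-cache size per decoded token, assuming fp16 (2 bytes per element)
# and this card's configuration: 10 decoder layers, head_dim = 768 // 8 = 96.
bytes_per_elem = 2
layers, head_dim = 10, 768 // 8

def kv_bytes_per_token(n_kv_heads):
    # K and V each store (n_kv_heads * head_dim) values per layer
    return 2 * layers * n_kv_heads * head_dim * bytes_per_elem

mha = kv_bytes_per_token(8)   # standard multi-head attention baseline
gqa = kv_bytes_per_token(4)   # this model's 4 KV heads
print(mha, gqa, mha // gqa)   # → 30720 15360 2
```

With 4 KV heads instead of 8, the cache per token is halved, which compounds over long transcriptions.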

## Usage

Due to the custom architecture, you must set `trust_remote_code=True` when loading the model.

### Loading the Model

```python
from transformers import AutoModel, AutoTokenizer

# Load the model (requires approved access to the gated repository)
model = AutoModel.from_pretrained(
    "your-username/gradient-transcribe1-125m",
    trust_remote_code=True,
    token=True,  # authenticate with your Hugging Face token (use_auth_token is deprecated)
)

# Load the tokenizer (also needs trust_remote_code for the custom architecture)
tokenizer = AutoTokenizer.from_pretrained(
    "your-username/gradient-transcribe1-125m",
    trust_remote_code=True,
)
```

### Transcription Example

```python
import librosa

# Load 16 kHz audio to match the model's expected sampling rate
audio, _ = librosa.load("sample_audio.wav", sr=16000)

# Note: pre-processing to a Mel spectrogram must match the model's 80-bin configuration.
# transcription = model.generate(input_features)
```

## Training Data

Gradient-Transcribe1 was trained on a combination of curated speech datasets and synthetic data to validate the performance of GQA in ASR tasks. It is currently optimized for English speech.

## Limitations and Biases

* **Intended Use:** This model is designed for research and educational purposes. Use for deceptive, weaponized, or illegal acts is strictly prohibited.
* **Hallucinations:** As a sequence-to-sequence model, it may generate text that is not present in the audio, particularly in high-noise environments.
* **Domain Specificity:** Performance may vary across accents, dialects, and technical terminology.

## License

This model is licensed under the Gradient-AI License v1.0. By requesting access, you agree to abide by the terms specified at https://huggingface.co/gradient-research/license.