---
language:
- en
metrics:
- confusion_matrix
- accuracy
base_model:
- openai/whisper-small
pipeline_tag: audio-text-to-text
tags:
- Audio
- ASR
- Speech-to-text
- Text-to-sentimentClassification
license: cc-by-4.0
datasets:
- InfoBayAI/call_center_audio_dual_channel_en_in
- InfoBayAI/English-Podcast-ASR-Dataset
- InfoBayAI/Hindi-Podcast-ASR-Dataset
- InfoBayAI/call_center_audio_dual_channel_en_uk
---
**Model Description**
This model is a transformer-based sentiment classification system built using **DistilBERT** and trained on text data derived from the [InfoBay.AI](https://huggingface.co/collections/InfoBayAI/podcast-speech-and-conversational-audio-datasets) audio dataset.
The training pipeline converts raw conversational audio into structured text using **Whisper base**, followed by segmentation and sentiment labeling. The resulting text dataset is then used to train the sentiment classification model.
This approach enables the transformation of unstructured audio data into meaningful NLP intelligence, demonstrating the value of the dataset for downstream AI applications.
![infobay_pipeline](https://cdn-uploads.huggingface.co/production/uploads/693ab313ff1770594f99afee/gtuUnCIpOKLDtc1QBILuR.png)
**Training Pipeline**
The complete pipeline used for training is as follows:
**Raw Audio (InfoBay.AI Dataset) → Whisper ASR (Speech-to-Text) → Text Segmentation → Sentiment Labeling → DistilBERT Training**
- Audio Source: InfoBay.AI podcast dataset
- Transcription: Whisper base model
- Data Processing: Sentence-level segmentation
- Labeling: VADER-based sentiment scoring
- Model Training: DistilBERT for 3-class sentiment classification
**Key Insight**
This model demonstrates that audio data alone can be converted into high-quality training data and used effectively to train transformer-based NLP models.
It validates the ability of the [InfoBay.AI](https://infobay.ai/) dataset to support:
- Speech-to-text pipelines
- Sentiment analysis systems
- End-to-end conversational AI workflows
**Dataset Split**
- Train/Test Split: 80% / 20%
- Split Strategy: Stratified sampling (to preserve class distribution)
- Label Encoding: Applied using `LabelEncoder`
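The stratified 80/20 split above can be sketched in plain Python: group example indices by class label and hold out 20% of each group, so the test set preserves the class proportions. The label distribution below is illustrative only, not taken from this dataset.

```python
import random
from collections import defaultdict

def stratified_indices(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) preserving each class's proportion."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test

# Illustrative label distribution (not the real dataset's)
labels = ["neutral"] * 80 + ["positive"] * 15 + ["negative"] * 5
train_idx, test_idx = stratified_indices(labels)
print(len(train_idx), len(test_idx))  # 80 20
```

In practice the same effect is what scikit-learn's `train_test_split(..., stratify=labels)` provides.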
**Training Hyperparameters**
- Number of Epochs: 15
- Train Batch Size: 16
- Evaluation Batch Size: 16
- Learning Rate: 2e-5
- Optimizer: AdamW
- Loss Function: Cross-Entropy Loss
- Logging Directory: `./logs`
- Output Directory: `./results`
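These hyperparameters map directly onto Hugging Face `TrainingArguments`. A minimal sketch, assuming the standard `Trainer` workflow was used (`train_ds` and `eval_ds` are placeholder dataset variables, not names from this repository):

```python
from transformers import (DistilBertForSequenceClassification,
                          Trainer, TrainingArguments)

# 3-class head; cross-entropy loss is applied automatically for num_labels=3
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3
)

args = TrainingArguments(
    output_dir="./results",
    logging_dir="./logs",
    num_train_epochs=15,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,  # AdamW is the Trainer's default optimizer
)

trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
```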
**Model Performance**
The model demonstrates strong performance on internal evaluation of the speech-derived dataset:
- Accuracy: ~98%
- Macro F1-score: ~0.98
- Weighted F1-score: ~0.99
**Classification Report**
| Class | Sentiment | Precision | Recall | F1-score | Support |
| ----- | --------- | --------- | ------ | -------- | ------- |
| 0 | Negative | 0.97 | 0.96 | 0.96 | 1,128 |
| 1 | Neutral | 0.99 | 0.99 | 0.99 | 7,865 |
| 2 | Positive | 0.98 | 0.98 | 0.98 | 2,658 |
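Macro F1 averages the per-class scores equally, while weighted F1 weights each class by its support, so the large neutral class dominates it. The distinction can be checked directly against the per-class numbers in the table above:

```python
# Per-class F1 scores and supports copied from the classification report
f1 = {"negative": 0.96, "neutral": 0.99, "positive": 0.98}
support = {"negative": 1128, "neutral": 7865, "positive": 2658}

macro_f1 = sum(f1.values()) / len(f1)          # unweighted mean over classes
total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] / total for c in f1)

print(round(macro_f1, 3), round(weighted_f1, 3))  # 0.977 0.985
```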
---
**Evaluation Results**
The transcription stage of the pipeline was evaluated using standard speech recognition metrics:
- Word Error Rate (WER): 9.172%
- Character Error Rate (CER): 4.53%
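WER is the word-level edit distance between the Whisper transcript and a reference transcript, divided by the number of reference words; CER is the same computation at the character level. A minimal sketch of the WER calculation (the example sentences are illustrative):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One missing word out of five reference words -> WER 0.2
print(word_error_rate("the call was very helpful", "the call was helpful"))
```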
---
**Usage**
Install dependencies
```bash
pip install -U transformers torch openai-whisper vaderSentiment pandas
```
```python
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch
import torch.nn.functional as F

repo_id = "InfoBayAI/Audio-to-Sentiment_Intelligence_Model"

tokenizer = DistilBertTokenizer.from_pretrained(repo_id, subfolder="sentiment-model")
model = DistilBertForSequenceClassification.from_pretrained(repo_id, subfolder="sentiment-model")
model.eval()

text = "Write your text"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)

probs = F.softmax(outputs.logits, dim=1)
predicted_class = torch.argmax(probs, dim=1).item()

labels = ["Negative", "Neutral", "Positive"]
print("Text:", text)
print("Prediction:", labels[predicted_class])
print("Confidence:", probs[0][predicted_class].item())
```
**Audio-to-Text Dataset Creation**
```python
import os

import pandas as pd
import whisper
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Load the Whisper ASR model used for transcription
model = whisper.load_model("base")

audio_folder = r"C:\Users\3\Documents\AUDIO 2\b6"
print(os.path.exists(audio_folder))

analyzer = SentimentIntensityAnalyzer()
data = []
sr = 1

# Loop through all audio files
for file in os.listdir(audio_folder):
    if file.endswith((".wav", ".mp3")):
        audio_path = os.path.join(audio_folder, file)
        print("Processing:", file)
        result = model.transcribe(audio_path, task="translate", fp16=False)
        segment_id = 1
        for segment in result["segments"]:
            text = segment["text"]
            # Sentiment score
            sentiment_score = analyzer.polarity_scores(text)["compound"]
            # Convert score to label using VADER's standard thresholds
            if sentiment_score > 0.05:
                sentiment = "positive"
            elif sentiment_score < -0.05:
                sentiment = "negative"
            else:
                sentiment = "neutral"
            data.append({
                "sr_no": sr,
                "call_id": file,
                "segment_id": segment_id,
                "start_time": segment["start"],
                "end_time": segment["end"],
                "text": text,
                "sentiment": sentiment,
            })
            sr += 1
            segment_id += 1

df = pd.DataFrame(data)
df.to_csv("AUDIO.csv", index=False)
print("Dataset created")
print(df.head())
```
---
**Considerations**
This model is trained on text derived from the InfoBay.AI audio dataset and is provided for research and evaluation purposes. The dataset contains a larger collection of high-quality conversational audio. For access to the full dataset or enterprise licensing inquiries, please visit our website [InfoBay.AI](https://infobay.ai/) or contact us directly.
Ph: (91) 8303174762
Email: vipul@infobay.ai