---
language:
- en
metrics:
- confusion_matrix
- accuracy
base_model:
- openai/whisper-small
pipeline_tag: audio-text-to-text
tags:
- Audio
- ASR
- Speech-to-text
- Text-to-sentimentClassification
license: cc-by-4.0
datasets:
- InfoBayAI/call_center_audio_dual_channel_en_in
- InfoBayAI/English-Podcast-ASR-Dataset
- InfoBayAI/Hindi-Podcast-ASR-Dataset
- InfoBayAI/call_center_audio_dual_channel_en_uk
---
**Model Description**
This model is a transformer-based sentiment classification system built using **DistilBERT** and trained on text data derived from the [InfoBay.AI](https://huggingface.co/collections/InfoBayAI/podcast-speech-and-conversational-audio-datasets) audio dataset.
The training pipeline converts raw conversational audio into structured text using **Whisper base**, followed by segmentation and sentiment labeling. The resulting text dataset is then used to train the sentiment classification model.
This approach enables the transformation of unstructured audio data into meaningful NLP intelligence, demonstrating the value of the dataset for downstream AI applications.
![infobay_pipeline](https://cdn-uploads.huggingface.co/production/uploads/693ab313ff1770594f99afee/gtuUnCIpOKLDtc1QBILuR.png)
**Training Pipeline**
The complete pipeline used for training is as follows:
**Raw Audio (InfoBay.AI Dataset) → Whisper ASR (Speech-to-Text) → Text Segmentation → Sentiment Labeling → DistilBERT Training**
- Audio Source: InfoBay.AI podcast dataset
- Transcription: Whisper base model
- Data Processing: Sentence-level segmentation
- Labeling: VADER-based sentiment scoring
- Model Training: DistilBERT for 3-class sentiment classification
**Key Insight**
This model demonstrates that audio data alone can be converted into high-quality training data and used effectively to train transformer-based NLP models.
It validates the ability of the [InfoBay.AI](https://infobay.ai/) dataset to support:
- Speech-to-text pipelines
- Sentiment analysis systems
- End-to-end conversational AI workflows
**Dataset Split**
- Train/Test Split: 80% / 20%
- Split Strategy: Stratified sampling (to preserve class distribution)
- Label Encoding: Applied using `LabelEncoder`
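The stratified 80/20 split above can be sketched in plain Python: group example indices by class label and hold out 20% of each group, so the test set preserves the class proportions. The label distribution below is illustrative only, not taken from this dataset.

```python
import random
from collections import defaultdict

def stratified_indices(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) preserving each class's proportion."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test

# Illustrative label distribution (not the real dataset's)
labels = ["neutral"] * 80 + ["positive"] * 15 + ["negative"] * 5
train_idx, test_idx = stratified_indices(labels)
print(len(train_idx), len(test_idx))  # 80 20
```

In practice the same effect is what scikit-learn's `train_test_split(..., stratify=labels)` provides.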
**Training Hyperparameters**
- Number of Epochs: 15
- Train Batch Size: 16
- Evaluation Batch Size: 16
- Learning Rate: 2e-5
- Optimizer: AdamW
- Loss Function: Cross-Entropy Loss
- Logging Directory: `./logs`
- Output Directory: `./results`
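These hyperparameters map directly onto Hugging Face `TrainingArguments`. A minimal sketch, assuming the standard `Trainer` workflow was used (`train_ds` and `eval_ds` are placeholder dataset variables, not names from this repository):

```python
from transformers import (DistilBertForSequenceClassification,
                          Trainer, TrainingArguments)

# 3-class head; cross-entropy loss is applied automatically for num_labels=3
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3
)

args = TrainingArguments(
    output_dir="./results",
    logging_dir="./logs",
    num_train_epochs=15,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,  # AdamW is the Trainer's default optimizer
)

trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
```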
**Model Performance**
The model demonstrates strong performance on internal evaluation of the speech-derived dataset:
- Accuracy: ~98%
- Macro F1-score: ~0.98
- Weighted F1-score: ~0.99
**Classification Report**
| Class | Sentiment | Precision | Recall | F1-score | Support |
| ----- | --------- | --------- | ------ | -------- | ------- |
| 0 | Negative | 0.97 | 0.96 | 0.96 | 1,128 |
| 1 | Neutral | 0.99 | 0.99 | 0.99 | 7,865 |
| 2 | Positive | 0.98 | 0.98 | 0.98 | 2,658 |
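Macro F1 averages the per-class scores equally, while weighted F1 weights each class by its support, so the large neutral class dominates it. The distinction can be checked directly against the per-class numbers in the table above:

```python
# Per-class F1 scores and supports copied from the classification report
f1 = {"negative": 0.96, "neutral": 0.99, "positive": 0.98}
support = {"negative": 1128, "neutral": 7865, "positive": 2658}

macro_f1 = sum(f1.values()) / len(f1)          # unweighted mean over classes
total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] / total for c in f1)

print(round(macro_f1, 3), round(weighted_f1, 3))  # 0.977 0.985
```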
---
**Evaluation Results**
The transcription stage of the pipeline was evaluated using standard speech recognition metrics:
- Word Error Rate (WER): 9.172%
- Character Error Rate (CER): 4.53%
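WER is the word-level edit distance between the Whisper transcript and a reference transcript, divided by the number of reference words; CER is the same computation at the character level. A minimal sketch of the WER calculation (the example sentences are illustrative):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One missing word out of five reference words -> WER 0.2
print(word_error_rate("the call was very helpful", "the call was helpful"))
```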
---
**Usage**
Install dependencies
```bash
pip install -U transformers torch openai-whisper vaderSentiment pandas
```
```python
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch
import torch.nn.functional as F

repo_id = "InfoBayAI/Audio-to-Sentiment_Intelligence_Model"

tokenizer = DistilBertTokenizer.from_pretrained(repo_id, subfolder="sentiment-model")
model = DistilBertForSequenceClassification.from_pretrained(repo_id, subfolder="sentiment-model")
model.eval()

text = "Write your text"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)

probs = F.softmax(outputs.logits, dim=1)
predicted_class = torch.argmax(probs, dim=1).item()

labels = ["Negative", "Neutral", "Positive"]
print("Text:", text)
print("Prediction:", labels[predicted_class])
print("Confidence:", probs[0][predicted_class].item())
```
**Audio-to-Text Dataset Creation**
```python
import os

import pandas as pd
import whisper
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Load the Whisper ASR model used for transcription
model = whisper.load_model("base")

audio_folder = r"C:\Users\3\Documents\AUDIO 2\b6"
print(os.path.exists(audio_folder))

analyzer = SentimentIntensityAnalyzer()
data = []
sr = 1

# Loop through all audio files
for file in os.listdir(audio_folder):
    if file.endswith((".wav", ".mp3")):
        audio_path = os.path.join(audio_folder, file)
        print("Processing:", file)
        result = model.transcribe(audio_path, task="translate", fp16=False)
        segment_id = 1
        for segment in result["segments"]:
            text = segment["text"]
            # Sentiment score
            sentiment_score = analyzer.polarity_scores(text)["compound"]
            # Convert score to label using VADER's standard thresholds
            if sentiment_score > 0.05:
                sentiment = "positive"
            elif sentiment_score < -0.05:
                sentiment = "negative"
            else:
                sentiment = "neutral"
            data.append({
                "sr_no": sr,
                "call_id": file,
                "segment_id": segment_id,
                "start_time": segment["start"],
                "end_time": segment["end"],
                "text": text,
                "sentiment": sentiment,
            })
            sr += 1
            segment_id += 1

df = pd.DataFrame(data)
df.to_csv("AUDIO.csv", index=False)
print("Dataset created")
print(df.head())
```
---
**Considerations**
This model is trained on text derived from the InfoBay.AI audio dataset and is provided for research and evaluation purposes. The dataset contains a larger collection of high-quality conversational audio. For access to the full dataset or enterprise licensing inquiries, please visit our website [InfoBay.AI](https://infobay.ai/) or contact us directly.
Ph: (91) 8303174762
Email: vipul@infobay.ai