| --- |
| language: |
| - en |
| metrics: |
| - confusion_matrix |
| - accuracy |
| base_model: |
| - openai/whisper-small |
| pipeline_tag: audio-text-to-text |
| tags: |
| - Audio |
| - ASR |
| - Speech-to-text |
| - Text-to-sentimentClassification |
| license: cc-by-4.0 |
| datasets: |
| - InfoBayAI/call_center_audio_dual_channel_en_in |
| - InfoBayAI/English-Podcast-ASR-Dataset |
| - InfoBayAI/Hindi-Podcast-ASR-Dataset |
| - InfoBayAI/call_center_audio_dual_channel_en_uk |
| --- |
| |
| **Model Description** |
|
|
| This model is a transformer-based sentiment classification system built using **DistilBERT** and trained on text data derived from the [InfoBay.AI](https://huggingface.co/collections/InfoBayAI/podcast-speech-and-conversational-audio-datasets) audio dataset. |
|
|
| The training pipeline converts raw conversational audio into structured text using **Whisper base**, followed by segmentation and sentiment labeling. The resulting text dataset is then used to train the sentiment classification model. |
|
|
| This approach turns unstructured audio into labeled text that can be used to train NLP models, illustrating the value of the dataset for downstream AI applications. |
|
|
|
|
|  |
|
|
| **Training Pipeline** |
|
|
| The complete pipeline used for training is as follows: |
|
|
| **Raw Audio (InfoBay.AI Dataset) → Whisper ASR (Speech-to-Text) → Text Segmentation → Sentiment Labeling → DistilBERT Training** |
|
|
| - Audio Source: InfoBay.AI podcast dataset |
| - Transcription: Whisper base model |
| - Data Processing: Sentence-level segmentation |
| - Labeling: VADER-based sentiment scoring |
| - Model Training: DistilBERT for 3-class sentiment classification |
| |
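The labeling step can be sketched as a small function. This is a minimal illustration, assuming the same ±0.05 cut-offs on VADER's compound score that the transcription script later in this card uses; `compound_to_label` is a hypothetical helper name, not part of any library.

```python
def compound_to_label(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to one of three sentiment labels."""
    if compound > 0.05:
        return "positive"
    if compound < -0.05:
        return "negative"
    return "neutral"


# In the real pipeline the score would come from
# SentimentIntensityAnalyzer().polarity_scores(text)["compound"].
for score in (0.62, -0.41, 0.0):
    print(score, "->", compound_to_label(score))
```

Because the comparisons are strict, a score of exactly 0.05 or -0.05 falls into the neutral class.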
|
|
| **Key Insight** |
|
|
| This model demonstrates that conversational audio alone, once transcribed and automatically labeled, can be turned into training data for transformer-based NLP models. |
|
|
| It validates the ability of the [InfoBay.AI](https://infobay.ai/) dataset to support: |
|
|
| - Speech-to-text pipelines |
| - Sentiment analysis systems |
| - End-to-end conversational AI workflows |
| |
|
|
|
|
| **Dataset Split** |
|
|
| - Train/Test Split: 80% / 20% |
| - Split Strategy: Stratified sampling (to preserve class distribution) |
| - Label Encoding: Applied using LabelEncoder |
| |
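The split described above could look roughly like the following with scikit-learn. This is a sketch under assumptions: the toy `texts` and `sentiments` lists stand in for the real transcribed segments, and the `random_state` value is illustrative, since the card does not state a seed.

```python
from collections import Counter

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the transcribed segments and their VADER-derived labels
texts = [f"segment {i}" for i in range(50)]
sentiments = ["negative"] * 10 + ["neutral"] * 30 + ["positive"] * 10

# LabelEncoder assigns integer ids in sorted order of the class names
encoder = LabelEncoder()
labels = encoder.fit_transform(sentiments)

# 80/20 split; stratify keeps the class proportions equal in both parts
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts,
    labels,
    test_size=0.2,
    stratify=labels,
    random_state=42,  # assumed seed, not stated in this card
)

print(list(encoder.classes_))
print(Counter(train_labels.tolist()), Counter(test_labels.tolist()))
```

Since the class names sort alphabetically, this encoding yields negative = 0, neutral = 1, positive = 2, which matches the class ids used elsewhere in this card.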
|
|
|
|
| **Training Hyperparameters** |
|
|
| - Number of Epochs: 15 |
| - Train Batch Size: 16 |
| - Evaluation Batch Size: 16 |
| - Learning Rate: 2e-5 |
| - Optimizer: AdamW |
| - Loss Function: Cross-Entropy Loss |
| - Logging Directory: ./logs |
| - Output Directory: ./results |
| |
|
|
|
|
| **Model Performance** |
|
|
| On internal evaluation with the speech-derived dataset, where the reference labels are the VADER-derived ones, the model scores: |
|
|
| - Accuracy: ~98% |
| - Macro F1-score: ~0.98 |
| - Weighted F1-score: ~0.99 |
| |
|
|
|
|
| **Classification Report** |
|
|
| | Class | Sentiment | Precision | Recall | F1-score | Support | |
| | ----- | --------- | --------- | ------ | -------- | ------- | |
| | 0 | Negative | 0.97 | 0.96 | 0.96 | 1,128 | |
| | 1 | Neutral | 0.99 | 0.99 | 0.99 | 7,865 | |
| | 2 | Positive | 0.98 | 0.98 | 0.98 | 2,658 | |
|
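As a worked check, the macro and weighted averages can be recomputed from the per-class rows of the table. This is only an arithmetic sketch: the per-class F1 values are rounded to two decimals in the report, so the recomputed averages can differ from the headline figures by a few thousandths.

```python
# Per-class F1 and support, copied from the classification report table
report = {
    "Negative": {"f1": 0.96, "support": 1128},
    "Neutral": {"f1": 0.99, "support": 7865},
    "Positive": {"f1": 0.98, "support": 2658},
}

total_support = sum(row["support"] for row in report.values())

# Macro average: every class counts equally
macro_f1 = sum(row["f1"] for row in report.values()) / len(report)

# Weighted average: every class counts in proportion to its support
weighted_f1 = sum(row["f1"] * row["support"] for row in report.values()) / total_support

print(f"samples={total_support} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
```

The weighted figure sits above the macro figure because the Neutral class, which has the highest F1, accounts for roughly two thirds of the evaluated samples.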
|
| --- |
|
|
| **Evaluation Results** |
|
|
| The speech-to-text (transcription) stage of the pipeline was evaluated using standard speech recognition metrics: |
| |
| - Word Error Rate (WER): 9.172% |
| - Character Error Rate (CER): 4.53% |
| |
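For readers unfamiliar with these metrics, both are edit distances normalized by the reference length: WER counts word-level substitutions, insertions, and deletions, while CER does the same at character level. The sketch below is a plain-Python illustration with a made-up sentence pair; it is not the evaluation script used for the figures above, and the helper names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: the hypothesis drops the word "i"
reference = "thank you for calling how can i help"
hypothesis = "thank you for calling how can help"

print(f"WER={wer(reference, hypothesis):.3f} CER={cer(reference, hypothesis):.3f}")
```

In practice, both strings are usually normalized first (lowercasing, punctuation removal), and the chosen normalization affects the reported numbers.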
| --- |
|
|
| **Usage** |
|
|
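| Install dependencies (the second command is only needed for the audio-to-text script further down): |
| ```bash |
| pip install -U transformers torch |
| pip install -U openai-whisper vaderSentiment pandas |
| ``` |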
|
|
| ```python |
| import torch |
| import torch.nn.functional as F |
| from transformers import DistilBertTokenizer, DistilBertForSequenceClassification |
| |
| repo_id = "InfoBayAI/Audio-to-Sentiment_Intelligence_Model" |
| |
| # Tokenizer and model weights are stored in the "sentiment-model" subfolder of the repo |
| tokenizer = DistilBertTokenizer.from_pretrained( |
|     repo_id, |
|     subfolder="sentiment-model", |
| ) |
| |
| model = DistilBertForSequenceClassification.from_pretrained( |
|     repo_id, |
|     subfolder="sentiment-model", |
| ) |
| |
| model.eval()  # inference mode: disables dropout |
| |
| text = "Write your text" |
| |
| inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True) |
| |
| # No gradients are needed for inference |
| with torch.no_grad(): |
|     outputs = model(**inputs) |
|     probs = F.softmax(outputs.logits, dim=1) |
| |
| predicted_class = torch.argmax(probs, dim=1).item() |
| |
| # Class ids follow the classification report: 0 = Negative, 1 = Neutral, 2 = Positive |
| labels = ["Negative", "Neutral", "Positive"] |
| |
| print("Text:", text) |
| print("Prediction:", labels[predicted_class]) |
| print("Confidence:", probs[0][predicted_class].item()) |
| ``` |
| **AUDIO-TO-TEXT** |
|
|
| ```python |
| import os |
| |
| import pandas as pd |
| import whisper |
| from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer |
| |
| model = whisper.load_model("base") |
| audio_folder = r"C:\Users\3\Documents\AUDIO 2\b6"  # replace with your own audio directory |
| print(os.path.exists(audio_folder)) |
| |
| analyzer = SentimentIntensityAnalyzer() |
| |
| data = [] |
| sr = 1  # running serial number across all files |
| |
| # Loop through all audio files |
| for file in os.listdir(audio_folder): |
|     if file.endswith((".wav", ".mp3")): |
|         audio_path = os.path.join(audio_folder, file) |
|         print("Processing:", file) |
| |
|         # task="translate" makes Whisper emit English text for non-English speech |
|         result = model.transcribe(audio_path, task="translate", fp16=False) |
| |
|         segment_id = 1  # restarts at 1 for every file |
|         for segment in result["segments"]: |
|             text = segment["text"] |
| |
|             # Sentiment score: VADER compound value in [-1, 1] |
|             sentiment_score = analyzer.polarity_scores(text)["compound"] |
| |
|             # Convert score to label |
|             if sentiment_score > 0.05: |
|                 sentiment = "positive" |
|             elif sentiment_score < -0.05: |
|                 sentiment = "negative" |
|             else: |
|                 sentiment = "neutral" |
| |
|             data.append({ |
|                 "sr_no": sr, |
|                 "call_id": file, |
|                 "segment_id": segment_id, |
|                 "start_time": segment["start"], |
|                 "end_time": segment["end"], |
|                 "text": text, |
|                 "sentiment": sentiment, |
|             }) |
| |
|             sr += 1 |
|             segment_id += 1 |
| |
| # One row per transcribed segment |
| df = pd.DataFrame(data) |
| |
| df.to_csv("AUDIO.csv", index=False) |
| print("dataset created") |
| |
| print(df.head()) |
| ``` |
| --- |
| **Considerations** |
|
|
| This model is trained on text derived from the InfoBay.AI audio dataset and is provided for research and evaluation purposes. The dataset contains a larger collection of high-quality conversational audio. For access to the full dataset or enterprise licensing inquiries, please visit our website [InfoBay.AI](https://infobay.ai/) or contact us directly. |
|
|
| |
| Ph: (91) 8303174762 |
| Email: vipul@infobay.ai |