---
title: MoodSyncAI
emoji: 🎭
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Multi-modal emotion analyser (face, text, audio)
---

# 🎭 MoodSyncAI
Multi-Modal Sentiment & Emotion Analyser that combines facial emotion (Vision Transformer), text sentiment (Transformer), a fusion layer (with mismatch detection), and a generative model that summarises the emotional state in plain language. Includes a webcam / short-video timeline view.

All models are 100% free & open-source (Hugging Face Hub).
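All of them load through the `transformers` pipeline API; here is a minimal sketch (the actual loading code in `app.py` may differ):

```python
from transformers import pipeline

# Facial emotion: ViT fine-tuned on facial expressions
face_clf = pipeline("image-classification", model="trpakov/vit-face-expression")

# Text emotion: top_k=None returns the full label distribution, not just the argmax
text_clf = pipeline("text-classification",
                    model="j-hartmann/emotion-english-distilroberta-base",
                    top_k=None)

# Speech-to-text: the smallest Whisper checkpoint, fine on CPU
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

# Generative summary: instruction-tuned seq2seq model
summariser = pipeline("text2text-generation", model="google/flan-t5-base")
```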
## Components
| Stage | Model | Type | Requirement satisfied |
|---|---|---|---|
| Visual emotion | `trpakov/vit-face-expression` | ViT | CNN/ViT for facial emotion ✅ |
| Text sentiment | `j-hartmann/emotion-english-distilroberta-base` | Transformer | RNN/LSTM/Transformer ✅ |
| Speech-to-text | `openai/whisper-tiny` | Whisper encoder-decoder | Audio → text channel ✅ |
| Fusion | Valence-aligned multimodal fusion | rule-based + weighted | Fusion + mismatch ✅ |
| Generative | `google/flan-t5-base` | seq2seq Transformer | Generative summary ✅ |
| Webcam / video | OpenCV frame sampling + Plotly timeline | – | Real-time / video input ✅ |
| Attention viz | ViT attention rollout + last-layer text attention | interpretability | Attention visualisation ✅ |
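The attention-visualisation row uses attention rollout: head-averaged attention matrices (with residual connections added back) are multiplied across layers to estimate how strongly the CLS token ultimately attends to each image patch. A minimal sketch, assuming the attentions come from a ViT forward pass with `output_attentions=True` (illustrative, not the exact code in `app.py`):

```python
import torch

def attention_rollout(attentions):
    """attentions: tuple of [batch, heads, tokens, tokens] tensors, one per layer."""
    rollout = None
    for attn in attentions:
        a = attn.mean(dim=1)                 # average over heads
        a = a + torch.eye(a.size(-1))        # account for residual connections
        a = a / a.sum(dim=-1, keepdim=True)  # renormalise rows
        rollout = a if rollout is None else a @ rollout
    return rollout[0, 0, 1:]                 # CLS attention over patch tokens
```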
## Run

Prerequisite: Python 3.10–3.13 (CPU is enough; no GPU and no system ffmpeg required).
```
# 1. Clone / copy this folder onto the new machine, then:
cd "<path-to-folder>"

# 2. Create a virtual env
python -m venv .venv
.\.venv\Scripts\Activate.ps1   # Windows
# source .venv/bin/activate    # macOS / Linux

# 3. Install (use --only-binary to skip Rust/MSVC compilation on Py3.13)
python -m pip install --upgrade pip
pip install -r requirements.txt --only-binary=:all:

# 4. Launch
python app.py
```
Your browser opens at http://127.0.0.1:7860. To stop the app, press Ctrl+C in the terminal running `python app.py`.
First launch only: roughly 1.2 GB of models is downloaded from Hugging Face into `~/.cache/huggingface/` (cached for all future runs, fully offline afterwards).
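If you want to warm that cache ahead of the first launch, something like this works (optional and purely illustrative; `app.py` downloads on demand anyway):

```python
from huggingface_hub import snapshot_download

# Pre-fetch all four model repos into ~/.cache/huggingface/
for repo in ("trpakov/vit-face-expression",
             "j-hartmann/emotion-english-distilroberta-base",
             "openai/whisper-tiny",
             "google/flan-t5-base"):
    snapshot_download(repo)
```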
That's it: no system packages, no ffmpeg, no GPU, no model files to download manually.
## Tabs
- 🖼️ Image + Text – upload a face photo + type the spoken sentence → visual emotion bars, text emotion bars, fusion badge, generative summary. Optional attention-rollout heatmap on the face + per-token attention HTML when the toggle is on.
- 📹 Webcam / Video + Text – record a 3–10 s clip in the browser → per-frame emotion timeline chart, aggregated bars, fusion, summary.
- 🎙️ Audio + Image – record/upload audio + a face photo. Whisper transcribes the audio; the transcript drives the text channel; full fusion + summary.
- 🎬 Video with Audio – record/upload a video with sound. Audio is extracted (imageio-ffmpeg; see the sketch after this list), transcribed by Whisper, and fed to the text classifier; frames produce the visual timeline; fused result + summary, no typing needed.
- ℹ️ About – architecture & fusion logic.
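The audio extraction in the last tab needs no system ffmpeg because `imageio-ffmpeg` ships its own binary. A rough sketch of that flow (file names are placeholders; the exact code in `app.py` may differ):

```python
import subprocess
import imageio_ffmpeg
import soundfile as sf
from transformers import pipeline

# imageio-ffmpeg bundles its own ffmpeg binary, so no system install is needed
ffmpeg = imageio_ffmpeg.get_ffmpeg_exe()

# Extract a 16 kHz mono WAV track from the uploaded clip (placeholder file names)
subprocess.run([ffmpeg, "-y", "-i", "clip.mp4", "-vn", "-ac", "1",
                "-ar", "16000", "speech.wav"], check=True)

# Decode the WAV ourselves and hand raw samples to Whisper, again avoiding
# any dependency on a system ffmpeg inside the ASR pipeline
audio, sr = sf.read("speech.wav", dtype="float32")
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
transcript = asr({"raw": audio, "sampling_rate": sr})["text"]  # feeds the text channel
```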
## Fusion / mismatch rule
Each modality's emotion distribution is mapped to a valence in [-1, +1].
- Opposite-sign valences → MISMATCH DETECTED (amber 🟠)
- Small delta → ALIGNED (green 🟢)
- Otherwise → PARTIALLY ALIGNED (yellow 🟡)

The generative model is prompted with the structured signals and writes a 2–3 sentence empathetic summary.
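A minimal sketch of the rule, with an illustrative valence map and threshold rather than the exact values used in `app.py`:

```python
# Illustrative valence weights per emotion label (assumed, not app.py's exact map)
VALENCE = {"joy": 0.9, "surprise": 0.3, "neutral": 0.0,
           "sadness": -0.7, "fear": -0.8, "anger": -0.8, "disgust": -0.6}

def valence(dist):
    """Collapse an emotion distribution {label: prob} to a scalar in [-1, +1]."""
    return sum(VALENCE.get(label, 0.0) * p for label, p in dist.items())

def fusion_status(visual_dist, text_dist, aligned_delta=0.3):
    v_vis, v_txt = valence(visual_dist), valence(text_dist)
    if v_vis * v_txt < 0:                    # opposite signs -> conflict
        return "MISMATCH DETECTED"
    if abs(v_vis - v_txt) <= aligned_delta:  # close valences -> agreement
        return "ALIGNED"
    return "PARTIALLY ALIGNED"

# A sad face paired with happy words trips the mismatch branch:
print(fusion_status({"sadness": 0.8, "neutral": 0.2}, {"joy": 0.9, "neutral": 0.1}))
```

Both valences and the resulting status label are then folded into the prompt from which flan-t5 writes the summary.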