This is the model used in the following papers:

N. Mousavi and F. Burkhardt: The Emotional Portrayal of an Ordinary Talk, Proc. ESSV 2026
Mousavi, Burkhardt and Schuller: Modeling Emotion in German Ordinary Speech, to be published

We used the embeddings of a transformer model that predicts emotional dimension values (audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim, trained on MSP-Podcast) to train a multilayer perceptron (MLP) with layers = [1024, 64], the default learning rate (0.0001), the Adam optimizer, no dropout, and patience set to 10.
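As a rough illustration of the classifier described above, the following PyTorch sketch builds an MLP with hidden layers of 1024 and 64 units and an Adam optimizer with learning rate 0.0001. It is not the nkululeko implementation: the input size of 1024, the ReLU activations, and the helper name build_mlp are assumptions made for this example, and the input batch is random data standing in for real embeddings.

```python
import torch
from torch import nn

EMBEDDING_DIM = 1024  # assumption: size of one wav2vec2 embedding vector
LABELS = ["happy", "angry", "sad", "scared", "neutral"]


def build_mlp(input_dim=EMBEDDING_DIM, hidden=(1024, 64), n_classes=len(LABELS)):
    """Stack a Linear+ReLU pair per hidden size, then a linear output layer.

    Hypothetical helper; the ReLU activation is an assumption, and no
    dropout layers are added.
    """
    layers = []
    prev = input_dim
    for width in hidden:
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, n_classes))
    return nn.Sequential(*layers)


model = build_mlp()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Random stand-in for a batch of 8 embeddings (not real audio features).
batch = torch.randn(8, EMBEDDING_DIM)
logits = model(batch)

# Map the highest-scoring output unit of each row to its label.
predicted = [LABELS[i] for i in logits.argmax(dim=1).tolist()]
```

With untrained weights the predicted labels are arbitrary; the sketch only shows the shape of the network and how its outputs map to the five categories.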

The model was trained with the nkululeko framework. Training data was the test set of Berlin EmoDB and the whole of the Italian EMOVO database, for classification from audio into ["happy", "angry", "sad", "scared", "neutral"]. Cross-domain evaluation on the RAVDESS database, without the songs, resulted in an unweighted average recall (UAR) of .561.
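For readers unfamiliar with the metric: UAR is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The sketch below computes it in plain Python. The function name and the toy labels are made up for illustration and are not taken from the evaluation.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Toy data, invented for this example.
y_true = ["happy", "happy", "angry", "sad", "sad", "sad", "neutral", "scared"]
y_pred = ["happy", "sad", "angry", "sad", "sad", "angry", "neutral", "neutral"]

uar = unweighted_average_recall(y_true, y_pred)
print(round(uar, 3))  # 0.633
```

In this toy case plain accuracy would be 5/8 = 0.625, while UAR is 0.633 because the missed "scared" sample and the small classes weigh as much as the larger "sad" class.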

(Screenshot of the RAVDESS evaluation result.)

We attach a test_model.py script to this model, so you should be able to try it yourself:

Usage: test_model.py [OPTIONS] MODEL AUDIO

  Predict emotion from an audio file using a nkululeko MLP + audwav2vec2
  model.

  MODEL  Path to the .model file (torch state dict saved by nkululeko).
  AUDIO  Path to the audio file (must be 16 kHz mono WAV).

  Example:
    uv run test_model.py my_experiment_0_011.model sample.wav
    uv run test_model.py my_experiment_0_011.model sample.wav --w2v2-root /data/audmodel/

Options:
  --w2v2-root DIR  Directory where the w2v2 onnx model is cached or will be
                   downloaded to.  [default: ./audmodel/]
  -h, --help       Show this message and exit.
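Since the script expects 16 kHz mono WAV input, it can help to check a file before passing it in. The following is a minimal sketch using only the Python standard library; check_wav and write_silence are hypothetical helpers for this example and are not part of test_model.py.

```python
import os
import tempfile
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems; an empty list means the format matches."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems


def write_silence(path, rate, channels, seconds=0.1):
    """Write a short silent 16-bit PCM WAV file, used here only as test input."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds) * channels)


tmp_dir = tempfile.mkdtemp()
good = os.path.join(tmp_dir, "good.wav")
bad = os.path.join(tmp_dir, "bad.wav")
write_silence(good, 16000, 1)
write_silence(bad, 44100, 2)

good_problems = check_wav(good)  # matches the expected format
bad_problems = check_wav(bad)    # wrong sample rate and wrong channel count
```

A file that fails the check would need to be resampled and downmixed with an audio tool of your choice before running the script.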

