Chapter

Text, Speech and Dialogue

Volume 5246 of the series Lecture Notes in Computer Science pp 525-534

Quantification of Segmentation and F0 Errors and Their Effect on Emotion Recognition

  • Stefan Steidl, Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg
  • Anton Batliner, Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg
  • Elmar Nöth, Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg
  • Joachim Hornegger, Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg

Abstract

Prosodic features modelling pitch, energy, and duration play a major role in speech emotion recognition. Our word level features, especially duration and pitch features, rely on correct word segmentation and F0 extraction. For the FAU Aibo Emotion Corpus, the automatic segmentation of a forced alignment of the spoken word sequence and the automatically extracted F0 values have been manually corrected. Frequencies of different types of segmentation and F0 errors are given, and their influence on emotion recognition using different groups of prosodic features is evaluated. The classification results show that the impact of these errors on emotion recognition is small.