Quantification of Segmentation and F0 Errors and Their Effect on Emotion Recognition

  • Stefan Steidl
  • Anton Batliner
  • Elmar Nöth
  • Joachim Hornegger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5246)

Abstract

Prosodic features modelling pitch, energy, and duration play a major role in speech emotion recognition. Our word-level features, especially the duration and pitch features, rely on correct word segmentation and F0 extraction. For the FAU Aibo Emotion Corpus, the automatic segmentation obtained by forced alignment of the spoken word sequence and the automatically extracted F0 values have been corrected manually. We report the frequencies of the different types of segmentation and F0 errors and evaluate their influence on emotion recognition using different groups of prosodic features. The classification results show that the impact of these errors on emotion recognition is small.
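
To illustrate how such errors propagate, consider how word-level duration and pitch features are typically derived from a pair of word boundaries and a frame-wise F0 contour. The following is a minimal sketch in Python, not the authors' actual feature extraction: the function name, the frame shift, and the small feature set are illustrative assumptions. A boundary shifted by a faulty segmentation, or an octave error in the F0 track, changes the resulting feature values directly.

```python
import numpy as np

def word_level_features(f0_track, word_start, word_end, frame_shift=0.010):
    """Illustrative word-level duration and pitch features.

    f0_track holds one F0 value per frame (0 for unvoiced frames);
    word_start/word_end are word boundaries in seconds, e.g. taken
    from a forced alignment; frame_shift is the frame shift in seconds.
    """
    lo = int(round(word_start / frame_shift))
    hi = int(round(word_end / frame_shift))
    voiced = f0_track[lo:hi]
    voiced = voiced[voiced > 0]  # keep voiced frames only

    return {
        "duration": word_end - word_start,  # depends on the segmentation
        "f0_mean": float(voiced.mean()) if voiced.size else 0.0,
        "f0_min": float(voiced.min()) if voiced.size else 0.0,
        "f0_max": float(voiced.max()) if voiced.size else 0.0,
    }

if __name__ == "__main__":
    f0 = np.zeros(200)
    f0[50:120] = 220.0   # a voiced stretch at 220 Hz
    f0[80:90] = 110.0    # simulated octave (halving) error
    # A boundary shifted by 100 ms changes duration and the F0 statistics:
    print(word_level_features(f0, 0.50, 1.20))
    print(word_level_features(f0, 0.60, 1.20))
```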

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Stefan Steidl¹
  • Anton Batliner¹
  • Elmar Nöth¹
  • Joachim Hornegger¹

  1. Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
