KI - Künstliche Intelligenz

Volume 25, Issue 3, pp 213–223

An Evaluation of Emotion Units and Feature Types for Real-Time Speech Emotion Recognition

  • Thurid Vogt
  • Elisabeth André


Emotion recognition from speech in real time is an emerging research topic, and real-time constraints affect every component of the recognition system. We present a comparison of units and feature types for speech emotion recognition. To our knowledge, a comprehensive comparison of many different units across several databases is still missing from the literature. We discuss units with special emphasis on real-time processing, that is, we consider not only accuracy but also speed and ease of computation. For the feature types, we likewise restrict ourselves to features that can be extracted fully automatically in real time, and examine which types best characterise which emotion classes. The insights gained serve to validate the methodology of our online speech emotion recognition system EmoVoice.
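The abstract refers to acoustic features that can be computed fully automatically in real time, and to statistics derived over an emotion unit such as an utterance. The following Python sketch illustrates that general idea under simple assumptions: short-time energy and zero-crossing rate serve as frame-level contours, and basic statistics over one unit yield a fixed-length feature vector. This is an illustrative toy, not the EmoVoice implementation; all function names and parameter values are chosen here for the example.

```python
import math

def frame_features(samples, frame_len=400, hop=160):
    """Per-frame short-time energy and zero-crossing rate.

    frame_len/hop correspond to 25 ms / 10 ms frames at 16 kHz
    (an illustrative choice, not a value taken from the paper).
    """
    energies, zcrs = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # mean squared amplitude of the frame
        energies.append(sum(x * x for x in frame) / frame_len)
        # fraction of adjacent sample pairs with a sign change
        zc = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        zcrs.append(zc / (frame_len - 1))
    return energies, zcrs

def unit_features(samples):
    """Map one emotion unit (e.g. an utterance or voiced segment) to a
    fixed-length vector of simple statistics over its frame contours."""
    energies, zcrs = frame_features(samples)

    def stats(seq):
        mean = sum(seq) / len(seq)
        var = sum((x - mean) ** 2 for x in seq) / len(seq)
        return [mean, math.sqrt(var), min(seq), max(seq)]

    return stats(energies) + stats(zcrs)

# Example: one second of a 200 Hz tone at 16 kHz as a stand-in
# for the audio samples of a single speech unit.
tone = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
vec = unit_features(tone)  # 8-dimensional unit-level feature vector
```

Both descriptors are cheap single-pass computations over the frame, which is the property the paper's real-time constraint demands; richer contours such as pitch or spectral features would slot into the same contour-then-statistics pattern.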


Keywords: Speech emotion recognition · Emotion units · Acoustic features



This work was partially funded by the European Commission through the Integrated Projects CALLAS (IST-34800) and CEEDS (258749) and the IRIS Network of Excellence (231824).



Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Lab for Human Centered Multimedia, Augsburg University, Augsburg, Germany
