Automatic Recognition of Emotions from Speech: A Review of the Literature and Recommendations for Practical Realisation

  • Thurid Vogt
  • Elisabeth André
  • Johannes Wagner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4868)


In this article we give guidelines on how to address the major technical challenges of automatic emotion recognition from speech in human-computer interfaces: audio segmentation to find appropriate units for emotions, extraction of emotion-relevant features, classification of emotions, and training databases with emotional speech. Research so far has mostly dealt with offline evaluation of vocal emotions, and online processing has hardly been addressed. Online processing is, however, a necessary prerequisite for human-computer interfaces that analyze and respond to the user’s emotions while he or she is interacting with an application. By means of a sample application, we demonstrate how the challenges arising from online processing may be solved. The overall objective of the paper is to help readers assess the feasibility of human-computer interfaces that are sensitive to the user’s emotional voice and to provide them with guidelines on how to realize such interfaces technically.


Keywords: Emotion Recognition; Automatic Recognition; Voice Activity Detection; Speech Emotion Recognition; Natural Emotion
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Thurid Vogt (1)
  • Elisabeth André (1)
  • Johannes Wagner (1)

  1. Multimedia Concepts and Applications, University of Augsburg, Augsburg, Germany
