Emotion Perception and Recognition from Speech

Chapter in Affective Information Processing

Abstract

With the increasing role of speech interfaces in human-computer interaction applications, automatically recognizing emotions from human speech is becoming increasingly important. This chapter begins by introducing the correlations between emotions and basic speech features such as pitch, intensity, formants, and Mel-frequency cepstral coefficients (MFCCs). Several recognition methods, including support vector machines (SVMs), K-nearest neighbors (KNN), and neural networks, are then described to illustrate the performance of previously proposed models.
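To make this pipeline concrete, the following sketch computes utterance-level pitch, intensity, and MFCC statistics and feeds them to an SVM or a KNN classifier. It is an illustrative sketch only, not the chapter's implementation: it assumes the librosa and scikit-learn libraries, and wav_paths and labels are hypothetical placeholders for a labeled emotional-speech corpus.

    import numpy as np
    import librosa
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    def utterance_features(path, sr=16000):
        """One fixed-length vector per utterance: pitch and energy
        statistics plus averaged MFCCs (an illustrative feature set)."""
        y, _ = librosa.load(path, sr=sr)
        # Pitch (F0) contour via probabilistic YIN; unvoiced frames are NaN.
        f0, _, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
        f0 = f0[~np.isnan(f0)]
        # Frame-level RMS energy as a simple intensity measure.
        rms = librosa.feature.rms(y=y)[0]
        # Thirteen MFCCs, averaged over frames.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
        # Summarize each contour by mean, standard deviation, and range.
        stats = lambda v: [v.mean(), v.std(), v.max() - v.min()] if v.size else [0.0, 0.0, 0.0]
        return np.array(stats(f0) + stats(rms) + list(mfcc))

    # wav_paths and labels are hypothetical: utterance files and their
    # annotated emotional states.
    X = np.vstack([utterance_features(p) for p in wav_paths])
    svm = SVC(kernel="rbf").fit(X, labels)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X, labels)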

To give a more practical description of an emotion recognition procedure, a new approach is provided as a case study. In this case study, the Intonation Groups (IGs) of the input speech signals are first defined and extracted for feature extraction. Under the assumption of a linear mapping between the feature spaces of different emotional states, a feature compensation approach is proposed to characterize the feature space with better discriminability among emotional states. The compensation vector for each emotional state is estimated using the Minimum Classification Error (MCE) algorithm. The IG-based feature vectors, compensated by these vectors, are used to train a Gaussian Mixture Model (GMM) for each emotional state. The emotional state whose GMM yields the maximal likelihood ratio is selected as the recognized emotion.
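The decision stage of such a scheme can be sketched as follows, with scikit-learn's GaussianMixture standing in for the GMM training. This is a simplified sketch under stated assumptions, not the chapter's implementation: the MCE estimation of the compensation vectors is omitted, feats_by_emotion and comp are hypothetical names for the training features and the already-estimated compensation vectors, and the likelihood-ratio decision is simplified to a direct maximum-likelihood comparison.

    from sklearn.mixture import GaussianMixture

    emotions = ["neutral", "happy", "angry", "sad"]  # illustrative label set

    # Assumed inputs: feats_by_emotion[e] is an (n_vectors, dim) array of
    # IG-based feature vectors for state e, and comp[e] is its dim-sized
    # compensation vector (the assumed linear, additive mapping between
    # emotional feature spaces), taken here as given rather than MCE-trained.
    gmms = {}
    for e in emotions:
        gmms[e] = GaussianMixture(n_components=8, covariance_type="diag")
        gmms[e].fit(feats_by_emotion[e] + comp[e])

    def recognize(x):
        """Assign the IG-based feature vectors x (n_vectors, dim) of one
        utterance to the state whose GMM scores highest."""
        scores = {e: gmms[e].score(x + comp[e]) for e in emotions}  # mean log-likelihood
        return max(scores, key=scores.get)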




Copyright information

© 2009 Springer-Verlag London Limited

About this chapter

Cite this chapter

Wu, CH., Yeh, JF., Chuang, ZJ. (2009). Emotion Perception and Recognition from Speech. In: Tao, J., Tan, T. (eds) Affective Information Processing. Springer, London. https://doi.org/10.1007/978-1-84800-306-4_6

  • DOI: https://doi.org/10.1007/978-1-84800-306-4_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-305-7

  • Online ISBN: 978-1-84800-306-4

  • eBook Packages: Computer Science, Computer Science (R0)
