Abstract
With the increasing role of speech interfaces in human-computer interaction applications, automatically recognizing emotions from human speech becomes more and more important. This chapter begins by introducing the correlations between basic speech features, such as pitch, intensity, formants, and MFCCs, and emotions. Several recognition methods are then described to illustrate the performance of previously proposed models, including support vector machines (SVM), K-nearest neighbors (KNN), neural networks, and the like.
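As a minimal illustration of the classifier side of this pipeline, the sketch below applies KNN to two prosodic features per utterance (mean pitch in Hz, mean intensity in dB). The feature values and labels are invented for illustration only; a real system would extract these features from audio signals.

```python
import math

# Toy training set: (mean pitch in Hz, mean intensity in dB) -> emotion label.
# Angry speech tends toward higher pitch and intensity than neutral speech.
train = [
    ((120.0, 55.0), "neutral"),
    ((125.0, 57.0), "neutral"),
    ((210.0, 72.0), "angry"),
    ((220.0, 70.0), "angry"),
]

def knn_predict(x, train, k=3):
    # Sort training utterances by Euclidean distance to x, then vote
    # among the k nearest neighbors.
    neighbors = sorted(train, key=lambda p: math.dist(x, p[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

print(knn_predict((205.0, 68.0), train))  # -> angry
```

In practice the feature vector would be much higher-dimensional (pitch statistics, formants, MFCCs), and distances would be computed on normalized features so that no single feature dominates.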
To give a more practical description of an emotion recognition procedure, a new approach to emotion recognition is provided as a case study. In this case study, the Intonation Groups (IGs) of the input speech signals are first defined and extracted for feature extraction. Under the assumption of a linear mapping between the feature spaces of different emotional states, a feature compensation approach is proposed to characterize the feature space with better discriminability among emotional states. The compensation vector with respect to each emotional state is estimated using the Minimum Classification Error (MCE) algorithm. The IG-based feature vectors compensated by these vectors are used to train a Gaussian Mixture Model (GMM) for each emotional state. The emotional state whose GMM yields the maximal likelihood ratio is selected as the output.
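The decision rule described above can be sketched as follows. This is a hedged, simplified illustration, not the chapter's implementation: each emotional state is represented here by a single diagonal-covariance Gaussian standing in for a full GMM, and the compensation vectors are assumed to have been estimated already (via MCE in the chapter); all names and numeric values are invented.

```python
import math

def log_gaussian(x, mean, var):
    # Log-density of x under a diagonal-covariance Gaussian.
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def classify(feature, models, compensation):
    # Score the feature vector against each emotional state's model,
    # after shifting it by that state's compensation vector (the linear
    # mapping between feature spaces assumed above), and return the
    # highest-scoring state.
    best, best_score = None, float("-inf")
    for emotion, (mean, var) in models.items():
        x = [f + b for f, b in zip(feature, compensation[emotion])]
        score = log_gaussian(x, mean, var)
        if score > best_score:
            best, best_score = emotion, score
    return best

# Toy single-Gaussian "GMMs" for two emotional states, plus zero
# compensation vectors for simplicity.
models = {
    "neutral": ([0.0, 0.0], [1.0, 1.0]),
    "angry":   ([3.0, 3.0], [1.0, 1.0]),
}
compensation = {"neutral": [0.0, 0.0], "angry": [0.0, 0.0]}

print(classify([2.8, 3.1], models, compensation))  # -> angry
```

In the full approach, each `models` entry would be a mixture of several Gaussians trained on compensated IG-based feature vectors, and the MCE-trained compensation vectors would generally be nonzero.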
© 2009 Springer-Verlag London Limited
Cite this chapter
Wu, CH., Yeh, JF., Chuang, ZJ. (2009). Emotion Perception and Recognition from Speech. In: Tao, J., Tan, T. (eds) Affective Information Processing. Springer, London. https://doi.org/10.1007/978-1-84800-306-4_6
Publisher Name: Springer, London
Print ISBN: 978-1-84800-305-7
Online ISBN: 978-1-84800-306-4