Abstract
With the increasing role of speech interfaces in human-computer interaction applications, automatically recognizing emotions from human speech becomes more and more important. This chapter begins by introducing the correlations between basic speech features, such as pitch, intensity, formants, and MFCCs, and emotions. Several recognition methods are then described to illustrate the performance of previously proposed models, including support vector machines (SVM), K-nearest neighbors (KNN), neural networks, and the like.
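As a minimal illustration of the classifier side of this pipeline, the sketch below applies KNN to two prosodic features per utterance (mean pitch in Hz, mean intensity in dB). The feature values and labels are invented for illustration only; a real system would extract these features from audio signals.

```python
import math

# Toy training set: (mean pitch in Hz, mean intensity in dB) -> emotion label.
# Angry speech tends toward higher pitch and intensity than neutral speech.
train = [
    ((120.0, 55.0), "neutral"),
    ((125.0, 57.0), "neutral"),
    ((210.0, 72.0), "angry"),
    ((220.0, 70.0), "angry"),
]

def knn_predict(x, train, k=3):
    # Sort training utterances by Euclidean distance to x, then vote
    # among the k nearest neighbors.
    neighbors = sorted(train, key=lambda p: math.dist(x, p[0]))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

print(knn_predict((205.0, 68.0), train))  # -> angry
```

In practice the feature vector would be much higher-dimensional (pitch statistics, formants, MFCCs), and distances would be computed on normalized features so that no single feature dominates.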
To give a more practical description of an emotion recognition procedure, a new approach to emotion recognition is provided as a case study. In this case study, the Intonation Groups (IGs) of the input speech signals are first defined and extracted for feature extraction. Under the assumption of a linear mapping between the feature spaces of different emotional states, a feature compensation approach is proposed to characterize the feature space with better discriminability among emotional states. The compensation vector with respect to each emotional state is estimated using the Minimum Classification Error (MCE) algorithm. The IG-based feature vectors compensated by these vectors are used to train a Gaussian Mixture Model (GMM) for each emotional state. The emotional state whose GMM yields the maximal likelihood ratio is selected as the output.
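The decision rule described above can be sketched as follows. This is a hedged, simplified illustration, not the chapter's implementation: each emotional state is represented here by a single diagonal-covariance Gaussian standing in for a full GMM, and the compensation vectors are assumed to have been estimated already (via MCE in the chapter); all names and numeric values are invented.

```python
import math

def log_gaussian(x, mean, var):
    # Log-density of x under a diagonal-covariance Gaussian.
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def classify(feature, models, compensation):
    # Score the feature vector against each emotional state's model,
    # after shifting it by that state's compensation vector (the linear
    # mapping between feature spaces assumed above), and return the
    # highest-scoring state.
    best, best_score = None, float("-inf")
    for emotion, (mean, var) in models.items():
        x = [f + b for f, b in zip(feature, compensation[emotion])]
        score = log_gaussian(x, mean, var)
        if score > best_score:
            best, best_score = emotion, score
    return best

# Toy single-Gaussian "GMMs" for two emotional states, plus zero
# compensation vectors for simplicity.
models = {
    "neutral": ([0.0, 0.0], [1.0, 1.0]),
    "angry":   ([3.0, 3.0], [1.0, 1.0]),
}
compensation = {"neutral": [0.0, 0.0], "angry": [0.0, 0.0]}

print(classify([2.8, 3.1], models, compensation))  # -> angry
```

In the full approach, each `models` entry would be a mixture of several Gaussians trained on compensated IG-based feature vectors, and the MCE-trained compensation vectors would generally be nonzero.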
© 2009 Springer-Verlag London Limited
Cite this chapter
Wu, CH., Yeh, JF., Chuang, ZJ. (2009). Emotion Perception and Recognition from Speech. In: Tao, J., Tan, T. (eds) Affective Information Processing. Springer, London. https://doi.org/10.1007/978-1-84800-306-4_6
Publisher Name: Springer, London
Print ISBN: 978-1-84800-305-7
Online ISBN: 978-1-84800-306-4