Abstract
This paper focuses on acoustic features that effectively improve the recognition of emotion in human speech. The novel features proposed here are spectral-entropy parameters: fast Fourier transform (FFT) spectral entropy, delta FFT spectral entropy, Mel-frequency filter bank (MFB) spectral entropy, and delta MFB spectral entropy. Spectral-entropy features are simple to compute, and they reflect both the frequency characteristics of speech and how those characteristics change over time. We also implement an emotion rejection module based on the probability distributions of recognized-scores and rejected-scores, which reduces the false recognition rate and thereby improves overall performance. Recognized-scores and rejected-scores denote the probabilities of recognized and rejected emotion recognition results, respectively; both are obtained from the pattern recognition stage, which uses a Gaussian mixture model (GMM). We classify four emotional states: anger, sadness, happiness, and neutrality. The proposed method is evaluated on 45 sentences per emotion from 30 subjects (15 male, 15 female). Experimental results show that the proposed method outperforms existing GMM-based emotion recognition methods that use energy, zero-crossing rate (ZCR), linear prediction coefficient (LPC), and pitch parameters, demonstrating the effectiveness of the proposed approach. In particular, one of the proposed features, the combination of MFB and delta MFB spectral entropy, improves performance by approximately 10% over the existing feature parameters, and applying emotion rejection to low-confidence scores yields a further 4% improvement.
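The FFT spectral entropy and delta spectral entropy features described above can be sketched as follows. This is a minimal illustration using the standard definition of spectral entropy (the entropy of the power spectrum normalized into a probability distribution over frequency bins) and a simple frame-to-frame difference for the delta; the frame length, hop size, and FFT size below are assumed values, not the paper's reported settings.

```python
import numpy as np

def spectral_entropy(frame, n_fft=512):
    """FFT spectral entropy of one frame: normalize the power spectrum
    into a probability distribution over bins, then take its entropy."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    p = power / (power.sum() + 1e-12)           # probability mass per frequency bin
    return -np.sum(p * np.log2(p + 1e-12))      # entropy in bits

def entropy_contour(signal, frame_len=400, hop=160):
    """Per-frame spectral entropy and its delta (frame-to-frame change).
    frame_len/hop correspond to 25 ms / 10 ms at 16 kHz (assumed values)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    h = np.array([spectral_entropy(f) for f in frames])
    delta = np.diff(h, prepend=h[0])            # first delta defined as 0
    return h, delta

# Sanity check: white noise has a nearly flat spectrum, so its entropy
# approaches the maximum log2(n_bins); a pure tone concentrates energy
# in a few bins and has much lower entropy.
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
h_noise, _ = entropy_contour(noise)
h_tone, _ = entropy_contour(tone)
assert h_noise.mean() > h_tone.mean()
```

The MFB variants apply the same entropy computation to Mel filter-bank energies instead of raw FFT bins, which compresses the spectrum onto a perceptual frequency scale before measuring its flatness.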
Additional information
Supported by MIC, Korea under ITRC IITA-2009-(C1090-0902-0046) and the Korea Science and Engineering Foundation (KOSEF) funded by the Korea government (MEST) (Grant No. 20090058909)
Cite this article
Roh, YW., Kim, DJ., Lee, WS. et al. Novel acoustic features for speech emotion recognition. Sci. China Ser. E-Technol. Sci. 52, 1838–1848 (2009). https://doi.org/10.1007/s11431-009-0204-3