
Novel acoustic features for speech emotion recognition

Science in China Series E: Technological Sciences

Abstract

This paper focuses on acoustic features that effectively improve the recognition of emotion in human speech. The novel features proposed here are spectral-based entropy parameters: fast Fourier transform (FFT) spectral entropy, delta FFT spectral entropy, Mel-frequency filter bank (MFB) spectral entropy, and delta MFB spectral entropy. Spectral-based entropy features are simple to compute, and they capture both the frequency characteristics of speech and how those characteristics change over time. We also implement an emotion rejection module based on the probability distributions of recognized scores and rejected scores, which reduces the false recognition rate and thereby improves overall performance. Recognized scores and rejected scores refer to the probabilities of recognized and rejected emotion recognition results, respectively; they are obtained from the pattern recognition stage, which uses a Gaussian mixture model (GMM). We classify four emotional states: anger, sadness, happiness, and neutrality. The proposed method is evaluated on 45 sentences per emotion from 30 subjects (15 male and 15 female). Experimental results show that the proposed method is superior to existing GMM-based emotion recognition methods that use energy, zero crossing rate (ZCR), linear prediction coefficient (LPC), and pitch parameters. In particular, one of the proposed features, the combination of MFB and delta MFB spectral entropy, improves performance by approximately 10% over the existing feature parameters, and applying emotion rejection to low-confidence results yields a further 4% improvement.
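
To make the feature definitions concrete, the following is a minimal sketch in Python/NumPy of how the four spectral-entropy parameters could be computed. The frame length, hop size, FFT size, number of Mel filters, and the simple first-order delta are illustrative assumptions, not values or formulas taken from the paper.

```python
# Sketch of spectral-entropy features (assumed parameterization, not the
# authors' exact implementation).
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop  # assumes len(x) >= frame_len
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def spectral_entropy(power, eps=1e-12):
    """Shannon entropy of each frame's normalized power distribution."""
    p = power / (power.sum(axis=1, keepdims=True) + eps)
    return -(p * np.log(p + eps)).sum(axis=1)

def mel_filterbank(n_filters, n_fft, sr):
    """Standard triangular Mel filterbank over the rfft bins."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def entropy_features(x, sr=16000, n_fft=512, n_filters=24):
    frames = frame_signal(x)
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2           # power spectrum
    fft_ent = spectral_entropy(spec)                          # FFT spectral entropy
    mfb_ent = spectral_entropy(spec @ mel_filterbank(n_filters, n_fft, sr).T)
    # Delta features: a simple frame-to-frame difference (assumption).
    d_fft = np.diff(fft_ent, prepend=fft_ent[0])
    d_mfb = np.diff(mfb_ent, prepend=mfb_ent[0])
    return np.stack([fft_ent, d_fft, mfb_ent, d_mfb], axis=1)
```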
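Similarly, a hedged sketch of the GMM classification stage with a low-confidence rejection step. The margin-based confidence score and the fixed threshold below are assumptions for illustration; the paper instead derives its rejection decision from the probability distributions of recognized scores and rejected scores, so in practice the threshold would be estimated from held-out data.

```python
# Sketch of per-emotion GMMs plus low-confidence rejection (illustrative,
# not the paper's exact recipe).
import numpy as np
from sklearn.mixture import GaussianMixture

EMOTIONS = ["anger", "sadness", "happiness", "neutrality"]

def train_models(features_by_emotion, n_components=16):
    """Fit one GMM per emotional state on its training frames
    (e.g., the entropy features from the previous sketch)."""
    return {e: GaussianMixture(n_components, covariance_type="diag").fit(f)
            for e, f in features_by_emotion.items()}

def classify_with_rejection(models, frames, threshold=0.5):
    """Return (emotion, confidence), or (None, confidence) when rejected."""
    # Average per-frame log-likelihood of the utterance under each model.
    ll = np.array([models[e].score(frames) for e in EMOTIONS])
    order = np.argsort(ll)[::-1]
    margin = ll[order[0]] - ll[order[1]]  # top-1 vs. top-2 margin (assumed score)
    if margin < threshold:
        return None, margin               # reject: confidence too low
    return EMOTIONS[order[0]], margin
```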



Author information


Corresponding author

Correspondence to Yong-Wan Roh.

Additional information

Supported by the MIC, Korea, under the ITRC program (IITA-2009-(C1090-0902-0046)) and by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korean government (MEST) (Grant No. 20090058909)



Cite this article

Roh, YW., Kim, DJ., Lee, WS. et al. Novel acoustic features for speech emotion recognition. Sci. China Ser. E-Technol. Sci. 52, 1838–1848 (2009). https://doi.org/10.1007/s11431-009-0204-3
