Abstract
This paper focuses on acoustic features that effectively improve the recognition of emotion in human speech. The novel features proposed here are spectral-entropy parameters: fast Fourier transform (FFT) spectral entropy, delta FFT spectral entropy, Mel-frequency filter bank (MFB) spectral entropy, and delta MFB spectral entropy. Spectral-entropy features are simple to compute, and they reflect both the frequency characteristics of speech and how those characteristics change over time. We also implement an emotion rejection module based on the probability distributions of recognized-scores and rejected-scores, which reduces the false recognition rate and thereby improves overall performance. Recognized-scores and rejected-scores denote the probabilities of recognized and rejected emotion recognition results, respectively; both are obtained from the pattern recognition stage, which uses a Gaussian mixture model (GMM). We classify four emotional states: anger, sadness, happiness, and neutrality. The proposed method is evaluated on 45 sentences per emotion from 30 subjects (15 male, 15 female). Experimental results show that the proposed method outperforms existing GMM-based emotion recognition methods that use energy, zero-crossing rate (ZCR), linear prediction coefficient (LPC), and pitch parameters, demonstrating the effectiveness of the proposed approach. In particular, one of the proposed features, the combination of MFB and delta MFB spectral entropy, improves performance by approximately 10% over the existing feature parameters, and applying emotion rejection to low-confidence scores yields a further 4% improvement.
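The FFT spectral entropy and delta spectral entropy features described above can be sketched as follows. This is a minimal illustration using the standard definition of spectral entropy (the entropy of the power spectrum normalized into a probability distribution over frequency bins) and a simple frame-to-frame difference for the delta; the frame length, hop size, and FFT size below are assumed values, not the paper's reported settings.

```python
import numpy as np

def spectral_entropy(frame, n_fft=512):
    """FFT spectral entropy of one frame: normalize the power spectrum
    into a probability distribution over bins, then take its entropy."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    p = power / (power.sum() + 1e-12)           # probability mass per frequency bin
    return -np.sum(p * np.log2(p + 1e-12))      # entropy in bits

def entropy_contour(signal, frame_len=400, hop=160):
    """Per-frame spectral entropy and its delta (frame-to-frame change).
    frame_len/hop correspond to 25 ms / 10 ms at 16 kHz (assumed values)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    h = np.array([spectral_entropy(f) for f in frames])
    delta = np.diff(h, prepend=h[0])            # first delta defined as 0
    return h, delta

# Sanity check: white noise has a nearly flat spectrum, so its entropy
# approaches the maximum log2(n_bins); a pure tone concentrates energy
# in a few bins and has much lower entropy.
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
h_noise, _ = entropy_contour(noise)
h_tone, _ = entropy_contour(tone)
assert h_noise.mean() > h_tone.mean()
```

The MFB variants apply the same entropy computation to Mel filter-bank energies instead of raw FFT bins, which compresses the spectrum onto a perceptual frequency scale before measuring its flatness.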
Additional information
Supported by MIC, Korea under ITRC IITA-2009-(C1090-0902-0046) and the Korea Science and Engineering Foundation (KOSEF) funded by the Korea government (MEST) (Grant No. 20090058909)
Cite this article
Roh, YW., Kim, DJ., Lee, WS. et al. Novel acoustic features for speech emotion recognition. Sci. China Ser. E-Technol. Sci. 52, 1838–1848 (2009). https://doi.org/10.1007/s11431-009-0204-3