
Speech emotion recognition approaches in human computer interaction


Abstract

Speech Emotion Recognition (SER) is one of the emerging fields in human-computer interaction. The quality of a human-computer interface that responds to emotions in human speech depends heavily on the types of features used and on the classifier employed for recognition. The main purpose of this paper is to present the wide range of features employed for speech emotion recognition, along with the acoustic characteristics of those features. We also analyze the performance of these features in terms of several important parameters, namely precision, recall, F-measure, and recognition rate, using two commonly used emotional speech databases: the Berlin emotional database and the Danish emotional database. Emotional speech recognition is being applied in modern human-computer interfaces, and an overview of 10 applications is also presented in this paper to illustrate the importance of this technique.
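The evaluation described above rests on four measures computed from a classifier's confusion matrix: precision, recall, F-measure, and recognition rate. As a quick reference, the Python sketch below shows how these measures are conventionally derived; the four emotion labels and the matrix values are illustrative placeholders, not results reported in the paper.

    import numpy as np

    # Illustrative confusion matrix for a four-emotion classifier:
    # rows = true class, columns = predicted class (values are made up).
    labels = ["anger", "happiness", "sadness", "neutral"]
    cm = np.array([
        [42,  3,  1,  4],
        [ 5, 38,  2,  5],
        [ 1,  2, 45,  2],
        [ 3,  4,  3, 40],
    ])

    tp = np.diag(cm).astype(float)          # true positives per class
    precision = tp / cm.sum(axis=0)         # TP / (TP + FP), per class
    recall = tp / cm.sum(axis=1)            # TP / (TP + FN), per class
    f_measure = 2 * precision * recall / (precision + recall)
    recognition_rate = tp.sum() / cm.sum()  # overall accuracy

    for name, p, r, f in zip(labels, precision, recall, f_measure):
        print(f"{name:>10}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
    print(f"recognition rate: {recognition_rate:.3f}")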



Author information

Corresponding author

Correspondence to S. Ramakrishnan.


About this article

Cite this article

Ramakrishnan, S., El Emary, I.M.M. Speech emotion recognition approaches in human computer interaction. Telecommun Syst 52, 1467–1478 (2013). https://doi.org/10.1007/s11235-011-9624-z

