International Journal of Speech Technology, Volume 21, Issue 1, pp. 93–120

Databases, features and classifiers for speech emotion recognition: a review

  • Monorama Swain
  • Aurobinda Routray
  • P. Kabisatpathy


Speech is an effective medium for expressing emotions and attitudes through language. Finding the emotional content in a speech signal and identifying the emotions in speech utterances is an important task for researchers. Speech emotion recognition has been considered an important research area over the last decade, and the prospect of automated analysis of human affective behaviour has attracted many researchers to it. Consequently, a number of systems, algorithms, and classifiers have been developed for identifying the emotional content of a person's speech. This study reviews the available literature on the databases, features, and classifiers used for speech emotion recognition across assorted languages.


Keywords: Speech corpus, Excitation features, Spectral features, Prosodic features, Classifiers, Emotion recognition



The authors are grateful for the valuable input given by Prof. J. Talukdar, Silicon Institute of Technology, Bhubaneswar, Odisha.


  1. Abrilian, S., Devillers, L., & Martin, J. C. (2006). Annotation of emotions in real-life video interviews: Variability between coders. In 5th international conference on language resources and evaluation (LREC 06), Genoa, pp. 2004–2009.Google Scholar
  2. Agrawal, S. S. (2011). Emotions in Hindi speech-analysis, perception and recognition. In International conference on speech database and assessments (Oriental COCOSDA).Google Scholar
  3. Agrawal, S. S., Jain, A., & Arora, S. (2009). Acoustic and perceptual features of intonation patterns in Hindi speech. In International workshop on spoken language prosody (IWSLPR-09), Kolkata, pp. 25–27.Google Scholar
  4. Alonso, J. B., Cabrera, J., Medina, M., & Travieso, C. M. (2015). New approach in quantification of emotional intensity from the speech signal: Emotional temperature. Experts Systems with Applications, 42, 9554–9564.CrossRefGoogle Scholar
  5. Amer, M. R., Siddiquie, B., Richey, C., & Divakaran, A. (2014). Emotion detection in speech using deep networks. In IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 3724–3728.Google Scholar
  6. Amir, N., Ron, S., & Laor, N. (2000). Analysis of an emotional speech corpus in Hebrew based on objective criteria. In Proceedings of ISCA workshop speech and emotion, Belfast, Vol. 1, pp. 29–33.Google Scholar
  7. Atal, B. S. (1972). Automatic speaker recognition based on pitch contours. The Journal of the Acoustical Society of America, 52(6), 1687–1697.CrossRefGoogle Scholar
  8. Atassi, H., & Esposito, A. (2008). A speaker independent approach to the classification of emotional vocal expressions. In IEEE international conference on tools with artificial intelligence (ICTAI’08), Dayton, Ohio, USA, Vol 2, pp 147–152.Google Scholar
  9. Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614–636.CrossRefGoogle Scholar
  10. Bapineedu, G., Avinash, B., Gangashetty, S. V., & Yegnanarayana, B. (2009). Analysis of Lombard speech using excitation source information. In INTERSPEECH-09, Brighton, UK, pp. 1091–1094.Google Scholar
  11. Batliner, A., Biersack, S., & Steidl, S. (2006). The prosody of pet robot directed speech: Evidence from children. In Speech prosody, Dresden, pp. 1–4.Google Scholar
  12. Batliner, A., Hacker, C., Steidl, S., Noth, E., D’Arcy, S., Russell, M., & Wong, M. (2004). You stupid tin box—children interacting with the AIBO robot: A cross-linguistic emotional speech corpus. In Proceedings of language resources and evaluation (LREC 04), Lisbon.Google Scholar
  13. Batliner, A., Huber, R., Niemann, H., Nöth, E., Spilker, J., & Fischer, K. (2000). The recognition of emotion. In Verbmobil: Foundations of speech-to-speech translation, pp. 122–130.Google Scholar
  14. Bitouk, D., Verma, R., & Nenkova, A. (2010). Class-level spectral features for emotion recognition. Speech Communication, 52(7–8), 613–625.Google Scholar
  15. Borden, G., Harris, K., & Raphael, L. (1994). Speech science primer: Physiology, acoustics, and perception of speech (3rd ed.). Baltimore: Williams and Wilkins.Google Scholar
  16. Bozkurt, E., Erzin, E., & Erdem, A. T. (2009). Improving automatic emotion recognition from speech signals. In 10th annual conference of the international speech communication association (INTERSPEECH), Brighton, UK, pp. 324–327.Google Scholar
  17. Brester, C., Semenkin, E., & Sidorov, M. (2016). Multi-objective heuristic feature selection for speech-based multilingual emotion recognition. JAISCR, 6(4), 243–253.Google Scholar
  18. Buck, R. (1999). The biological affects, a typology. Psychological Review, 106(2), 301–336.CrossRefGoogle Scholar
  19. Bulut, M., Narayanan, S. S., & Syrdal, A. K. (2002). Expressive speech synthesis using a concatenative synthesizer. In Proceedings of international conference on spoken language processing (ICSLP’02), Vol. 2, pp. 1265–1268.Google Scholar
  20. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., & Weiss, B. (2005). A database of German emotional speech. In Proceedings of the INTERSPEECH 2005, Lissabon, Portugal, pp. 1517–1520.Google Scholar
  21. Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower, E., Kim, S. et al. (2008) IEMOCAP: Interactive emotional dyadic motion capture database. In: Language resources and evaluation.Google Scholar
  22. Caballero-Morales, S. O. (2013) Recognition of emotions in Mexican Spanish speech: An approach based on acoustic modelling of emotion-specific vowels. The Scientific World Journal, 2013, 1–13. Google Scholar
  23. Caldognetto, E. M., Cosi, P., Drioli, C., Tisato, G., & Cavicchio, F. (2004). Modifications of phonetic labial targets in emotive speech: Effects of the co-production of speech and emotions. Speech Communication, 44, 173–185.CrossRefGoogle Scholar
  24. Chauhan, A., Koolagudi, S. G., Kafley, S. & Rao, K. S. (2010). Emotion recognition using LP residual. In Proceedings of the 2010 IEEE students’ technology symposium, IIT Kharagpu.Google Scholar
  25. Chen, L., Mao, X., Xue, Y., & Lung, L. (2012). Speech emotion recognition: Features and classification models. Digital Signal Processing, 22(6), 1154–1160.MathSciNetCrossRefGoogle Scholar
  26. Chuang, Z.-J., & Wu, C.-H. (2002). Emotion recognition from textual input using an emotional semantic network. In Proceedings of international conference on spoken language processing (ICSLP’02), Vol. 3, pp. 2033–2036.Google Scholar
  27. Cichosz, J., & Slot, K. (2005). Low-dimensional feature space derivation for emotion recognition. In INTERSPEECH’05, Lisbon, Portugal, pp. 477–480.Google Scholar
  28. Costantini, G., Iaderola, I., Paoloni, A., & Todisco, M. (2014). EMOVO Corpus: An Italian emotional speech database. In Proceedings of the 9th international conference on language resources and evaluation—LREC 14, pp. 3501–3504.Google Scholar
  29. Cummings, K. E., & Clements, M. A. (1998). Analysis of the glottal excitation of emotionally styled and stressed speech. The Journal of the Acoustical Society of America, 98,  88–98.Google Scholar
  30. Darwin, C. (1872/1965). The expression of the emotions in man and animals. Chicago University Press, Chicago.Google Scholar
  31. Dellaert, F., Polzin, T., & Waibel, A. (1996a). Recognising emotions in speech. In ICSLP 96.Google Scholar
  32. Dellert, F., Polzin, T., & Waibel, A. (1996b). Recognizing emotion in speech. In 4th international conference on spoken language processing, Philadelphia, PA, USA, pp. 1970–1973.Google Scholar
  33. Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: Towards a new generation of databases. Speech Communication, 40, 33–60.CrossRefzbMATHGoogle Scholar
  34. Eckman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6, 169–200.CrossRefGoogle Scholar
  35. Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion. Sussex: Wiley.Google Scholar
  36. EI Ayadi M, Kamel MS, Karray F (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 1(44), 572–587.CrossRefzbMATHGoogle Scholar
  37. Engberg, I., & Hansen, A. (1996). “Documentation of the Danish emotional speech database” des. Retrieved from
  38. Esmaileyan, Z., & Marvi, H. (2014). A database for automatic Persian speech emotion recognition: Collection, processing and evaluation. IJE Transactions A: Bascis, 27(1), 79–90.Google Scholar
  39. Espinosa, H. P., Garcia, J. O., & Pineda, L. V. (2010). Features selection for primitives estimation on emotional speech. In ICASSP, Florence, Italy, pp. 5138–5141Google Scholar
  40. Fernandez, R., & Picard, R. W. (2003). Modeling driver’s speech under stress. Speech Communication, 40, 145–159.CrossRefzbMATHGoogle Scholar
  41. Shah, A. F., Vimal Krishnan, V. R., Sukumar, A. R., Jayakumar, A., & Anto, P. B. (2009). Speaker independent automatic emotion recognition in speech: A comparison of MFCCs and discrete wavelet transforms. In International conference on advances in recent technologies in communication and computing, ARTCom ‘09.Google Scholar
  42. Fontaine, J. R., Scherer, K. R., Roesch, E. B., & Ellsworth, P. C. (2007). The world of emotion is not two dimensional. Psychological Science, 13, 1050–1057.CrossRefGoogle Scholar
  43. France, D. J., Shiavi, R. G., Silverman, S., Silverman, M., & Wilkes, M. (2000). Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Transactions on Biomedical Engineering, 7, 829–837.CrossRefGoogle Scholar
  44. Gangamohan, P., Kadiri, S. R., Gangashetty, S. V., & Yegnanarayana, B. (2014). Excitation source features for discrimination of anger and happy emotions. In: INTERSPEECH, Singapore, pp. 1253–1257.Google Scholar
  45. Gangamohan, P., Kadiri, S. R., & Yegnanarayana, B. (2013). Analysis of emotional speech at sub segmental level. In Interspeech, Lyon, France, pp. 1916–1920.Google Scholar
  46. Gomez, P., & Danuser, B. (2004). Relationships between musical structure and physiological measures of emotion. Emotion, 7(2), 377–387.CrossRefGoogle Scholar
  47. Grimm, M., Kroschel, K., & Narayanan, S. (2008). The Vera Ammittag German audio-visual emotional speech database. In International conference on multimedia and expo, pp. 865–868.Google Scholar
  48. Grimm, M., Mower, E., Kroschel, K., & Narayanan, S. (2006). Combining categorical and primitives-based emotion recognition. In 14th European signal processing conference (EUSIPCO 2006), Florence, Italy.Google Scholar
  49. Haq, S., & Jackson, P. J. B. (2009). Speaker-dependent audio-visual emotion recognition. In Proceedings of international conference on auditory-visual speech processing, pp. 53–58.Google Scholar
  50. He, L., Lech, M., & Allen, N. (2010). On the importance of glottal flow spectral energy for the recognition of emotions in speech. In INTERSPEECH 2010, Makuhari, Chiba, Japan, pp. 26–30.Google Scholar
  51. Hozjan, V., & Kacic, Z. (2003). Improved emotion recognition with large set of stastical features. Geneva: Eurospecch.Google Scholar
  52. Hozjan, V., Kacic, Z., Moreno, A., Bonafonte, A., & Nogueiras, A. (2002). Interface databases: Design and collection of a multilingual emotional speech database. In Proceedings of the 3rd international conference on language (LREC’02) Las Palmas de Gran Canaria, Spain, pp. 2019–2023.Google Scholar
  53. Iliev, A. I., Scordilis, M. S., Papa, J. P., & Falco, A. X. (2010). Spoken emotion recognition through optimum-path forest classification using glottal features. Computer Speech and Language, 24(3), 445–460.CrossRefGoogle Scholar
  54. Iliou, T., & Anagnostopoulos, C.-N. (2009). Statistical evaluation of speech features for emotion recognition. In Fourth international conference on digital telecommunications, Colmar, France, pp. 121–126.Google Scholar
  55. Iriondo, I., Guaus, R., & Rodriguez, A. (2000). Validation of an acoustical modeling of emotional expression in Spanish using speech synthesis techniques. In Proceedings of ISCA workshop speech and emotion, Belfast, Vol. 1, pp. 161–166.Google Scholar
  56. Izard, C. E. (1992). Basic emotions, relations among emotions, and emotion-cognition relations. Psychological Review, 99, 561–565.CrossRefGoogle Scholar
  57. Jeon, J. H., Le, D., Xia, R., & Liu, Y. (2013). A preliminary study of cross-lingual emotion recognition from speech: Automatic classification versus human perception. In Interspeech, Layon, France, pp. 2837–2840.Google Scholar
  58. Jiang, D.-N., & Cai, L. H. (2004). Classifying emotion in Chinese speech by decomposing prosodic features. In International conference on speech and language processing (ICSLP), Jeju, Korea.Google Scholar
  59. Jiang, D.-N., Zhang, W., Shen, L.-Q., & Cai, L.-H. (2005). Prosody analysis and modelling for emotional speech synthesis. In IEEE proceedings of ICASSP 2005, pp. 281–284.Google Scholar
  60. Jin, X., & Wang, Z. (2005). An emotion space model for recognition of emotions in spoken Chinese (pp. 397–402). Berlin: Springer.Google Scholar
  61. Jovičić, S. T., Kašić, Z., Đorđević, M., & Rajković, M. (2004). Serbian emotional speech database: Design, processing and evaluation. In SPECOM 9th conference speech and computer, St. Petersburg, Russia.Google Scholar
  62. Kadiri, S. R., Gangamohan, P., Gangashetty, S. V., & Yegnanarayana, B. (2015). Analysis of excitation source features of speech for emotion recognition. In INTERSPEECH 2015, Dresden, pp. 1324–1328.Google Scholar
  63. Kandali, A. B., Routray, A., & Basu, T. K. (2008a). Emotion recognition from Assamese speeches using MFCC features and GMM classifier. In Proceedings of IEEE region 10 conference on TENCHON.Google Scholar
  64. Kandali, A. B., Routray, A., & Basu, T. K. (2008b). Emotion recognition from speeches of some native languages of ASSAM independent of text and speaker. In National seminar on Devices, Circuits, and Communications, B. I. T. Mesra, Ranchi, pp. 6–7.Google Scholar
  65. Kao, Y.-H., & Lee, L.-S. (2006). Feature analysis for emotion recognition from Mandarin speech considering the special characteristics of Chinese language. In INTERSPEECH-ICSLP, Pittsburgh, Pennsylvania, pp. 1814–1817.Google Scholar
  66. Kim, J. B., Park, J. S., Oh, Y. H. (2011). On-line speaker adaptation based emotion recognition using incremental emotional information. In ICASSP, Prague, Czech Republic, pp. 4948–4951.Google Scholar
  67. Koolagudi, S. G., Devliyal, S., Chawla, B., Barthwal, A., & Rao, K. S. (2012). Recognition of emotions from speech using excitation source features. Procedia Engineering, 38, 3409–3417.CrossRefGoogle Scholar
  68. Koolagudi, S. G., & Krothapalli, S. R. (2012). Emotion recognition from speech using sub-syllabic and pitch synchronous spectral features. International Journal of Speech Technology, 15(4), 495–511.CrossRefGoogle Scholar
  69. Koolagudi, S. G., Maity, S., Kumar, V. A., Chakrabati, S., & Rao, K. S. (2009). IITKGP-SESC: Speech database for emotion analysis. Communications in computer and information science, LNCS (pp. 485–492). Berlin: Springer.Google Scholar
  70. Koolagudi, S. G., & Rao, K. S. (2012a). Emotion recognition from speech: A review. International Journal of Speech Technology, 15, 99–117.CrossRefGoogle Scholar
  71. Koolagudi, S. G., & Rao, K. S. (2012b). Emotion recognition from speech using source, system, and prosodic features. International Journal of Speech Technology, 15(2), 265–289.CrossRefGoogle Scholar
  72. Koolagudi, S. G., Reddy, R., & Rao, K. S. (2010). Emotion recognition from speech signal using epoch parameters. In International conference on signal processing and communications (SPCOM).Google Scholar
  73. Krothapalli, S. R., & Koolagudi, S. G. (2013). Characterization and recognition of emotions from speech using excitation source information. International Journal of Speech Technology, 16(2), 181–201.CrossRefGoogle Scholar
  74. Kwon, O.-W., Chan, K., Hao, J., & Lee, T.-W. (2003). Emotion recognition by speech signals. In EUROSPEECH, pp. 125–128,.Google Scholar
  75. Lanjewar, R. B., Mauhurkar, S., & Patel, N. (2015). Implementation and comparison of speech emotion recognition system using Gaussian mixture model and K-nearest neighbor techniques. Procedia Computer Science, 49, 50–57.CrossRefGoogle Scholar
  76. Lazarus, R. S. (1991). Emotion & adaptation. New York: Oxford University Press.Google Scholar
  77. Lee, C. M., & Narayanan, S. (2003). Emotion recognition using a data-driven fuzzy inference system. In European conference on speech and language processing (EUROSPEECH), Geneva, Switzerland, pp. 157–160.Google Scholar
  78. Lee, C. M., & Narayanan, S. S. (2005). Toward detecting emotions in spoken dialogs. IEEE Transactions on Speech and Audio Processing, 13(2), 293–303.CrossRefGoogle Scholar
  79. Lee, C. M., Narayanan, S., & Pieraccini, R. (2001). Recognition of negative emotion in the human speech signals. In Workshop on auto, speech recognition and understanding.Google Scholar
  80. Lee, C. M., Yildirim, S., Bulut, M., Kazemzadeh, A., Busso, C., Deng, Z. et al. (2004). Emotion recognition based on phoneme classes. In 8th international conference on spoken language processing, INTERSPEECH 2004, Korea.Google Scholar
  81. Lee, C.-C., Mower, E., Busso, C., Lee, S., & Narayanan, S. (2011). Emotion recognition using a hierarchical binary decision tree approach. Speech Communication, 53, 1162–1171.CrossRefGoogle Scholar
  82. Lida, A., Campbell, N., Higuchi, F., & Yasumura, M. (2003). A corpus based synthesis system with emotion. Speech Communication, 40, 161–187.CrossRefzbMATHGoogle Scholar
  83. Lin, Y.-L., & Wei, G. (2005). Speech emotion recognition based on HMM and SVM. In: Fourth International conference on machine learning and cybernetics, Guangzhou, pp. 4898–4901.Google Scholar
  84. Lotfian, R., & Busso, C. (2015). Emotion recognition using synthetic speech as neutral reference. In IEEE International conference on ICASSP, pp. 4759–4763.Google Scholar
  85. Luengo, I., Navas, E., Hernáez, I., & Sánchez, J. (2005). Automatic emotion recognition using prosodic parameters. In INTERSPEECH, Lisbon, Portugal, pp. 493–496.Google Scholar
  86. Lugger, M., & Yang, B. (2007). The relevance of voice quality features in speaker independent emotion recognition. In ICASSP, Honolulu, Hawaii, pp. IV17–IV20.Google Scholar
  87. Makarova, V., & Petrushin, V. A. (2002). RUSLANA: A database of Russian emotional utterances. In 7th International conference on spoken language processing (ICSLP 02), pp. 2041–2044.Google Scholar
  88. Makhoul, J. (1975). Linear prediction: A tutorial review. Proceedings of the IEEE, 63(4), 561–580.CrossRefGoogle Scholar
  89. McGilloway, S., Cowie, R., Douglas-Cowie, E., Gielen, S., Westerdijk, M., & Stroeve, S. (2000) Approaching automatic recognition of emotion from voice: A rough benchmark. In Proceedings of ISCA workshop speech emotion, pp. 207–212.Google Scholar
  90. McKeown, G., Valstar, M., Cowie, R., Pantic, M., & Schroder, M. (2007). The SEMAINE database: Annotated multimodal records of emotionally coloured conversations between a person and a limited agent. Journal of LATEX Class Files, 6(1), 1–14.Google Scholar
  91. Mencattini, A., Martinelli, E., Costantini, G., Todisco, M., Basile, B., Bozzali, M., & Di Natale, C. (2014). Speech emotion recognition using amplitude modulation parameters and a combined feature selection procedure. Knowledge-Based Systems, 63, 68–81.CrossRefGoogle Scholar
  92. Mirsamadi, S., Barsoum, E., & Zhang, C. (2017). Automatic speech emotion recognition using recurrent neural networks with local attention. In Proceedings of IEEE conference on ICASSP, pp. 2227–2231.Google Scholar
  93. Mohanty, S., & Swain, B. K. (2010). Emotion recognition using fuzzy K-means from Oriya speech. In International Conference [ACCTA-2010] on Special Issue of IJCCT, Vol. 1 Issue 2–4.Google Scholar
  94. Montero, J. M., Gutiérrez-Arriola, J., Colás, J., Enríquez, E., & Pardo, J. M. (1999). Analysis and modeling of emotional speech in Spanish. In Proceedings of international conference on phonetic sciences, pp. 957–960.Google Scholar
  95. Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 49, 98–112.CrossRefGoogle Scholar
  96. Nakatsu, R., Nicholson, J., & Tosa, N. (2000). Emotion recognition and its application to computer agents with spontaneous interactive capabilities. Knowledge-Based Systems, 13, 497–504.CrossRefGoogle Scholar
  97. Nandi, D., Pati, D., & Rao, K. S. (2017). Parametric representation of excitation source information for language identification. Computer Speech and Language, 41, 88–115.CrossRefGoogle Scholar
  98. Neiberg, D., Elenius, K., & Laskowski, K. (2006). Emotion recognition in spontaneous speech using GMMs. In INTERSPEECH 2006, ICSLP, Pittsburgh, Pennsylvania, pp. 809–812.Google Scholar
  99. New, T. L., Wei, F. S., & De Silva, L. C. (2001). Speech based emotion classification. In Proceedings of the IEEE region 10 international conference on electrical and electronic technology (TENCON), Phuket Island, Singapore, Vol. 1, pp 297–301.Google Scholar
  100. New, T. L., Wei, F. S., & De Silva, L. C. (2003). Speech emotion recognition using hidden Markov models. Speech Communication, 41, 603–623.CrossRefGoogle Scholar
  101. Nicholson, J., Takahashi, K., & Nakatsu, R. (2006). Emotion recognition in speech using neural networks. Neural Computing & Applications, 11, 290–296.zbMATHGoogle Scholar
  102. Nogueiras, A., Marino, J. B., Moreno, A., & Bonafonte, A. (2001). Speech emotion recognition using hidden Markov models. In Proceedings of European conference on speech communication and technology (Eurospeech’01), Denmark.Google Scholar
  103. Nordstrand, L., Svanfeld, G., Granstrom, B., & House, D. (2004). Measurements of ariculatory variation in expressive speech for a set of Swedish vowels. Speech Communication, 44, 187–196.CrossRefGoogle Scholar
  104. Ooi, C. S., Seng, K. P., Ang, L.-M., & Chew, L. W. (2014). A new approach of audio emotion recognition. Experts Systems with Applications, 41, 5858–5869.CrossRefGoogle Scholar
  105. Pao, T.-L., Chen, Y.-T., Yeh, J.-H., & Liao, W.-Y. (2005). Combining acoustic features for improved emotion recognition in Mandarin speech. In International conference on affective computing and intelligent interaction, pp. 279–285.Google Scholar
  106. Park, C.-H., & Sim, K.-B. (2003). Emotion recognition and acoustic analysis from speech signal. In Proceedings of the international joint conference on neural networks, pp. 2594–2598.Google Scholar
  107. Pereira, C. (2000). Dimensions of emotional meaning in speech. In Proceedings of ISCA workshop speech and emotion, Belfast, Vol. 1, pp. 25–28.Google Scholar
  108. Petrushin, V. A. (1999). Emotion in speech: Recognition and application to call centers. In Proceedings of the 1999 conference on artificial neural networks in engineering (ANNIE 99).Google Scholar
  109. Picard, R. W. (1997). Affective computing. Cambridge: The MIT Press.CrossRefGoogle Scholar
  110. Picard, R. W., Vyzas, E., & Healey, J. (2001). Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 1175–1191.CrossRefGoogle Scholar
  111. Power, M., & Dalgleish, T. (2000). Cognition and emotion from order to disorder. New York: Psychology Press.Google Scholar
  112. Prasanna, S. R. M., & Govind, D. (2010). Analysis of excitation source information in emotional speech. In INTERSPEECH 2010, Makuhari, Chiba, Japan, pp. 781–784.Google Scholar
  113. Prasanna, S. R. M., Gupta, C. S., & Yegnanarayana, B. (2006). Extraction of speaker-specific excitation information from linear prediction residual of speech. Speech Communication, 48, 1243–1261.CrossRefGoogle Scholar
  114. Pravena, D., & Govind, D. (2017). Significance of incorporating excitation source parameters for improved emotion recognition from speech and electroglottographic signals. International Journal of Speech Technology, 20(4), 787–797.CrossRefGoogle Scholar
  115. Pravena, D., & Govind, D. (2017). Development of simulated emotion speech database for excitation source analysis. International Journal of Speech Technology, 20, 327–338.CrossRefGoogle Scholar
  116. Quiros-Ramirez, M. A., Polikovsky, S., Kameda, Y., & Onisawa, T. (2014). A spontaneous cross-cultural emotion database: Latin-America vs. Japan. In International conference on Kansei Engineering and emotion research, pp. 1127–1134.Google Scholar
  117. Rabiner, L., & Juang, B.-H. (1993). Fundamentals of speech recognition. Englewood Cliffs: Prentice-Hall.zbMATHGoogle Scholar
  118. Rahurkar, M. A., & Hansen, J. H. (2002). Frequency band analysis for stress detection using a Teager energy operator based feature. Proceedings of International Conference on Spoken Language Processing (ICSLP’), Vol. 3, issue 02, pp. 2021–2024.Google Scholar
  119. Ramamohan, S., & Dandapat, S. (2006). Sinusoidal model-based analysis and classification of stressed speech. In IEEE transactions on audio, speech and language processing, Vol. 14, p. 3.Google Scholar
  120. Rao, K. S., & Koolagudi, S. G. (2011). Identification of Hindi dialects and emotions using spectral and prosodic features of speech. Systemics, Cybernetics, and Informatics, 9(4), 24–33.Google Scholar
  121. Rao, K. S., Koolagudi, S. G., & Vempada, R. R. (2013). Emotion recognition from speech using global and local prosodic features. International Journal of Speech Technology, 16(2), 143–160.CrossRefGoogle Scholar
  122. Rao, K. S., Kumar, T. P., Anusha, K., Leela, B., Bhavana, I., & Gowtham, S. V. S. K. (2012). Emotion recognition from speech. International Journal of Computer Science and Information Technologies, 3, 3603–3607.Google Scholar
  123. Rao, K. S., Prasanna, S. R. M., & Yegnanarayana, B. (2007). Determination of instants of significant excitation in speech using Hilbert envelope and group delay function. IEEE Signal Processing Letters, 14, 762–765.CrossRefGoogle Scholar
  124. Rao, K. S., & Yegnanarayana, B. (2006). Prosody modification using instants of significant excitation. In IEEE transactions on audio and speech, pp. 972–980.Google Scholar
  125. Rong, J., Li, G., & Chen, Y. P. P. (2009). Acoustic feature selection for automatic emotion recognition from speech. Information Processing and Management, 45, 315–328.CrossRefGoogle Scholar
  126. Rozgic, V., Ananthakrishnan, S., Saleem, S., Kumar, R., Vembu, A. N., & Prasad, R. (2012). Emotion recognition using acoustic and lexical features. In INTERSPEECH, Portland, USA.Google Scholar
  127. Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39, 1161–1178.CrossRefGoogle Scholar
  128. Russell, J. A., & Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76, 805–819.CrossRefGoogle Scholar
  129. Russell, J. A., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11, 273–294.CrossRefGoogle Scholar
  130. Salovey, P., Kokkonen, M., Lopes, P., & Mayer, J. (2004). Emotional Intelligence: What do we know? In ASR Manstead, N. H. Frijda & A. H. Fischer (Eds.), Feelings and emotions: The Amsterdam symposium (pp. 321–340). Cambridge: Cambridge University Press.CrossRefGoogle Scholar
  131. Schachter, S., & Singer, J. (1962). Cognitive, social, and physiological determinants of emotional state. Psychological Review, 69, 379–399.CrossRefGoogle Scholar
  132. Scherer, K. R., Grandjean, D., Johnstone, T., Klasmeyer, G., & Banziger, T. (2002). Acoustic correlates of task load and stress. In Proceedings of international conference on spoken language processing (ICSLP’02), Colorado, Vol. 3, pp. 2017–2020.Google Scholar
  133. Schroder, M. (2000). Experimental study of affect bursts. In Proceedings of ISCA workshop speech and emotion, Vol. 1, pp. 132–137.Google Scholar
  134. Schroder, M., & Grice, M. (2003). Expressing vocal effort in concatenative synthesis. In Proceedings of international conference on phonetic sciences (ICPhS’03), Barcelona, pp. 2589–2592.Google Scholar
  135. Schubert, E. (1999). Measurement and time series analysis of emotion in music, Ph.D dissertation, school of Music education, University of New South Wales, Sydeny, Australia.Google Scholar
  136. Schuller, B., Rigoll, G., & Lang, M. (2003). Hidden Markov model based speech emotion recognition. In Proceedings of the International conference on multimedia and Expo, ICME.Google Scholar
  137. Schuller, B., Rigoll, G., & Lang, M. (2004). Speech emotion recognition combining acoustic features and linguistis information in a hybrid support vector machine-belief network architecture. In Proceedings of international conference on acoustics, speech and signal processing (ICASSP’04), Vol. 1, pp. 557–560.Google Scholar
  138. Sheikhan, M., Bejani, M., & Gharavian, D. (2013). Modular neural-SVM scheme for speech emotion recognition using ANOVA feature selection method. Neural Computing and Applications, 23(1), 215–227.CrossRefGoogle Scholar
  139. Slaney, M., & McRoberts, G. (2003). Babyears: A recognition system for affective vocalizations. Speech Comunnication, 39, 367–384.CrossRefzbMATHGoogle Scholar
  140. Song, P., Ou, S., Zheng, W., Jin, Y., & Zhao, L. (2016). Speech emotion recognition using transfer non-negative matrix factorization. In Proceedings of IEEE international conference ICASSP, pp. 5180–5184.Google Scholar
  141. Sun, R., & Moore, E. (2011). Investigating glottal parameters and teager energy operators in emotion recognition. In Affective Computing and Intelligent Interaction, pp. 425–434.Google Scholar
  142. Takahashi, K. (2004). Remarks on SVM-based emotion recognition from multi-modal bio-potential signals. In 13th IEEE international workshop on robot and human interactive communication, Roman.Google Scholar
  143. Tao, J., & Kang, Y. (2005). Features importance analysis for emotional speech classification. In Affective Computing and Intelligent Interaction, pp. 449–457.Google Scholar
  144. Tato, R., Santos, R., Kompe, R., & Pardo, J. M. (2002). Emotional space improves emotion recognition. In Proceedings of international conference on spoken language processing (ICSLP’02), Colorado, Vol. 3, pp. 2029–2032.Google Scholar
  145. Tomkins, S. (1962). Affect imagery and consciousness: The positive affects, Vol. 1. New York: Springer.Google Scholar
  146. University of Pennsylvania Linguistic Data Consortium. (2002). Emotional prosody speech and transcripts. Retrieved from
  147. Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features and methods. Speech Communication, 48, 1162–1181.CrossRefGoogle Scholar
  148. Ververidis, D., Kotropoulos, C., & Pitas, I. (2004). Automatic emotional speech classification. In Proceedings of international conference on acoustics, speech and signal processing (ICASSP’04), Montreal, Vol. 1, pp. 593–596.Google Scholar
  149. Vidrascu, L., & Devillers, L. (2005). Detection of real-life emotions in call centers. In INTERSPEECH, Lisbon, Portugal, pp. 1841–1844.Google Scholar
  150. Vogt, T., & André, E. (2006). Improving automatic from speech via gender differentiation. In Proceedings of language resources and evaluation conference (LREC 2006), Genoa.Google Scholar
  151. Wakita, H. (1976). Residual energy of linear prediction to vowel and speaker recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24, 270–271.CrossRefGoogle Scholar
  152. Wang, K., An, N., Li, B. N., Zhang, Y., & Li, L. (2015). Speech emotion recognition using Fourier parameters. IEEE Transcations on Affective Computing, 6(1), 69–75.Google Scholar
  153. Wang, Y., Du, S., & Zhan, Y. (2008). Adaptive and optimal classification of speech emotion recognition. In Fourth international conference on natural computation, pp. 407–411.
  154. Wang, Y., & Guan, L. (2004). An investigation of speech-based human emotion recognition. In IEEE 6th workshop on multimedia signal processing.
  155. Werner, S., & Keller, E. (1994). Prosodic aspects of speech. In E. Keller (Ed.), Fundamentals of speech synthesis and speech recognition: Basic concepts, state of the art and future challenges (pp. 23–40). Chichester: Wiley.
  156. Wu, S., Falk, T. H., & Chan, W.-Y. (2011). Automatic speech emotion recognition using modulation spectral features. Speech Communication, 53(5), 768–785.
  157. Wu, T., Yang, Y., Wu, Z., & Li, D. (2006). MASC: A speech corpus in Mandarin for emotion analysis and affective speaker recognition. In Speaker and language recognition workshop.
  158. Wu, W., Zheng, T. F., Xu, M.-X., & Bao, H.-J. (2006). Study on speaker verification on emotional speech. In INTERSPEECH’06, Pittsburgh, Pennsylvania, pp. 2102–2105.
  159. Wundt, W. (2013). An introduction to psychology. Read Books Ltd.
  160. Yamagishi, J., Onishi, K., Masuko, T., & Kobayashi, T. (2003). Emotion recognition using a data-driven fuzzy inference system. In Eurospeech, Geneva.
  161. Yegnanarayana, B., & Gangashetty, S. (2011). Epoch-based analysis of speech signals. Sādhanā, 36(5), 651–697.
  162. Yegnanarayana, B., Swamy, R. K., & Murty, K. S. R. (2009). Determining mixing parameters from multispeaker data using speech-specific information. IEEE Transactions on Audio, Speech, and Language Processing, 17(6), 1196–1207.
  163. Yeh, L., & Chi, T. (2010). Spectro-temporal modulations for robust speech emotion recognition. In INTERSPEECH, Chiba, Japan, pp. 789–792.
  164. Yildirim, S., Bulut, M., Lee, C. M., Kazemzadeh, A., Busso, C., Deng, Z., Lee, S., & Narayanan, S. (2004). An acoustic study of emotions expressed in speech. In Proceedings of International Conference on Spoken Language Processing (ICSLP’04), Korea, Vol. 1, pp. 2193–2196.
  165. You, M., Chen, C., Bu, J., Liu, J., & Tao, J. (1997). Getting started with SUSAS: A speech under simulated and actual stress database. Eurospeech, 4, 1743–1746.
  166. Yu, F., Chang, E., Xu, Y.-Q., & Shum, H.-Y. (2001). Emotion detection from speech to enrich multimedia content. In Proceedings of IEEE Pacific-Rim Conference on Multimedia, Beijing, Vol. 1, pp. 550–557.
  167. Yuan, J., Shen, L., & Chen, F. (2002). The acoustic realization of anger, fear, joy and sadness in Chinese. In Proceedings of International Conference on Spoken Language Processing (ICSLP’02), Vol. 3, pp. 2025–2028.
  168. Zhang, S. (2008). Emotion recognition in Chinese natural speech by combining prosody and voice quality features. In Sun et al. (Eds.), Advances in neural networks. Lecture notes in computer science (pp. 457–464). Berlin: Springer.
  169. Zhang, T., Hasegawa-Johnson, M., & Levinson, S. E. (2004). Children’s emotion recognition in an intelligent tutoring scenario. In Proceedings of the eighth European Conference on Speech Communication and Technology, INTERSPEECH.
  170. Zhu, A., & Luo, Q. (2007). Study on speech emotion recognition system in E-learning. In J. Jacko (Ed.), Human-computer interaction, Part III, HCII (pp. 544–552). Berlin: Springer.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, Silicon Institute of Technology, Bhubaneswar, India
  2. Electrical Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
  3. Department of Electronics and Communication, CV Raman College of Engineering, Bhubaneswar, India
