
International Journal of Speech Technology, Volume 21, Issue 1, pp. 137–156

Speech emotion recognition research: an analysis of research focus

  • Mumtaz Begum Mustafa
  • Mansoor A. M. Yusoof
  • Zuraidah M. Don
  • Mehdi Malekzadeh

Abstract

This article analyses research in speech emotion recognition (SER) from 2006 to 2017 in order to identify the current focus of research and areas in which research is lacking. The objective is to examine what is being done in this field of research. Searching on selected keywords, we extracted and analysed 260 articles from well-known online databases. The analysis indicates that SER is an active field of research, with dozens of articles published each year in journals and conference proceedings. The majority of articles concentrate on three critical aspects of SER, namely (1) databases, (2) suitable speech features, and (3) classification techniques for maximizing the recognition accuracy of SER systems. Having carried out an association analysis of these critical aspects and of how they influence the performance of SER systems in terms of recognition accuracy, we found that certain combinations of databases, speech features and classifiers influence the recognition accuracy of the SER system. Based on our review, we also suggest aspects of SER that could be taken into consideration in future work.
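The association analysis described above (relating combinations of database, speech feature, and classifier to reported recognition accuracy) could be sketched as follows. This is an illustrative sketch only, not the authors' code: the records, accuracy values, and the helper `mean_accuracy_by_combination` are hypothetical, although EMO-DB and IEMOCAP are real emotional-speech corpora commonly surveyed in SER research.

```python
# Sketch: aggregate reported recognition accuracies per
# (database, speech feature, classifier) combination, as one simple
# form of association analysis. All figures below are made up.
from collections import defaultdict

# Hypothetical survey records: (database, feature, classifier, accuracy %)
records = [
    ("EMO-DB", "MFCC", "SVM", 82.1),
    ("EMO-DB", "MFCC", "SVM", 79.5),
    ("EMO-DB", "prosodic", "HMM", 71.0),
    ("IEMOCAP", "MFCC", "DNN", 64.2),
    ("IEMOCAP", "spectral", "SVM", 60.8),
]

def mean_accuracy_by_combination(rows):
    """Average the reported accuracy for each (database, feature, classifier) triple."""
    sums = defaultdict(lambda: [0.0, 0])  # combo -> [accuracy total, count]
    for db, feat, clf, acc in rows:
        entry = sums[(db, feat, clf)]
        entry[0] += acc
        entry[1] += 1
    return {combo: total / n for combo, (total, n) in sums.items()}

means = mean_accuracy_by_combination(records)
# Combination with the highest mean reported accuracy in this toy data.
best = max(means, key=means.get)
```

On this toy data the best-performing combination is EMO-DB with MFCC features and an SVM classifier; the survey's actual analysis works over the 260 extracted articles rather than a handful of rows.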

Keywords

Speech emotion recognition · Emotional speech · ASR system · Emotional speech database · Speech feature · Classification of emotion · Trend analysis

Notes

Funding

Funding was provided by the University of Malaya Research Grant (AFR (Frontier Science)) (Grant No. RG284-14AFR) and the Postgraduate Research Grant (PPP) (Grant No. PG220-2014B).

References

  1. Abdelwahab, M., & Busso, C. (2017). Incremental adaptation using active learning for acoustic emotion Recognition. In International conference on acoustics, speech and signal processing.Google Scholar
  2. Alam, M. J., Attabi, Y., Dumouchel, P., Kenny, P., & O’Shaughnessy, D. D. (2013). Amplitude modulation features for emotion recognition from speech. In INTERSPEECH (pp. 2420–2424).Google Scholar
  3. Albornoz, E. M., Crolla, M. B., & Milone, D. H. (2008). Recognition of emotions in speech. In Proceedings of XXXIV CLEI, Santa Fe Argentina, pp. 1120–1129.Google Scholar
  4. Albornoz, E. M., Milone, D. H., & Rufiner, H. L. (2011). Spoken emotion recognition using hierarchical classifiers. Computer Speech & Language, 25(3), 556–570.CrossRefGoogle Scholar
  5. Altun, H., & Polat, G. (2009). Boosting selection of speech related features to improve performance of multi-class SVMs in emotion detection. Expert Systems with Applications, 36(4), 8197–8203.CrossRefGoogle Scholar
  6. Álvarez, A., Cearreta, I., López, J. M., Arruti, A., Lazkano, E., Sierra, B., & Garay, N. (2007). A comparison using different speech parameters in the automatic emotion recognition using Feature Subset Selection based on Evolutionary Algorithms. In International conference on text, speech and dialogue (pp. 423–430). Berlin: Springer.CrossRefGoogle Scholar
  7. Anagnostopoulos, C. N., Iliou, T., & Giannoukos, I. (2012). Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011. Artificial Intelligence Review, 43(2), 155–177.CrossRefGoogle Scholar
  8. Ananthakrishnan, S., Vembu, A. N., & Prasad, R. (2011). Model-based parametric features for emotion recognition from speech. In 2011 IEEE workshop on automatic speech recognition and understanding (ASRU), (pp. 529–534). Piscataway: IEEE.CrossRefGoogle Scholar
  9. Arias, J. P., Busso, C., & Yoma, N. B. (2013). Energy and F0 contour modeling with functional data analysis for emotional speech detection. In INTERSPEECH (pp. 2871–2875).Google Scholar
  10. Arias, J. P., Busso, C., & Yoma, N. B. (2014). Shape-based modeling of the fundamental frequency contour for emotion detection in speech. Computer Speech & Language, 28(1), 278–294.CrossRefGoogle Scholar
  11. Atassi, H., & Esposito, A. (2008). A speaker independent approach to the classification of emotional vocal expressions. In 20th IEEE international conference on tools with artificial intelligence, 2008. ICTAI08. (Vol. 2, pp. 147–152). Piscataway: IEEE.CrossRefGoogle Scholar
  12. Atassi, H., Smekal, Z., & Esposito, A. (2012). Emotion recognition from spontaneous Slavic speech. In 2012 IEEE 3rd international conference on cognitive infocommunications (CogInfoCom) (pp. 389–394). Piscataway: IEEE.CrossRefGoogle Scholar
  13. Athanaselis, T., Bakamidis, S., Dologlou, I., Cowie, R., Douglas-Cowie, E., & Cox, C. (2005). ASR for emotional speech: Clarifying the issues and enhancing performance. Neural Networks, 18(4), 437–444.CrossRefGoogle Scholar
  14. Attabi, Y., & Dumouchel, P. (2012). Emotion recognition from speech: WOC-NN and class-interaction. In 2012 11th international conference on information science, signal processing and their applications (ISSPA) (pp. 126–131). Piscataway: IEEE.CrossRefGoogle Scholar
  15. Attabi, Y., & Dumouchel, P. (2013). Anchor models for emotion recognition from speech. IEEE Transactions on Affective Computing, 4(3), 280–290.CrossRefGoogle Scholar
  16. Bahreini, K., Nadolski, R., & Westera, W. (2016). Towards real-time speech emotion recognition for affective e-learning. Education and Information Technologies, 21(5), 1367–1386.CrossRefGoogle Scholar
  17. Balti, H., & Elmaghraby, A. S. (2013). Speech emotion detection using time dependent self organizing maps. In 2013 IEEE international symposium on signal processing and information technology (ISSPIT) (pp. 000470–000478). Piscataway: IEEE.Google Scholar
  18. Barra Chicote, R., Fernández Martínez, F., Lutfi, L., Binti, S., Lucas Cuesta, J. M., Macías Guarasa, J., … Pardo Muñoz, J. M. (2009). Acoustic emotion recognition using dynamic Bayesian networks and multi-space distributions. ISCA.Google Scholar
  19. Batliner, A., Schuller, B., Seppi, D., Steidl, S., Devillers, L., Vidrascu, L., & Amir, N. (2011). The automatic recognition of emotions in speech. In Emotion-oriented systems (pp. 71–99). Berlin Heidelberg: Springer.CrossRefGoogle Scholar
  20. Batliner, A., Steidl, S., Schuller, B., Seppi, D., Laskowski, K., Vogt, T., … Aharonson, V. (2006). Combining efforts for improving automatic classification of emotional user states. Proc. IS-LTC 240–245.Google Scholar
  21. Batliner, A., Steidl, S., Schuller, B., Seppi, D., Vogt, T., Devillers, L., … Aharonson, V. (2007). The impact of F0 extraction errors on the classification of prominence and emotion. Proc. ICPhS 2201–2204.Google Scholar
  22. Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., & Rose, R. (2007). Automatic speech recognition and speech variability: A review. Speech Communication, 49(10), 763–786.CrossRefGoogle Scholar
  23. Bertero, D., & Fung, P. (2017). A first look into a Convolutional Neural Network for speech emotion detection. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5115–5119). Piscataway: IEEE.CrossRefGoogle Scholar
  24. Bhaykar, M., Yadav, J., & Rao, K. S. (2013). Speaker dependent, speaker independent and cross language emotion recognition from speech using GMM and HMM. 2013 national conference on communications (NCC) (pp. 1–5). Piscataway: IEEE.Google Scholar
  25. Bitouk, D., Nenkova, A., & Verma, R. (2009). Improving emotion recognition using class-level spectral features. In INTERSPEECH (pp. 2023–2026).Google Scholar
  26. Böck, R., Hübner, D., & Wendemuth, A. (2010). Determining optimal signal features and parameters for hmm-based emotion classification. In 15th IEEE mediterranean electrotechnical conference MELECON 2010–2010 (pp. 1586–1590). Piscataway: IEEE.CrossRefGoogle Scholar
  27. Bojanić, M., Crnojević, V., & Delić, V. (2012). Application of neural networks in emotional speech recognition. In 2012 11th symposium on neural network applications in electrical engineering (NEUREL) (pp. 223–226). Piscataway: IEEE.Google Scholar
  28. Bozkurt, E., Erzin, E., Erdem, C. E., & Erdem, A. T. (2010). Use of line spectral frequencies for emotion recognition from speech. In 2010 20th international conference on pattern recognition (ICPR) (pp. 3708–3711). Piscataway: IEEE.CrossRefGoogle Scholar
  29. Bozkurt, E., Erzin, E., Erdem, C. E., & Erdem, A. T. (2011). Formant position based weighted spectral features for emotion recognition. Speech Communication, 53(9), 1186–1197.CrossRefGoogle Scholar
  30. Bozkurt, E., Erzin, E., Eroğlu Erdem, Ç, & Erdem, T. (2009). Improving automatic emotion recognition from speech signals. In 10th annual conference of the international speech communication association 2009 (INTERSPEECH 2009). International Speech Communications Association.Google Scholar
  31. Brester, C., Semenkin, E., & Sidorov, M. (2016). Multi-objective heuristic feature selection for speech-based multilingual emotion recognition. Journal of Artificial Intelligence and Soft Computing Research, 6(4), 243–253.CrossRefGoogle Scholar
  32. Brooks, C. A., Thompson, C., & Kovanović, V. (2016). Introduction to data mining for educational researchers. In Proceedings of the 6th international conference on learning analytics & knowledge (pp. 505–506). ACM.Google Scholar
  33. Busso, C., Lee, S., & Narayanan, S. (2009). Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Transactions on Audio, Speech, and Language Processing, 17(4), 582–596.CrossRefGoogle Scholar
  34. Busso, C., Mariooryad, S., Metallinou, A., & Narayanan, S. (2013). Iterative feature normalization scheme for automatic emotion detection from speech. IEEE Transactions on Affective Computing, 4(4), 386–397.CrossRefGoogle Scholar
  35. Busso, C., Metallinou, A., & Narayanan, S. S. (2011). Iterative feature normalization for emotional speech detection. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5692–5695). Piscataway: IEEE.CrossRefGoogle Scholar
  36. Calvo, R. A., & D’Mello, S. (2010). Affect detection: An interdisciplinary review of models, methods, and their applications. IEEE Transactions on Affective Computing, 1(1), 18–37.CrossRefGoogle Scholar
  37. Casale, S., Russo, A., Scebba, G., & Serrano, S. (2008). Speech emotion classification using machine learning algorithms. In 2008 IEEE international conference on semantic computing (pp. 158–165). Piscataway: IEEE.CrossRefGoogle Scholar
  38. Casale, S., Russo, A., & Serrano, S. (2010). Analysis of robustness of attributes selection applied to speech emotion recognition. In 2010 18th European signal processing conference (pp. 1174–1178). Piscataway: IEEE.Google Scholar
  39. Chakraborty, R., Pandharipande, M., & Kopparapu, S. K. (2016). Knowledge-based framework for intelligent emotion recognition in spontaneous speech. Procedia Computer Science, 96, 587–596.CrossRefGoogle Scholar
  40. Chandaka, S., Chatterjee, A., & Munshi, S. (2009). Support vector machines employing cross-correlation for emotional speech recognition. Measurement, 42(4), 611–618.CrossRefGoogle Scholar
  41. Chandrakala, S., & Sekhar, C. C. (2009). Combination of generative models and SVM based classifier for speech emotion recognition. In International joint conference on neural networks, 2009. IJCNN 2009 (pp. 497–502). Piscataway: IEEE.Google Scholar
  42. Chavhan, Y., Dhore, M. L., & Yesaware, P. (2010). Speech emotion recognition using support vector machine. International Journal of Computer Applications, 1(20), 6–9.CrossRefGoogle Scholar
  43. Chavhan, Y. D., Yelure, B. S., & Tayade, K. N. (2015). Speech emotion recognition using RBF kernel of LIBSVM. In 2015 2nd international conference on electronics and communication systems (ICECS) (pp. 1132–1135). Piscataway: IEEE.CrossRefGoogle Scholar
  44. Chen, L., Mao, X., Wei, P., Xue, Y., & Ishizuka, M. (2012). Mandarin emotion recognition combining acoustic and emotional point information. Applied Intelligence, 37(4), 602–612.CrossRefGoogle Scholar
  45. Chenchah, F., & Lachiri, Z. (2014). Speech emotion recognition in acted and spontaneous context. Procedia Computer Science, 39, 139–145.CrossRefGoogle Scholar
  46. Cheng, X., & Duan, Q. (2012). Speech emotion recognition using gaussian mixture model. In The 2nd international conference on computer application and system modeling.Google Scholar
  47. Chiou, B. C., & Chen, C. P. (2013). Feature space dimension reduction in speech emotion recognition using support vector machine. In Signal and information processing association annual summit and conference (APSIPA), 2013 Asia-Pacific (pp. 1–6). Piscataway: IEEE.Google Scholar
  48. Christina, I. J., & Milton, A. (2012). Analysis of all pole model to recognize emotions from speech signal. In 2012 international conference on computing, electronics and electrical technologies (ICCEET) (pp. 723–728). Piscataway: IEEE.CrossRefGoogle Scholar
  49. Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., & Schroder, M. (2000) FEELTRACE: an instrument for recording perceived emotion in real time. In Proceedings of ISCA speech and emotion workshop, pp 19–24.Google Scholar
  50. Cummins, N., Amiriparian, S., Hagerer, G., Batliner, A., Steidl, S., & Schuller, B. (2017). An Image-based deep spectrum feature representation for the recognition of emotional speech. In Proceedings of the 25th ACM international conference on multimedia, MM. Piscataway: IEEE.Google Scholar
  51. D’Mello, S., & Kory, J. (2012). Consistent but modest: a meta-analysis on unimodal and multimodal affect detection accuracies from 30 studies. In Proceedings of the 14th ACM international conference on multimodal interaction (pp. 31–38). ACM.Google Scholar
  52. Dai, K., Fell, H. J., & MacAuslan, J. (2008). Recognizing emotion in speech using neural networks. Telehealth and Assistive Technologies, 31, 38–43.Google Scholar
  53. Delic, V., Bojanic, M., Gnjatovic, M., Secujski, M., & Jovicic, S. T. (2012). Discrimination capability of prosodic and spectral features for emotional speech recognition. Elektronika ir Elektrotechnika, 18(9), 51–54.CrossRefGoogle Scholar
  54. Deng, J., Han, W., & Schuller, B. (2012). Confidence measures for speech emotion recognition: A start. In Proceedings of speech communication; 10. ITG symposium (pp. 1–4). VDE.Google Scholar
  55. Deng, J., Xu, X., Zhang, Z., Frühholz, S., Grandjean, D., & Schuller, B. (2017). Fisher kernels on phase-based features for speech emotion recognition. In Dialogues with social robots (pp. 195–203). Springer: Singapore.CrossRefGoogle Scholar
  56. Deng, J., Zhang, Z., Eyben, F., & Schuller, B. (2014). Autoencoder-based unsupervised domain adaptation for speech emotion recognition. IEEE Signal Processing Letters, 21(9), 1068–1072.CrossRefGoogle Scholar
  57. Deng, J., Zhang, Z., Marchi, E., & Schuller, B. (2013). Sparse autoencoder-based feature transfer learning for speech emotion recognition. In 2013 humaine association conference on affective computing and intelligent interaction (ACII) (pp. 511–516). Piscataway: IEEE.CrossRefGoogle Scholar
  58. Devillers, L., & Vidrascu, L. (2006). Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs. In 9th international conference on spoken language processing.Google Scholar
  59. Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: towards a new generation of databases. Speech Communication, 40, 33–60.zbMATHCrossRefGoogle Scholar
  60. Douglas-Cowie, E., Cowie, R., Sneddon, I., Cox, C., Lowry, O., McRorie, M., Martin, J. C., Devillers, L., Abrilan, S., Batliner, A., Amir, N., & Karpouzis, K. (2007) The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data. In Proceedings of international conference affective computing and intelligent interaction, pp 488–500.Google Scholar
  61. Ekman, P. (1957). A methodological discussion of non-verbal behavior. Journal of Psychology, 43, 141–149.CrossRefGoogle Scholar
  62. Ekman, P. (1972). Universals and cultural differences in facial expression of emotion. In J. Cole (Ed.), Nebraska symposium on motivation (pp. 207–283). Lincoln, NE: University of Nebraska Press.Google Scholar
  63. Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion. Chichester: Wiley.Google Scholar
  64. El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3), 572–587.zbMATHCrossRefGoogle Scholar
  65. Elbarougy, R., & Akagi, M. (2012). Speech emotion recognition system based on a dimensional approach using a three-layered model. In Signal & information processing association annual summit and conference (APSIPA ASC), 2012 Asia-Pacific (pp. 1–9). Piscataway: IEEE.Google Scholar
  66. Elbarougy, R., & Akagi, M. (2013). Cross-lingual speech emotion recognition system based on a three-layer model for human perception. In Signal and information processing association annual summit and conference (APSIPA), 2013 Asia-Pacific (pp. 1–10). Piscataway: IEEE.Google Scholar
  67. Erdem, C. E., Bozkurt, E., Erzin, E., & Erdem, A. T. (2010). RANSAC-based training data selection for emotion recognition from spontaneous speech. In Proceedings of the 3rd international workshop on affective interaction in natural environments (pp. 9–14). ACM.Google Scholar
  68. Esmaileyan, Z., & Marvi, H. (2014). Recognition of emotion in speech using variogram based features. Malaysian Journal of Computer Science, 27(3), 156–170.Google Scholar
  69. Espinosa, H. P., García, C. A. R., & Pineda, L. V. (2010). Features selection for primitives estimation on emotional speech. In 2010 IEEE international conference on acoustics speech and signal processing (ICASSP) (pp. 5138–5141). Piscataway: IEEE.CrossRefGoogle Scholar
  70. Fayek, H. M., Lech, M., & Cavedon, L. (2016). On the correlation and transferability of features between automatic speech recognition and speech emotion recognition. In INTERSPEECH (pp. 3618–3622).Google Scholar
  71. Feraru, M., & Zbancioc, M. (2013). Speech emotion recognition for SROL database using weighted KNN algorithm. In 2013 international conference on electronics, computers and artificial intelligence (ECAI) (pp. 1–4). Piscataway: IEEE.Google Scholar
  72. Fernandez, R., & Picard, R. (2011). Recognizing affect from speech prosody using hierarchical graphical models. Speech Communication, 53(9), 1088–1103.CrossRefGoogle Scholar
  73. Firoz Shah, A., Vimal, K. V. R., Raji, S. A., Jayakumar, A., & Babu, A. P. (2009) Speaker independent automatic emotion recognition from speech: a comparison of MFCCs and discrete wavelet transforms. In Proceedings of international conference on advances in recent technologies in communication and computing, pp 528–531.Google Scholar
  74. Fu, L., Mao, X., & Chen, L. (2008a). Relative speech emotion recognition based artificial neural network. In Pacific-Asia workshop on computational intelligence and industrial application, 2008. PACIIA’08. (Vol. 2, pp. 140–144). Piscataway: IEEE.CrossRefGoogle Scholar
  75. Fu, L., Mao, X., & Chen, L. (2008b). Speaker independent emotion recognition based on SVM/HMMs fusion system. In International conference on audio, language and image processing, 2008. ICALIP 2008 (pp. 61–65). Piscataway: IEEE.Google Scholar
  76. Gamage, K. W., Sethu, V., & Ambikairajah, E. (2017). Salience based lexical features for emotion recognition. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5830–5834). Piscataway: IEEE.CrossRefGoogle Scholar
  77. Garg, V., Kumar, H., & Sinha, R. (2013). Speech based emotion recognition based on hierarchical decision tree with SVM, BLG and SVR classifiers. In 2013 national conference on communications (NCC) (pp. 1–5). Piscataway: IEEE.Google Scholar
  78. Gaurav, M. (2008). Performance analysis of spectral and prosodic features and their fusion for emotion recognition in speech. In Spoken language technology workshop, 2008. SLT 2008 (pp. 313–316). Piscataway: IEEE.CrossRefGoogle Scholar
  79. Georgogiannis, A., & Digalakis, V. (2012). Speech emotion recognition using non-linear teager energy based features in noisy environments. In 2012 proceedings of the 20th European signal processing conference (EUSIPCO) (pp. 2045–2049). Piscataway: IEEE.Google Scholar
  80. Gharavian, D., Sheikhan, M., & Ashoftedel, F. (2013). Emotion recognition improvement using normalized formant supplementary features by hybrid of DTW-MLP-GMM model. Neural Computing and Applications, 22(6), 1181–1191.CrossRefGoogle Scholar
  81. Gharavian, D., Sheikhan, M., & Janipour, M. (2010). Pitch in emotional speech and emotional speech recognition using pitch frequency. Majlesi Journal of Electrical Engineering, 4(1).Google Scholar
  82. Gharavian, D., Sheikhan, M., Nazerieh, A., & Garoucy, S. (2012). Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network. Neural Computing and Applications, 21(8), 2115–2126.CrossRefGoogle Scholar
  83. Gharsellaoui, S., Selouani, S. A., & Dahmane, A. O. (2015). Automatic emotion recognition using auditory and prosodic indicative features. In 2015 IEEE 28th Canadian conference on electrical and computer engineering (CCECE) (pp. 1265–1270). Piscataway: IEEE.CrossRefGoogle Scholar
  84. Giannoulis, P., & Potamianos, G. (2012). A hierarchical approach with feature selection for emotion recognition from speech. In LREC (pp. 1203–1206).Google Scholar
  85. Glüge, S., Böck, R., & Wendemuth, A. (2011). Segmented-memory recurrent neural networks versus hidden markov models in emotion recognition from speech. In IJCCI (NCTA) (pp. 308–315).Google Scholar
  86. Grimm, M., Kroschel, K., Mower, E., & Narayanan, S. (2007a). Primitives-based evaluation and estimation of emotions in speech. Speech Communication, 49(10), 787–800.CrossRefGoogle Scholar
  87. Grimm, M., Kroschel, K., & Narayanan, S. (2007b). Support vector regression for automatic recognition of spontaneous emotions in speech. In IEEE international conference on acoustics, speech and signal processing, 2007. ICASSP 2007. (Vol. 4, pp. IV–1085). Piscataway: IEEE.Google Scholar
  88. Hamidi, M., & Mansoorizade, M. (2012). Emotion recognition from persian speech with neural network. International Journal of Artificial Intelligence & Applications, 3(5), 107.CrossRefGoogle Scholar
  89. Han, J., Zhang, Z., Ringeval, F., & Schuller, B. (2017). Prediction-based learning for continuous emotion recognition in speech. In 42nd IEEE international conference on acoustics, speech, and signal processing, ICASSP 2017.Google Scholar
  90. Han, K., Yu, D., & Tashev, I. (2014). Speech emotion recognition using deep neural network and extreme learning machine. In 15th annual conference of the international speech communication association.Google Scholar
  91. Harimi, A., Fakhr, H. S., & Bakhshi, A. (2016). Recognition of emotion using reconstructed phase space of speech. Malaysian Journal of Computer Science, 29(4), 262–271.CrossRefGoogle Scholar
  92. Hassan, A., & Damper, R. I. (2009). Emotion recognition from speech using extended feature selection and a simple classifier. In 10th annual conference of the international speech communication association.Google Scholar
  93. He, L., Lech, M., Maddage, N. C., & Allen, N. B. (2011). Study of empirical mode decomposition and spectral analysis for stress and emotion classification in natural speech. Biomedical Signal Processing and Control, 6(2), 139–146.CrossRefGoogle Scholar
  94. Henríquez, P., Alonso, J. B., Ferrer, M. A., Travieso, C. M., & Orozco-Arroyave, J. R. (2014). Nonlinear dynamics characterization of emotional speech. Neurocomputing, 132, 126–135.CrossRefGoogle Scholar
  95. Hu, H., Xu, M. X., & Wu, W. (2007). Fusion of global statistical and segmental spectral features for speech emotion recognition. In INTERSPEECH (pp. 2269–2272).Google Scholar
  96. Hu, H., Xu, M. X., & Wu, W. (2007). GMM supervector based SVM with spectral features for speech emotion recognition. In IEEE international conference on acoustics, speech and signal processing, 2007. ICASSP 2007. (Vol. 4, pp. IV–413). Piscataway: IEEE.Google Scholar
  97. Huang, R., & Ma, C. (2006). Toward a speaker-independent real-time affect detection system. In 18th international conference on pattern recognition, 2006. ICPR 2006. (Vol. 1, pp. 1204–1207). Piscataway: IEEE.Google Scholar
  98. Huang, Y., Wu, A., Zhang, G., & Li, Y. (2016). Speech emotion recognition based on deep belief networks and wavelet packet cepstral coefficients. International Journal of Simulation: Systems, Science and Technology, 17(28), 28–31.Google Scholar
  99. Huang, Z., Dong, M., Mao, Q., & Zhan, Y. (2014). Speech emotion recognition using CNN. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 801–804). ACM.Google Scholar
  100. Hussain, L., Shafi, I., Saeed, S., Abbas, A., Awan, I. A., Nadeem, S. A., … Rahman, B. (2017). A radial base neural network approach for emotion recognition in human speech. IJCSNS, 17(8), 52.Google Scholar
  101. Iliev, A. I., & Scordilis, M. S. (2008). Emotion recognition in speech using inter-sentence Glottal statistics. In 15th international conference on systems, signals and image processing, 2008. IWSSIP 2008. (pp. 465–468). Piscataway: IEEE.CrossRefGoogle Scholar
  102. Iliev, A. I., Scordilis, M. S., Papa, J. P., & Falcão, A. X. (2010). Spoken emotion recognition through optimum-path forest classification using glottal features. Computer Speech & Language, 24(3), 445–460.CrossRefGoogle Scholar
  103. Iliou, T., & Anagnostopoulos, C. N. (2009). Comparison of different classifiers for emotion recognition. In 13th panhellenic conference on informatics, 2009. PCI’09. (pp. 102–106). Piscataway: IEEE.CrossRefGoogle Scholar
  104. Iliou, T., & Anagnostopoulos, C. N. (2010a). SVM-MLP-PNN classifiers on speech emotion recognition field—A comparative study. In 2010 fifth international conference on digital telecommunications (ICDT) (pp. 1–6). Piscataway: IEEE.Google Scholar
  105. Iliou, T., & Anagnostopoulos, C. N. (2010b). Classification on speech emotion recognition-a comparative study. Animation, 4, 5.Google Scholar
  106. Iriondo, I., Planet, S., Alías, F., Socoró, J. C., & Martínez, E. (2007). Validation of an expressive speech corpus by mapping automatic classification to subjective evaluation. Computational and Ambient Intelligence, 646–653.Google Scholar
  107. Ivanov, A., & Riccardi, G. (2012). Kolmogorov-Smirnov test for feature selection in emotion recognition from speech. In 2012 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5125–5128). Piscataway: IEEE.CrossRefGoogle Scholar
  108. Javidi, M. M., & Roshan, E. F. (2013). Speech emotion recognition by using combinations of C5. 0, neural network (NN), and support vector machines (SVM) classification methods. International Journal of Applied Mathematics and Computer Science, 6, 191–200.Google Scholar
  109. Jeon, J. H., Xia, R., & Liu, Y. (2011). Sentence level emotion recognition based on decisions from subsentence segments. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4940–4943). Piscataway: IEEE.CrossRefGoogle Scholar
  110. Jiang, J., Wu, Z., Xu, M., Jia, J., & Cai, L. (2012). Comparison of adaptation methods for GMM-SVM based speech emotion recognition. In 2012 IEEE spoken language technology workshop (SLT) (pp. 269–273). Piscataway: IEEE.CrossRefGoogle Scholar
  111. Jin, Q., Li, C., Chen, S., & Wu, H. (2015). Speech emotion recognition with acoustic and lexical features. In 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4749–4753). Piscataway: IEEE.CrossRefGoogle Scholar
  112. Kamińska, D., & Pelikant, A. (2012). Recognition of human emotion from a speech signal based on Plutchik’s model. International Journal of Electronics and Telecommunications, 58(2), 165–170.CrossRefGoogle Scholar
  113. Kandali, A. B., Routray, A., & Basu, T. K. (2008). Emotion recognition from Assamese speeches using MFCC features and GMM classifier. In TENCON 2008–2008 IEEE region 10 conference (pp. 1–5). Piscataway: IEEE.Google Scholar
  114. Khan, M., Goskula, T., Nasiruddin, M., & Quazi, R. (2011). Comparison between k-nn and svm method for speech emotion recognition. International Journal on Computer Science and Engineering, 3(2), 607–611.Google Scholar
  115. Khanna, P., & Kumar, M. S. (2011). Application of vector quantization in emotion recognition from human speech. In International conference on information intelligence, systems, technology and management (pp. 118–125). Berlin, Heidelberg: Springer.CrossRefGoogle Scholar
  116. Kim, E. H., Hyun, K. H., Kim, S. H., & Kwak, Y. K. (2009). Improved emotion recognition with a novel speaker-independent feature. IEEE/ASME Transactions on Mechatronics, 14(3), 317–325.CrossRefGoogle Scholar
  117. Kim, E. H., Hyun, K. H., & Kwak, Y. K. (2006). Improvement of emotion recognition from voice by separating of obstruents. In The 15th IEEE international symposium on robot and human interactive communication, 2006. ROMAN 2006. (pp. 564–568). Piscataway: IEEE.CrossRefGoogle Scholar
  118. Kim, J. B., Park, J. S., & Oh, Y. H. (2011). On-line speaker adaptation based emotion recognition using incremental emotional information. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4948–4951). Piscataway: IEEE.CrossRefGoogle Scholar
  119. Kim, J. B., Park, J. S., & Oh, Y. H. (2012). Speaker-characterized emotion recognition using online and iterative speaker adaptation. Cognitive Computation, 4(4), 398–408.CrossRefGoogle Scholar
  120. Kim, S., Georgiou, P. G., Lee, S., & Narayanan, S. (2007). Real-time emotion detection system using speech: Multi-modal fusion of different timescale features. In IEEE 9th workshop on multimedia signal processing, 2007. MMSP 2007 (pp. 48–51).
  121. Kishore, K. K., & Satish, P. K. (2013). Emotion recognition in speech using MFCC and wavelet features. In 2013 IEEE 3rd international advance computing conference (IACC) (pp. 842–847). Piscataway: IEEE.
  122. Kitchenham, B. (2004). Procedures for performing systematic reviews. Keele: Keele University.
  123. Koolagudi, S. G., & Krothapalli, R. S. (2011). Two stage emotion recognition based on speaking rate. International Journal of Speech Technology, 14(1), 35–48.
  124. Koolagudi, S. G., & Krothapalli, S. R. (2012). Emotion recognition from speech using sub-syllabic and pitch synchronous spectral features. International Journal of Speech Technology, 15(4), 495–511.
  125. Koolagudi, S. G., & Rao, K. S. (2012). Emotion recognition from speech using source, system, and prosodic features. International Journal of Speech Technology, 15(2), 265–289.
  126. Koolagudi, S. G., Reddy, R., & Rao, K. S. (2010). Emotion recognition from speech signal using epoch parameters. In 2010 international conference on signal processing and communications (SPCOM) (pp. 1–5). Piscataway: IEEE.
  127. Kostoulas, T., Ganchev, T., Lazaridis, A., & Fakotakis, N. (2010). Enhancing emotion recognition from speech through feature selection. In P. Sojka, A. Horák, I. Kopecek & K. Pala (Eds.), Text, speech and dialogue, lecture notes in artificial intelligence (Vol. 6231, pp. 338–344).
  128. Kostoulas, T., Ganchev, T., Mporas, I., & Fakotakis, N. (2007). Detection of negative emotional states in real-world scenario. In Proceedings of 19th IEEE international conference on tools with artificial intelligence (pp. 502–509).
  129. Kotti, M., Paterno, F., & Kotropoulos, C. (2010). Speaker-independent negative emotion recognition. In 2010 2nd international workshop on cognitive information processing (CIP) (pp. 417–422). Piscataway: IEEE.
  130. Le, D., Aldeneh, Z., & Provost, E. M. (2017). Discretized continuous speech emotion recognition with multi-task deep recurrent neural network. In Interspeech 2017.
  131. Le, D., & Provost, E. M. (2013). Emotion recognition from spontaneous speech using hidden Markov models with deep belief networks. In 2013 IEEE workshop on automatic speech recognition and understanding (ASRU) (pp. 216–221). Piscataway: IEEE.
  132. Lee, J., & Tashev, I. (2015). High-level feature representation using recurrent neural network for speech emotion recognition. In INTERSPEECH (pp. 1537–1540).
  133. Lefter, I., Rothkrantz, L. J., Wiggers, P., & Van Leeuwen, D. A. (2010). Emotion recognition from speech by combining databases and fusion of classifiers. In Text, speech and dialogue (pp. 353–360). Berlin, Heidelberg: Springer.
  134. Li, L., Zhao, Y., Jiang, D., Zhang, Y., Wang, F., Gonzalez, I., … Sahli, H. (2013). Hybrid deep neural network–hidden Markov model (DNN-HMM) based speech emotion recognition. In 2013 humaine association conference on affective computing and intelligent interaction (ACII) (pp. 312–317). Piscataway: IEEE.
  135. Li, Y., Chao, L., Liu, Y., Bao, W., & Tao, J. (2015). From simulated speech to natural speech, what are the robust features for emotion recognition? In International conference on affective computing and intelligent interaction (ACII) (pp. 368–373). Piscataway: IEEE.
  136. Lim, W., Jang, D., & Lee, T. (2016). Speech emotion recognition using convolutional and recurrent neural networks. In Signal and information processing association annual summit and conference (APSIPA), 2016 Asia-Pacific (pp. 1–4). Piscataway: IEEE.
  137. Litman, D. J., & Forbes-Riley, K. (2006). Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors. Speech Communication, 48(5), 559–590.
  138. Liu, J., Chen, C., Bu, J., You, M., & Tao, J. (2007). Speech emotion recognition based on a fusion of all-class and pairwise-class feature selection. In Computational science–ICCS 2007 (pp. 168–175).
  139. Luengo, I., Navas, E., & Hernáez, I. (2010). Feature analysis and evaluation for automatic emotion identification in speech. IEEE Transactions on Multimedia, 12(6), 490–501.
  140. Lugger, M., Janoir, M. E., & Yang, B. (2009). Combining classifiers with diverse feature sets for robust speaker independent emotion recognition. In 2009 17th European signal processing conference (pp. 1225–1229). Piscataway: IEEE.
  141. Lugger, M., & Yang, B. (2007). The relevance of voice quality features in speaker independent emotion recognition. In IEEE international conference on acoustics, speech and signal processing, 2007. ICASSP 2007 (Vol. 4, pp. IV–17). Piscataway: IEEE.
  142. Lugger, M., & Yang, B. (2007). An incremental analysis of different feature groups in speaker independent emotion recognition. In 16th international congress of phonetic sciences.
  143. Mannepalli, K., Sastry, P. N., & Suman, M. (2016). A novel adaptive fractional deep belief networks for speaker emotion recognition. Alexandria Engineering Journal.
  144. Mao, Q., Xue, W., Rao, Q., Zhang, F., & Zhan, Y. (2016). Domain adaptation for speech emotion recognition by sharing priors between related source and target classes. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2608–2612). Piscataway: IEEE.
  145. Mao, X., Chen, L., & Fu, L. (2009). Multi-level speech emotion recognition based on HMM and ANN. In 2009 WRI world congress on computer science and information engineering (Vol. 7, pp. 225–229). Piscataway: IEEE.
  146. Mao, X., Zhang, B., & Luo, Y. (2007). Speech emotion recognition based on a hybrid of HMM/ANN. In Proceedings of the 7th WSEAS international conference on applied informatics and communications (Vol. 7, pp. 367–370).
  147. Mencattini, A., Martinelli, E., Ringeval, F., Schuller, B., & Di Natale, C. (2017). Continuous estimation of emotions in speech by dynamic cooperative speaker models. IEEE Transactions on Affective Computing.
  148. Milton, A., Roy, S. S., & Selvi, S. T. (2013). SVM scheme for speech emotion recognition using MFCC feature. International Journal of Computer Applications, 69(9).
  149. Milton, A., & Selvi, S. T. (2014). Class-specific multiple classifiers scheme to recognize emotions from speech signals. Computer Speech & Language, 28(3), 727–742.
  150. Mishra, H. K., & Sekhar, C. C. (2009). Variational Gaussian mixture models for speech emotion recognition. In Seventh international conference on advances in pattern recognition, 2009. ICAPR 2009 (pp. 183–186). Piscataway: IEEE.
  151. Morales-Perez, M., Echeverry-Correa, J., Orozco-Gutierrez, A., & Castellanos-Dominguez, G. (2008). Feature extraction of speech signals in emotion identification. In Engineering in medicine and biology society, 2008. EMBS 2008. 30th annual international conference of the IEEE (pp. 2590–2593). Piscataway: IEEE.
  152. Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 49(2), 98–112.
  153. Navas, E., Hernáez, I., & Luengo, I. (2006). An objective and subjective study of the role of semantics and prosodic features in building corpora for emotional TTS. IEEE Transactions on Audio, Speech and Language Processing, 14, 1117–1127.
  154. Neiberg, D., & Elenius, K. (2008). Automatic recognition of anger in spontaneous speech. In 9th annual conference of the international speech communication association.
  155. Nicolaou, M. A., Gunes, H., & Pantic, M. (2011). Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing, 2(2), 92–105.
  156. Ntalampiras, S., & Fakotakis, N. (2012). Modeling the temporal evolution of acoustic parameters for speech emotion recognition. IEEE Transactions on Affective Computing, 3(1), 116–125.
  157. Pan, Y., Shen, P., & Shen, L. (2012). Speech emotion recognition using support vector machine. International Journal of Smart Home, 6(2), 101–108.
  158. Pao, T. L., Chien, C. S., Chen, Y. T., Yeh, J. H., Cheng, Y. M., & Liao, W. Y. (2007). Combination of multiple classifiers for improving emotion recognition in Mandarin speech. In 3rd international conference on intelligent information hiding and multimedia signal processing, 2007. IIHMSP 2007 (Vol. 1, pp. 35–38). Piscataway: IEEE.
  159. Pao, T. L., Wang, C. H., & Li, Y. J. (2012). A study on the search of the most discriminative speech features in the speaker dependent speech emotion recognition. In 2012 fifth international symposium on parallel architectures, algorithms and programming (PAAP) (pp. 157–162). Piscataway: IEEE.
  160. Pathak, S., & Kulkarni, A. (2011). Recognizing emotions from speech. In 2011 3rd international conference on electronics computer technology (ICECT) (Vol. 4, pp. 107–109). Piscataway: IEEE.
  161. Philippou-Hübner, D., Vlasenko, B., Böck, R., & Wendemuth, A. (2012). The performance of the speaking rate parameter in emotion recognition from speech. In 2012 IEEE international conference on multimedia and expo (ICME) (pp. 248–253). Piscataway: IEEE.
  162. Picard, R. W. (1997). Affective computing. Cambridge: MIT Press.
  163. Pierre-Yves, O. (2003). The production and recognition of emotions in speech: Features and algorithms. International Journal of Human-Computer Studies, 59(1), 157–183.
  164. Planet, S., & Iriondo, I. (2012). Comparison between decision-level and feature-level fusion of acoustic and linguistic features for spontaneous emotion recognition. In 2012 7th Iberian conference on information systems and technologies (CISTI) (pp. 1–6). Piscataway: IEEE.
  165. Plutchik, R. (1991). The emotions. Lanham, MD: University Press of America.
  166. Pohjalainen, J., Ringeval, F., Zhang, Z., & Schuller, B. (2016). Spectral and cepstral audio noise reduction techniques in speech emotion recognition. In Proceedings of the 2016 ACM on multimedia conference (pp. 670–674). ACM.
  167. Polzehl, T., Schmitt, A., Metze, F., & Wagner, M. (2011). Anger recognition in speech using acoustic and linguistic cues. Speech Communication, 53(9), 1198–1209.
  168. Přibil, J., & Přibilová, A. (2013). Evaluation of influence of spectral and prosodic features on GMM classification of Czech and Slovak emotional speech. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1), 8.
  169. Rao, K. S., Koolagudi, S. G., & Vempada, R. R. (2013). Emotion recognition from speech using global and local prosodic features. International Journal of Speech Technology, 16(2), 143–160.
  170. Rao, K. S., Kumar, T. P., Anusha, K., Leela, B., Bhavana, I., & Gowtham, S. V. S. K. (2012). Emotion recognition from speech. International Journal of Computer Science and Information Technologies, 3(2), 3603–3607.
  171. Rehmam, B., Halim, Z., Abbas, G., & Muhammad, T. (2015). Artificial neural network-based speech recognition using DWT analysis applied on isolated words from oriental languages. Malaysian Journal of Computer Science, 28(3), 242–262.
  172. Ringeval, F., & Chetouani, M. (2008). Exploiting a vowel based approach for acted emotion recognition. In Verbal and nonverbal features of human-human and human-machine interaction (pp. 243–254).
  173. Rodríguez, P. H., Hernández, J. B. A., Ballester, M. A. F., González, C. M. T., & Orozco-Arroyave, J. R. (2013). Global selection of features for nonlinear dynamics characterization of emotional speech. Cognitive Computation, 5(4), 517–525.
  174. Rong, J., Li, G., & Chen, Y. P. P. (2009). Acoustic feature selection for automatic emotion recognition from speech. Information Processing & Management, 45(3), 315–328.
  175. Sagha, H., Deng, J., Gavryukova, M., Han, J., & Schuller, B. (2016). Cross lingual speech emotion recognition using canonical correlation analysis on principal component subspace. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5800–5804). Piscataway: IEEE.
  176. Sánchez-Gutiérrez, M. E., Albornoz, E. M., Martinez-Licona, F., Rufiner, H. L., & Goddard, J. (2014). Deep learning for emotional speech recognition. In Mexican conference on pattern recognition (pp. 311–320). Cham: Springer International Publishing.
  177. Scherer, S., Schwenker, F., & Palm, G. (2008). Emotion recognition from speech using multi-classifier systems and RBF-ensembles. In Speech, audio, image and biomedical signal processing using neural networks (pp. 49–70). Berlin, Heidelberg: Springer.
  180. Scherer, S., Schwenker, F., & Palm, G. (2009). Classifier fusion for emotion recognition from speech. In Advanced intelligent environments (pp. 95–117). Springer US.
  181. Schmitt, M., Ringeval, F., & Schuller, B. W. (2016). At the border of acoustics and linguistics: Bag-of-audio-words for the recognition of emotions in speech. In INTERSPEECH (pp. 495–499).
  182. Schuller, B., Seppi, D., Batliner, A., Maier, A., & Steidl, S. (2007). Towards more reality in the recognition of emotional speech. In 2007 IEEE international conference on acoustics, speech and signal processing, ICASSP 2007 (Vol. 4, pp. IV–941). Piscataway: IEEE.
  183. Schuller, B., Vlasenko, B., Eyben, F., Wöllmer, M., Stuhlsatz, A., Wendemuth, A., & Rigoll, G. (2010). Cross-corpus acoustic emotion recognition: Variances and strategies. IEEE Transactions on Affective Computing, 1(2), 119–131.
  184. Schuller, B., Vlasenko, B., Minguez, R., Rigoll, G., & Wendemuth, A. (2007). Comparing one and two-stage acoustic modeling in the recognition of emotion in speech. In IEEE workshop on automatic speech recognition & understanding, 2007. ASRU (pp. 596–600). Piscataway: IEEE.
  185. Schuller, B. W. (2008). Speaker, noise, and acoustic space adaptation for emotion recognition in the automotive environment. In 2008 ITG conference on voice communication (SprachKommunikation) (pp. 1–4). VDE.
  186. Schwenker, F., Scherer, S., Magdi, Y. M., & Palm, G. (2009). The GMM-SVM supervector approach for the recognition of the emotional status from speech. In International conference on artificial neural networks (pp. 894–903). Berlin, Heidelberg: Springer.
  187. Sedaaghi, M. H., Kotropoulos, C., & Ververidis, D. (2007). Using adaptive genetic algorithms to improve speech emotion recognition. In IEEE 9th workshop on multimedia signal processing, 2007. MMSP 2007 (pp. 461–464). Piscataway: IEEE.
  188. Seehapoch, T., & Wongthanavasu, S. (2013). Speech emotion recognition using support vector machines. In 2013 5th international conference on knowledge and smart technology (KST) (pp. 86–91). Piscataway: IEEE.
  189. Ser, W., Cen, L., & Yu, Z. L. (2008). A hybrid PNN-GMM classification scheme for speech emotion recognition. In 19th international conference on pattern recognition, 2008. ICPR 2008 (pp. 1–4). Piscataway: IEEE.
  190. Sethu, V., Ambikairajah, E., & Epps, J. (2007). Speaker normalisation for speech-based emotion detection. In 2007 15th international conference on digital signal processing (pp. 611–614). Piscataway: IEEE.
  191. Sethu, V., Ambikairajah, E., & Epps, J. (2008a). Phonetic and speaker variations in automatic emotion classification. In 9th annual conference of the international speech communication association.
  192. Sethu, V., Ambikairajah, E., & Epps, J. (2008b). Empirical mode decomposition based weighted frequency feature for speech-based emotion classification. In IEEE international conference on acoustics, speech and signal processing, 2008. ICASSP 2008 (pp. 5017–5020). Piscataway: IEEE.
  193. Sethu, V., Ambikairajah, E., & Epps, J. (2009). Speaker dependency of spectral features and speech production cues for automatic emotion classification. In IEEE international conference on acoustics, speech and signal processing, 2009. ICASSP 2009 (pp. 4693–4696). Piscataway: IEEE.
  194. Sethu, V., Ambikairajah, E., & Epps, J. (2013). On the use of speech parameter contours for emotion recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1), 19.
  195. Shah, F. (2009). Automatic emotion recognition from speech using artificial neural networks with gender-dependent databases. In International conference on advances in computing, control, & telecommunication technologies, 2009. ACT 2009 (pp. 162–164). Piscataway: IEEE.
  196. Shah, M., Miao, L., Chakrabarti, C., & Spanias, A. (2013). A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation. In 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2553–2557). Piscataway: IEEE.
  197. Shami, M., & Verhelst, W. (2007). An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech. Speech Communication, 49(3), 201–212.
  198. Shaukat, A., & Chen, K. (2011). Emotional state recognition from speech via soft-competition on different acoustic representations. In The 2011 international joint conference on neural networks (IJCNN) (pp. 1910–1917). Piscataway: IEEE.
  199. Shaw, A., Vardhan, R. K., & Saxena, S. (2016). Emotion recognition and classification in speech using artificial neural networks. International Journal of Computer Applications, 145(8).
  200. Sheikhan, M., Bejani, M., & Gharavian, D. (2013). Modular neural-SVM scheme for speech emotion recognition using ANOVA feature selection method. Neural Computing and Applications, 23(1), 215–227.
  201. Sheikhan, M., Gharavian, D., & Ashoftedel, F. (2012). Using DTW neural-based MFCC warping to improve emotional speech recognition. Neural Computing and Applications, 21(7), 1765–1773.
  202. Shen, P., Changjun, Z., & Chen, X. (2011). Automatic speech emotion recognition using support vector machine. In 2011 international conference on electronic and mechanical engineering and information technology (EMEIT) (Vol. 2, pp. 621–625). Piscataway: IEEE.
  203. Sidorov, M., Ultes, S., & Schmitt, A. (2014). Emotions are a personal thing: Towards speaker-adaptive emotion recognition. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4803–4807). Piscataway: IEEE.
  204. Soltani, K., & Ainon, R. N. (2007). Speech emotion detection based on neural networks. In 9th international symposium on signal processing and its applications, 2007. ISSPA 2007 (pp. 1–3). Piscataway: IEEE.
  205. Song, P., Ou, S., Zheng, W., Jin, Y., & Zhao, L. (2016). Speech emotion recognition using transfer non-negative matrix factorization. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5180–5184). Piscataway: IEEE.
  206. Song, P., Zheng, W., Ou, S., Zhang, X., Jin, Y., Liu, J., & Yu, Y. (2016). Cross-corpus speech emotion recognition based on transfer non-negative matrix factorization. Speech Communication, 83, 34–41.
  207. Steidl, S., Batliner, A., Nöth, E., & Hornegger, J. (2008). Quantification of segmentation and F0 errors and their effect on emotion recognition. In Text, speech and dialogue (pp. 525–534). Berlin, Heidelberg: Springer.
  208. Sun, Y., & Wen, G. (2015). Emotion recognition using semi-supervised feature selection with speaker normalization. International Journal of Speech Technology, 18(3), 317–331.
  209. Sun, Y., Wen, G., & Wang, J. (2015). Weighted spectral features based on local Hu moments for speech emotion recognition. Biomedical Signal Processing and Control, 18, 80–90.
  210. Sun, Y., Zhou, Y., Zhao, Q., & Yan, Y. (2009). Acoustic feature optimization for emotion affected speech recognition. In International conference on information engineering and computer science, 2009. ICIECS 2009 (pp. 1–4). Piscataway: IEEE.
  211. Swain, M., Sahoo, S., Routray, A., Kabisatpathy, P., & Kundu, J. N. (2015). Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition. International Journal of Speech Technology, 18(3), 387–393.
  212. Sztahó, D., Imre, V., & Vicsi, K. (2011). Automatic classification of emotions in spontaneous speech. In Analysis of verbal and nonverbal communication and enactment: The processing issues (pp. 229–239).
  213. Tabatabaei, T. S., Krishnan, S., & Guergachi, A. (2007). Emotion recognition using novel speech signal features. In IEEE international symposium on circuits and systems, 2007. ISCAS 2007 (pp. 345–348). Piscataway: IEEE.
  214. Tahon, M., & Devillers, L. (2015). Towards a small set of robust acoustic features for emotion recognition: Challenges. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(1), 16–28.
  215. Tamulevicius, G., & Liogiene, T. (2015). Low-order multi-level features for speech emotions recognition. Baltic Journal of Modern Computing, 3(4), 234–247.
  216. Tarasov, A., & Delany, S. J. (2011). Benchmarking classification models for emotion recognition in natural speech: A multi-corporal study. In 2011 IEEE international conference on automatic face & gesture recognition and workshops (FG 2011) (pp. 841–846). Piscataway: IEEE.
  217. Ten Bosch, L. (2003). Emotions, speech and the ASR framework. Speech Communication, 40(1), 213–225.
  218. Thapliyal, N., & Amoli, G. (2012). Speech based emotion recognition with Gaussian mixture model. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(5), 65.
  219. Trigeorgis, G., Ringeval, F., Brueckner, R., Marchi, E., Nicolaou, M. A., Schuller, B., & Zafeiriou, S. (2016). Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5200–5204). Piscataway: IEEE.
  220. Truong, K., & Van Leeuwen, D. (2007). An 'open-set' detection evaluation methodology for automatic emotion recognition in speech. In Workshop on paralinguistic speech: Between models and data (pp. 5–10).
  221. Tseng, M., Hu, Y., Han, W. W., & Bergen, B. (2005). "Searching for happiness" or "Full of joy"? Source domain activation matters. In Annual meeting of the Berkeley Linguistics Society (Vol. 31, No. 1, pp. 359–370).
  222. Utane, A. S., & Nalbalwar, S. L. (2013). Emotion recognition through speech using Gaussian mixture model and support vector machine. Emotion, 2, 8.
  223. Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162–1181.
  224. Vlasenko, B., Philippou-Hübner, D., Prylipko, D., Böck, R., Siegert, I., & Wendemuth, A. (2011a). Vowels formants analysis allows straightforward detection of high arousal emotions. In 2011 IEEE international conference on multimedia and expo (ICME) (pp. 1–6). Piscataway: IEEE.
  225. Vlasenko, B., Prylipko, D., Philippou-Hübner, D., & Wendemuth, A. (2011b). Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions. In 12th annual conference of the international speech communication association.
  226. Vlasenko, B., Schuller, B., Wendemuth, A., & Rigoll, G. (2007). Frame vs. turn-level: Emotion recognition from speech considering static and dynamic processing. In Proceedings 2nd international conference on affective computing and intelligent interaction (pp. 139–147).
  227. Vogt, T., & André, E. (2006). Improving automatic emotion recognition from speech via gender differentiation. In Proceedings of the language resources and evaluation conference (LREC 2006), Genoa.
  228. Vogt, T., & André, E. (2009). Exploring the benefits of discretization of acoustic features for speech emotion recognition. In 10th annual conference of the international speech communication association.
  229. Vogt, T., & André, E. (2011). An evaluation of emotion units and feature types for real-time speech emotion recognition. KI-Künstliche Intelligenz, 25(3), 213–223.
  230. Vondra, M., & Vích, R. (2009). Evaluation of speech emotion classification based on GMM and data fusion. In Cross-modal analysis of speech, gestures, gaze and facial expressions (pp. 98–105).
  231. Wagner, J., Vogt, T., & André, E. (2007). A systematic comparison of different HMM designs for emotion recognition from acted and spontaneous speech. In International conference on affective computing and intelligent interaction (pp. 114–125). Berlin, Heidelberg: Springer.
  232. Wang, F., Verhelst, W., & Sahli, H. (2011). Relevance vector machine based speech emotion recognition. In Affective computing and intelligent interaction (pp. 111–120).
  233. Weninger, F., Ringeval, F., Marchi, E., & Schuller, B. W. (2016). Discriminatively trained recurrent neural networks for continuous dimensional emotion recognition from audio. In IJCAI (pp. 2196–2202).
  234. Wenjing, H., Haifeng, L., & Chunyu, G. (2009). A hybrid speech emotion perception method of VQ-based feature processing and ANN recognition. In WRI global congress on intelligent systems, 2009. GCIS 2009 (Vol. 2, pp. 145–149). Piscataway: IEEE.
  235. Womack, B. D., & Hansen, J. H. (1999). N-channel hidden Markov models for combined stressed speech classification and recognition. IEEE Transactions on Speech and Audio Processing, 7(6), 668–677.
  236. Wu, C. H., & Liang, W. B. (2011). Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels. IEEE Transactions on Affective Computing, 2, 10–21.
  237. Wu, S., Falk, T. H., & Chan, W. Y. (2009). Automatic recognition of speech emotion using long-term spectro-temporal features. In 2009 16th international conference on digital signal processing (pp. 1–6). Piscataway: IEEE.
  238. Wu, S., Falk, T. H., & Chan, W. Y. (2011). Automatic speech emotion recognition using modulation spectral features. Speech Communication, 53(5), 768–785.
  239. Wu, T., Yang, Y., Wu, Z., & Li, D. (2006). MASC: A speech corpus in Mandarin for emotion analysis and affective speaker recognition. In Speaker and language recognition workshop, 2006. IEEE Odyssey 2006 (pp. 1–5). Piscataway: IEEE.
  240. Xiao, Z., Dellandréa, E., Chen, L., & Dou, W. (2009). Recognition of emotions in speech by a hierarchical approach. In 3rd international conference on affective computing and intelligent interaction and workshops, 2009. ACII 2009 (pp. 1–8). Piscataway: IEEE.
  241. Xiao, Z., Dellandrea, E., Dou, W., & Chen, L. (2006). Two-stage classification of emotional speech. In International conference on digital telecommunications, 2006. ICDT 2006 (pp. 32–32). Piscataway: IEEE.
  242. Xiao, Z., Dellandrea, E., Dou, W., & Chen, L. (2007). Automatic hierarchical classification of emotional speech. In 9th IEEE international symposium on multimedia workshops, 2007. ISMW 2007 (pp. 291–296). Piscataway: IEEE.
  243. Xiao, Z., Dellandrea, E., Dou, W., & Chen, L. (2007). Hierarchical classification of emotional speech. IEEE Transactions on Multimedia, 37.
  244. Yang, B., & Lugger, M. (2010). Emotion recognition from speech signals using new harmony features. Signal Processing, 90(5), 1415–1423.
  245. Yang, N., Muraleedharan, R., Kohl, J., Demirkol, I., Heinzelman, W., & Sturge-Apple, M. (2012). Speech-based emotion classification using multiclass SVM with hybrid kernel and thresholding fusion. In 2012 IEEE spoken language technology workshop (SLT) (pp. 455–460). Piscataway: IEEE.
  246. Ye, C., Liu, J., Chen, C., Song, M., & Bu, J. (2008). Speech emotion classification on a Riemannian manifold. In Advances in multimedia information processing: PCM 2008 (pp. 61–69).
  247. Yeh, J. H., Pao, T. L., Lin, C. Y., Tsai, Y. W., & Chen, Y. T. (2011). Segment-based emotion recognition from continuous Mandarin Chinese speech. Computers in Human Behavior, 27(5), 1545–1552.
  248. You, M., Chen, C., Bu, J., Liu, J., & Tao, J. (2006). A hierarchical framework for speech emotion recognition. In 2006 IEEE international symposium on industrial electronics (Vol. 1, pp. 515–519). Piscataway: IEEE.
  249. You, M., Chen, C., Bu, J., Liu, J., & Tao, J. (2006). Emotional speech analysis on nonlinear manifold. In 18th international conference on pattern recognition, 2006. ICPR 2006 (Vol. 3, pp. 91–94). Piscataway: IEEE.
  250. Yun, S., & Yoo, C. D. (2012). Loss-scaled large-margin Gaussian mixture models for speech emotion classification. IEEE Transactions on Audio, Speech, and Language Processing, 20(2), 585–598.
  251. Yüncü, E., Hacihabiboglu, H., & Bozsahin, C. (2014). Automatic speech emotion recognition using auditory models with binary decision tree and SVM. In 2014 22nd international conference on pattern recognition (ICPR) (pp. 773–778). Piscataway: IEEE.
  252. Zbancioc, M., & Feraru, S. M. (2012). Emotion recognition of the SROL Romanian database using fuzzy KNN algorithm. In 10th international symposium on electronics and telecommunications (ISETC), 2012 (pp. 347–350). Piscataway: IEEE.
  253. Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2009). A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 39–58.
  254. Zha, C., Yang, P., Zhang, X., & Zhao, L. (2016). Spontaneous speech emotion recognition via multiple kernel learning. In 2016 eighth international conference on measuring technology and mechatronics automation (ICMTMA) (pp. 621–623). Piscataway: IEEE.
  255. Zhang, S., Lei, B., Chen, A., Chen, C., & Chen, Y. (2010). Spoken emotion recognition using local Fisher discriminant analysis. In 10th international conference on signal processing (ICSP), 2010 (pp. 538–540). Piscataway: IEEE.
  256. Zhang, S., & Zhao, Z. (2008). Feature selection filtering methods for emotion recognition in Chinese speech signal. In 9th international conference on signal processing, 2008. ICSP 2008 (pp. 1699–1702). Piscataway: IEEE.
  257. Zheng, W. Q., Yu, J. S., & Zou, Y. X. (2015). An experimental study of speech emotion recognition based on deep convolutional neural networks. In 2015 international conference on affective computing and intelligent interaction (ACII) (pp. 827–831). Piscataway: IEEE.
  258. Zhou, J., Wang, G., Yang, Y., & Chen, P. (2006). Speech emotion recognition based on rough set and SVM. In 5th IEEE international conference on cognitive informatics, 2006. ICCI 2006 (Vol. 1, pp. 53–61). Piscataway: IEEE.
  259. Zhou, Y., Sun, Y., Yang, L., & Yan, Y. (2009). Applying articulatory features to speech emotion recognition. In International conference on research challenges in computer science, 2009. ICRCCS 2009 (pp. 73–76). Piscataway: IEEE.
  260. Zhu, L., Chen, L., Zhao, D., Zhou, J., & Zhang, W. (2017). Emotion recognition from Chinese speech for smart affective services using a combination of SVM and DBN. Sensors, 17(7), 1694.
  261. Zong, Y., Zheng, W., Zhang, T., & Huang, X. (2016). Cross-corpus speech emotion recognition based on domain-adaptive least-squares regression. IEEE Signal Processing Letters, 23(5), 585–589.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Software Engineering, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
  2. Faculty of Business Finance and Hospitality, Mahsa University, Selangor, Malaysia
  3. Department of Operation and Management Information System, Faculty of Business and Accountancy, University of Malaya, Kuala Lumpur, Malaysia
  4. Department of English Language, Faculty of Languages and Linguistics, University of Malaya, Kuala Lumpur, Malaysia
  5. Department of Software Engineering, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia