
Speech emotion recognition research: an analysis of research focus

Published in: International Journal of Speech Technology

Abstract

This article analyses research in speech emotion recognition ("SER") from 2006 to 2017 in order to identify the current focus of research and the areas in which research is lacking. Searching on selected keywords, we extracted and analysed 260 articles from well-known online databases. The analysis indicates that SER is an active field of research, with dozens of articles published each year in journals and conference proceedings. The majority of articles concentrate on three critical aspects of SER, namely (1) databases, (2) suitable speech features, and (3) classification techniques that maximize the recognition accuracy of SER systems. Having carried out an association analysis of these critical aspects and how they influence the performance of SER systems, we found that certain combinations of databases, speech features, and classifiers influence the recognition accuracy of an SER system. Based on our review, we also suggest aspects of SER that could be taken into consideration in future work.
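The association analysis mentioned above can be pictured as classic market-basket rule mining over article metadata: each surveyed article contributes one "transaction" of categorical items (database used, feature set, classifier, and a banded recognition accuracy), and the mined rules expose which combinations co-occur with high accuracy. The sketch below is a minimal, self-contained Python illustration of that idea, not the authors' code; the database, feature, and classifier names are common in the SER literature, but every pairing, accuracy band, and threshold is an invented placeholder.

    from itertools import combinations

    # Each surveyed article is reduced to one "transaction" of categorical
    # items. These rows are invented placeholders, NOT data from the survey.
    articles = [
        {"db=EMO-DB", "feat=MFCC", "clf=SVM", "acc=high"},
        {"db=EMO-DB", "feat=prosodic", "clf=GMM", "acc=medium"},
        {"db=IEMOCAP", "feat=MFCC", "clf=DNN", "acc=high"},
        {"db=EMO-DB", "feat=MFCC", "clf=SVM", "acc=high"},
        {"db=IEMOCAP", "feat=prosodic", "clf=HMM", "acc=medium"},
        {"db=IEMOCAP", "feat=MFCC", "clf=SVM", "acc=high"},
    ]

    def support(itemset, transactions):
        # Fraction of transactions containing every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    def mine_rules(transactions, target="acc=high", min_support=0.2, min_conf=0.8):
        # Enumerate antecedents of up to three items that predict `target`
        # with sufficient support and confidence (an Apriori-style filter).
        items = sorted(set().union(*transactions) - {target})
        rules = []
        for k in (1, 2, 3):
            for antecedent in map(frozenset, combinations(items, k)):
                sup_ant = support(antecedent, transactions)
                if sup_ant == 0:
                    continue
                sup_rule = support(antecedent | {target}, transactions)
                confidence = sup_rule / sup_ant
                if sup_rule >= min_support and confidence >= min_conf:
                    rules.append((set(antecedent), target, sup_rule, confidence))
        return rules

    for antecedent, target, sup, conf in mine_rules(articles):
        print(f"{antecedent} -> {target} (support={sup:.2f}, confidence={conf:.2f})")

Running the sketch prints rules such as {db=EMO-DB, feat=MFCC} -> acc=high together with their support and confidence; on a real corpus of 260 articles, the same support/confidence filtering is what separates incidental pairings from combinations of database, features, and classifier that consistently accompany high recognition accuracy.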



References

  • Abdelwahab, M., & Busso, C. (2017). Incremental adaptation using active learning for acoustic emotion Recognition. In International conference on acoustics, speech and signal processing.

  • Alam, M. J., Attabi, Y., Dumouchel, P., Kenny, P., & O’Shaughnessy, D. D. (2013). Amplitude modulation features for emotion recognition from speech. In INTERSPEECH (pp. 2420–2424).

  • Albornoz, E. M., Crolla, M. B., & Milone, D. H. (2008). Recognition of emotions in speech. In Proceedings of XXXIV CLEI, Santa Fe Argentina, pp. 1120–1129.

  • Albornoz, E. M., Milone, D. H., & Rufiner, H. L. (2011). Spoken emotion recognition using hierarchical classifiers. Computer Speech & Language, 25(3), 556–570.

    Article  Google Scholar 

  • Altun, H., & Polat, G. (2009). Boosting selection of speech related features to improve performance of multi-class SVMs in emotion detection. Expert Systems with Applications, 36(4), 8197–8203.

    Article  Google Scholar 

  • Álvarez, A., Cearreta, I., López, J. M., Arruti, A., Lazkano, E., Sierra, B., & Garay, N. (2007). A comparison using different speech parameters in the automatic emotion recognition using Feature Subset Selection based on Evolutionary Algorithms. In International conference on text, speech and dialogue (pp. 423–430). Berlin: Springer.

    Chapter  Google Scholar 

  • Anagnostopoulos, C. N., Iliou, T., & Giannoukos, I. (2012). Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011. Artificial Intelligence Review, 43(2), 155–177.

    Article  Google Scholar 

  • Ananthakrishnan, S., Vembu, A. N., & Prasad, R. (2011). Model-based parametric features for emotion recognition from speech. In 2011 IEEE workshop on automatic speech recognition and understanding (ASRU), (pp. 529–534). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Arias, J. P., Busso, C., & Yoma, N. B. (2013). Energy and F0 contour modeling with functional data analysis for emotional speech detection. In INTERSPEECH (pp. 2871–2875).

  • Arias, J. P., Busso, C., & Yoma, N. B. (2014). Shape-based modeling of the fundamental frequency contour for emotion detection in speech. Computer Speech & Language, 28(1), 278–294.

    Article  Google Scholar 

  • Atassi, H., & Esposito, A. (2008). A speaker independent approach to the classification of emotional vocal expressions. In 20th IEEE international conference on tools with artificial intelligence, 2008. ICTAI08. (Vol. 2, pp. 147–152). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Atassi, H., Smekal, Z., & Esposito, A. (2012). Emotion recognition from spontaneous Slavic speech. In 2012 IEEE 3rd international conference on cognitive infocommunications (CogInfoCom) (pp. 389–394). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Athanaselis, T., Bakamidis, S., Dologlou, I., Cowie, R., Douglas-Cowie, E., & Cox, C. (2005). ASR for emotional speech: Clarifying the issues and enhancing performance. Neural Networks, 18(4), 437–444.

    Article  Google Scholar 

  • Attabi, Y., & Dumouchel, P. (2012). Emotion recognition from speech: WOC-NN and class-interaction. In 2012 11th international conference on information science, signal processing and their applications (ISSPA) (pp. 126–131). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Attabi, Y., & Dumouchel, P. (2013). Anchor models for emotion recognition from speech. IEEE Transactions on Affective Computing, 4(3), 280–290.

    Article  Google Scholar 

  • Bahreini, K., Nadolski, R., & Westera, W. (2016). Towards real-time speech emotion recognition for affective e-learning. Education and Information Technologies, 21(5), 1367–1386.

    Article  Google Scholar 

  • Balti, H., & Elmaghraby, A. S. (2013). Speech emotion detection using time dependent self organizing maps. In 2013 IEEE international symposium on signal processing and information technology (ISSPIT) (pp. 000470–000478). Piscataway: IEEE.

    Google Scholar 

  • Barra Chicote, R., Fernández Martínez, F., Lutfi, L., Binti, S., Lucas Cuesta, J. M., Macías Guarasa, J., … Pardo Muñoz, J. M. (2009). Acoustic emotion recognition using dynamic Bayesian networks and multi-space distributions. ISCA.

  • Batliner, A., Schuller, B., Seppi, D., Steidl, S., Devillers, L., Vidrascu, L., & Amir, N. (2011). The automatic recognition of emotions in speech. In Emotion-oriented systems (pp. 71–99). Berlin Heidelberg: Springer.

    Chapter  Google Scholar 

  • Batliner, A., Steidl, S., Schuller, B., Seppi, D., Laskowski, K., Vogt, T., … Aharonson, V. (2006). Combining efforts for improving automatic classification of emotional user states. Proc. IS-LTC 240–245.

  • Batliner, A., Steidl, S., Schuller, B., Seppi, D., Vogt, T., Devillers, L., … Aharonson, V. (2007). The impact of F0 extraction errors on the classification of prominence and emotion. Proc. ICPhS 2201–2204.

  • Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., & Rose, R. (2007). Automatic speech recognition and speech variability: A review. Speech Communication, 49(10), 763–786.

    Article  Google Scholar 

  • Bertero, D., & Fung, P. (2017). A first look into a Convolutional Neural Network for speech emotion detection. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5115–5119). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Bhaykar, M., Yadav, J., & Rao, K. S. (2013). Speaker dependent, speaker independent and cross language emotion recognition from speech using GMM and HMM. 2013 national conference on communications (NCC) (pp. 1–5). Piscataway: IEEE.

    Google Scholar 

  • Bitouk, D., Nenkova, A., & Verma, R. (2009). Improving emotion recognition using class-level spectral features. In INTERSPEECH (pp. 2023–2026).

  • Böck, R., Hübner, D., & Wendemuth, A. (2010). Determining optimal signal features and parameters for hmm-based emotion classification. In 15th IEEE mediterranean electrotechnical conference MELECON 2010–2010 (pp. 1586–1590). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Bojanić, M., Crnojević, V., & Delić, V. (2012). Application of neural networks in emotional speech recognition. In 2012 11th symposium on neural network applications in electrical engineering (NEUREL) (pp. 223–226). Piscataway: IEEE.

    Google Scholar 

  • Bozkurt, E., Erzin, E., Erdem, C. E., & Erdem, A. T. (2010). Use of line spectral frequencies for emotion recognition from speech. In 2010 20th international conference on pattern recognition (ICPR) (pp. 3708–3711). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Bozkurt, E., Erzin, E., Erdem, C. E., & Erdem, A. T. (2011). Formant position based weighted spectral features for emotion recognition. Speech Communication, 53(9), 1186–1197.

    Article  Google Scholar 

  • Bozkurt, E., Erzin, E., Eroğlu Erdem, Ç, & Erdem, T. (2009). Improving automatic emotion recognition from speech signals. In 10th annual conference of the international speech communication association 2009 (INTERSPEECH 2009). International Speech Communications Association.

  • Brester, C., Semenkin, E., & Sidorov, M. (2016). Multi-objective heuristic feature selection for speech-based multilingual emotion recognition. Journal of Artificial Intelligence and Soft Computing Research, 6(4), 243–253.

    Article  Google Scholar 

  • Brooks, C. A., Thompson, C., & Kovanović, V. (2016). Introduction to data mining for educational researchers. In Proceedings of the 6th international conference on learning analytics & knowledge (pp. 505–506). ACM.

  • Busso, C., Lee, S., & Narayanan, S. (2009). Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Transactions on Audio, Speech, and Language Processing, 17(4), 582–596.

    Article  Google Scholar 

  • Busso, C., Mariooryad, S., Metallinou, A., & Narayanan, S. (2013). Iterative feature normalization scheme for automatic emotion detection from speech. IEEE Transactions on Affective Computing, 4(4), 386–397.

    Article  Google Scholar 

  • Busso, C., Metallinou, A., & Narayanan, S. S. (2011). Iterative feature normalization for emotional speech detection. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5692–5695). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Calvo, R. A., & D’Mello, S. (2010). Affect detection: An interdisciplinary review of models, methods, and their applications. IEEE Transactions on Affective Computing, 1(1), 18–37.

    Article  Google Scholar 

  • Casale, S., Russo, A., Scebba, G., & Serrano, S. (2008). Speech emotion classification using machine learning algorithms. In 2008 IEEE international conference on semantic computing (pp. 158–165). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Casale, S., Russo, A., & Serrano, S. (2010). Analysis of robustness of attributes selection applied to speech emotion recognition. In 2010 18th European signal processing conference (pp. 1174–1178). Piscataway: IEEE.

    Google Scholar 

  • Chakraborty, R., Pandharipande, M., & Kopparapu, S. K. (2016). Knowledge-based framework for intelligent emotion recognition in spontaneous speech. Procedia Computer Science, 96, 587–596.

    Article  Google Scholar 

  • Chandaka, S., Chatterjee, A., & Munshi, S. (2009). Support vector machines employing cross-correlation for emotional speech recognition. Measurement, 42(4), 611–618.

    Article  Google Scholar 

  • Chandrakala, S., & Sekhar, C. C. (2009). Combination of generative models and SVM based classifier for speech emotion recognition. In International joint conference on neural networks, 2009. IJCNN 2009 (pp. 497–502). Piscataway: IEEE.

    Google Scholar 

  • Chavhan, Y., Dhore, M. L., & Yesaware, P. (2010). Speech emotion recognition using support vector machine. International Journal of Computer Applications, 1(20), 6–9.

    Article  Google Scholar 

  • Chavhan, Y. D., Yelure, B. S., & Tayade, K. N. (2015). Speech emotion recognition using RBF kernel of LIBSVM. In 2015 2nd international conference on electronics and communication systems (ICECS) (pp. 1132–1135). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Chen, L., Mao, X., Wei, P., Xue, Y., & Ishizuka, M. (2012). Mandarin emotion recognition combining acoustic and emotional point information. Applied Intelligence, 37(4), 602–612.

    Article  Google Scholar 

  • Chenchah, F., & Lachiri, Z. (2014). Speech emotion recognition in acted and spontaneous context. Procedia Computer Science, 39, 139–145.

    Article  Google Scholar 

  • Cheng, X., & Duan, Q. (2012). Speech emotion recognition using gaussian mixture model. In The 2nd international conference on computer application and system modeling.

  • Chiou, B. C., & Chen, C. P. (2013). Feature space dimension reduction in speech emotion recognition using support vector machine. In Signal and information processing association annual summit and conference (APSIPA), 2013 Asia-Pacific (pp. 1–6). Piscataway: IEEE.

    Google Scholar 

  • Christina, I. J., & Milton, A. (2012). Analysis of all pole model to recognize emotions from speech signal. In 2012 international conference on computing, electronics and electrical technologies (ICCEET) (pp. 723–728). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., & Schroder, M. (2000) FEELTRACE: an instrument for recording perceived emotion in real time. In Proceedings of ISCA speech and emotion workshop, pp 19–24.

  • Cummins, N., Amiriparian, S., Hagerer, G., Batliner, A., Steidl, S., & Schuller, B. (2017). An Image-based deep spectrum feature representation for the recognition of emotional speech. In Proceedings of the 25th ACM international conference on multimedia, MM. Piscataway: IEEE.

    Google Scholar 

  • D’Mello, S., & Kory, J. (2012). Consistent but modest: a meta-analysis on unimodal and multimodal affect detection accuracies from 30 studies. In Proceedings of the 14th ACM international conference on multimodal interaction (pp. 31–38). ACM.

  • Dai, K., Fell, H. J., & MacAuslan, J. (2008). Recognizing emotion in speech using neural networks. Telehealth and Assistive Technologies, 31, 38–43.

    Google Scholar 

  • Delic, V., Bojanic, M., Gnjatovic, M., Secujski, M., & Jovicic, S. T. (2012). Discrimination capability of prosodic and spectral features for emotional speech recognition. Elektronika ir Elektrotechnika, 18(9), 51–54.

    Article  Google Scholar 

  • Deng, J., Han, W., & Schuller, B. (2012). Confidence measures for speech emotion recognition: A start. In Proceedings of speech communication; 10. ITG symposium (pp. 1–4). VDE.

  • Deng, J., Xu, X., Zhang, Z., Frühholz, S., Grandjean, D., & Schuller, B. (2017). Fisher kernels on phase-based features for speech emotion recognition. In Dialogues with social robots (pp. 195–203). Springer: Singapore.

    Chapter  Google Scholar 

  • Deng, J., Zhang, Z., Eyben, F., & Schuller, B. (2014). Autoencoder-based unsupervised domain adaptation for speech emotion recognition. IEEE Signal Processing Letters, 21(9), 1068–1072.

    Article  Google Scholar 

  • Deng, J., Zhang, Z., Marchi, E., & Schuller, B. (2013). Sparse autoencoder-based feature transfer learning for speech emotion recognition. In 2013 humaine association conference on affective computing and intelligent interaction (ACII) (pp. 511–516). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Devillers, L., & Vidrascu, L. (2006). Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs. In 9th international conference on spoken language processing.

  • Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: towards a new generation of databases. Speech Communication, 40, 33–60.

    Article  MATH  Google Scholar 

  • Douglas-Cowie, E., Cowie, R., Sneddon, I., Cox, C., Lowry, O., McRorie, M., Martin, J. C., Devillers, L., Abrilan, S., Batliner, A., Amir, N., & Karpouzis, K. (2007) The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data. In Proceedings of international conference affective computing and intelligent interaction, pp 488–500.

  • Ekman, P. (1957). A methodological discussion of non-verbal behavior. Journal of Psychology, 43, 141–149.

    Article  Google Scholar 

  • Ekman, P. (1972). Universals and cultural differences in facial expression of emotion. In J. Cole (Ed.), Nebraska symposium on motivation (pp. 207–283). Lincoln, NE: University of Nebraska Press.

    Google Scholar 

  • Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion. Chichester: Wiley.

    Google Scholar 

  • El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3), 572–587.

    Article  MATH  Google Scholar 

  • Elbarougy, R., & Akagi, M. (2012). Speech emotion recognition system based on a dimensional approach using a three-layered model. In Signal & information processing association annual summit and conference (APSIPA ASC), 2012 Asia-Pacific (pp. 1–9). Piscataway: IEEE.

    Google Scholar 

  • Elbarougy, R., & Akagi, M. (2013). Cross-lingual speech emotion recognition system based on a three-layer model for human perception. In Signal and information processing association annual summit and conference (APSIPA), 2013 Asia-Pacific (pp. 1–10). Piscataway: IEEE.

    Google Scholar 

  • Erdem, C. E., Bozkurt, E., Erzin, E., & Erdem, A. T. (2010). RANSAC-based training data selection for emotion recognition from spontaneous speech. In Proceedings of the 3rd international workshop on affective interaction in natural environments (pp. 9–14). ACM.

  • Esmaileyan, Z., & Marvi, H. (2014). Recognition of emotion in speech using variogram based features. Malaysian Journal of Computer Science, 27(3), 156–170.

    Google Scholar 

  • Espinosa, H. P., García, C. A. R., & Pineda, L. V. (2010). Features selection for primitives estimation on emotional speech. In 2010 IEEE international conference on acoustics speech and signal processing (ICASSP) (pp. 5138–5141). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Fayek, H. M., Lech, M., & Cavedon, L. (2016). On the correlation and transferability of features between automatic speech recognition and speech emotion recognition. In INTERSPEECH (pp. 3618–3622).

  • Feraru, M., & Zbancioc, M. (2013). Speech emotion recognition for SROL database using weighted KNN algorithm. In 2013 international conference on electronics, computers and artificial intelligence (ECAI) (pp. 1–4). Piscataway: IEEE.

    Google Scholar 

  • Fernandez, R., & Picard, R. (2011). Recognizing affect from speech prosody using hierarchical graphical models. Speech Communication, 53(9), 1088–1103.

    Article  Google Scholar 

  • Firoz Shah, A., Vimal, K. V. R., Raji, S. A., Jayakumar, A., & Babu, A. P. (2009) Speaker independent automatic emotion recognition from speech: a comparison of MFCCs and discrete wavelet transforms. In Proceedings of international conference on advances in recent technologies in communication and computing, pp 528–531.

  • Fu, L., Mao, X., & Chen, L. (2008a). Relative speech emotion recognition based artificial neural network. In Pacific-Asia workshop on computational intelligence and industrial application, 2008. PACIIA’08. (Vol. 2, pp. 140–144). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Fu, L., Mao, X., & Chen, L. (2008b). Speaker independent emotion recognition based on SVM/HMMs fusion system. In International conference on audio, language and image processing, 2008. ICALIP 2008 (pp. 61–65). Piscataway: IEEE.

    Google Scholar 

  • Gamage, K. W., Sethu, V., & Ambikairajah, E. (2017). Salience based lexical features for emotion recognition. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5830–5834). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Garg, V., Kumar, H., & Sinha, R. (2013). Speech based emotion recognition based on hierarchical decision tree with SVM, BLG and SVR classifiers. In 2013 national conference on communications (NCC) (pp. 1–5). Piscataway: IEEE.

    Google Scholar 

  • Gaurav, M. (2008). Performance analysis of spectral and prosodic features and their fusion for emotion recognition in speech. In Spoken language technology workshop, 2008. SLT 2008 (pp. 313–316). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Georgogiannis, A., & Digalakis, V. (2012). Speech emotion recognition using non-linear teager energy based features in noisy environments. In 2012 proceedings of the 20th European signal processing conference (EUSIPCO) (pp. 2045–2049). Piscataway: IEEE.

    Google Scholar 

  • Gharavian, D., Sheikhan, M., & Ashoftedel, F. (2013). Emotion recognition improvement using normalized formant supplementary features by hybrid of DTW-MLP-GMM model. Neural Computing and Applications, 22(6), 1181–1191.

    Article  Google Scholar 

  • Gharavian, D., Sheikhan, M., & Janipour, M. (2010). Pitch in emotional speech and emotional speech recognition using pitch frequency. Majlesi Journal of Electrical Engineering, 4(1).

  • Gharavian, D., Sheikhan, M., Nazerieh, A., & Garoucy, S. (2012). Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network. Neural Computing and Applications, 21(8), 2115–2126.

    Article  Google Scholar 

  • Gharsellaoui, S., Selouani, S. A., & Dahmane, A. O. (2015). Automatic emotion recognition using auditory and prosodic indicative features. In 2015 IEEE 28th Canadian conference on electrical and computer engineering (CCECE) (pp. 1265–1270). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Giannoulis, P., & Potamianos, G. (2012). A hierarchical approach with feature selection for emotion recognition from speech. In LREC (pp. 1203–1206).

  • Glüge, S., Böck, R., & Wendemuth, A. (2011). Segmented-memory recurrent neural networks versus hidden markov models in emotion recognition from speech. In IJCCI (NCTA) (pp. 308–315).

  • Grimm, M., Kroschel, K., Mower, E., & Narayanan, S. (2007a). Primitives-based evaluation and estimation of emotions in speech. Speech Communication, 49(10), 787–800.

    Article  Google Scholar 

  • Grimm, M., Kroschel, K., & Narayanan, S. (2007b). Support vector regression for automatic recognition of spontaneous emotions in speech. In IEEE international conference on acoustics, speech and signal processing, 2007. ICASSP 2007. (Vol. 4, pp. IV–1085). Piscataway: IEEE.

    Google Scholar 

  • Hamidi, M., & Mansoorizade, M. (2012). Emotion recognition from persian speech with neural network. International Journal of Artificial Intelligence & Applications, 3(5), 107.

    Article  Google Scholar 

  • Han, J., Zhang, Z., Ringeval, F., & Schuller, B. (2017). Prediction-based learning for continuous emotion recognition in speech. In 42nd IEEE international conference on acoustics, speech, and signal processing, ICASSP 2017.

  • Han, K., Yu, D., & Tashev, I. (2014). Speech emotion recognition using deep neural network and extreme learning machine. In 15th annual conference of the international speech communication association.

  • Harimi, A., Fakhr, H. S., & Bakhshi, A. (2016). Recognition of emotion using reconstructed phase space of speech. Malaysian Journal of Computer Science, 29(4), 262–271.

    Article  Google Scholar 

  • Hassan, A., & Damper, R. I. (2009). Emotion recognition from speech using extended feature selection and a simple classifier. In 10th annual conference of the international speech communication association.

  • He, L., Lech, M., Maddage, N. C., & Allen, N. B. (2011). Study of empirical mode decomposition and spectral analysis for stress and emotion classification in natural speech. Biomedical Signal Processing and Control, 6(2), 139–146.

    Article  Google Scholar 

  • Henríquez, P., Alonso, J. B., Ferrer, M. A., Travieso, C. M., & Orozco-Arroyave, J. R. (2014). Nonlinear dynamics characterization of emotional speech. Neurocomputing, 132, 126–135.

    Article  Google Scholar 

  • Hu, H., Xu, M. X., & Wu, W. (2007). Fusion of global statistical and segmental spectral features for speech emotion recognition. In INTERSPEECH (pp. 2269–2272).

  • Hu, H., Xu, M. X., & Wu, W. (2007). GMM supervector based SVM with spectral features for speech emotion recognition. In IEEE international conference on acoustics, speech and signal processing, 2007. ICASSP 2007. (Vol. 4, pp. IV–413). Piscataway: IEEE.

    Google Scholar 

  • Huang, R., & Ma, C. (2006). Toward a speaker-independent real-time affect detection system. In 18th international conference on pattern recognition, 2006. ICPR 2006. (Vol. 1, pp. 1204–1207). Piscataway: IEEE.

    Google Scholar 

  • Huang, Y., Wu, A., Zhang, G., & Li, Y. (2016). Speech emotion recognition based on deep belief networks and wavelet packet cepstral coefficients. International Journal of Simulation: Systems, Science and Technology, 17(28), 28–31.

    Google Scholar 

  • Huang, Z., Dong, M., Mao, Q., & Zhan, Y. (2014). Speech emotion recognition using CNN. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 801–804). ACM.

  • Hussain, L., Shafi, I., Saeed, S., Abbas, A., Awan, I. A., Nadeem, S. A., … Rahman, B. (2017). A radial base neural network approach for emotion recognition in human speech. IJCSNS, 17(8), 52.

    Google Scholar 

  • Iliev, A. I., & Scordilis, M. S. (2008). Emotion recognition in speech using inter-sentence Glottal statistics. In 15th international conference on systems, signals and image processing, 2008. IWSSIP 2008. (pp. 465–468). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Iliev, A. I., Scordilis, M. S., Papa, J. P., & Falcão, A. X. (2010). Spoken emotion recognition through optimum-path forest classification using glottal features. Computer Speech & Language, 24(3), 445–460.

    Article  Google Scholar 

  • Iliou, T., & Anagnostopoulos, C. N. (2009). Comparison of different classifiers for emotion recognition. In 13th panhellenic conference on informatics, 2009. PCI’09. (pp. 102–106). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Iliou, T., & Anagnostopoulos, C. N. (2010a). SVM-MLP-PNN classifiers on speech emotion recognition field—A comparative study. In 2010 fifth international conference on digital telecommunications (ICDT) (pp. 1–6). Piscataway: IEEE.

    Google Scholar 

  • Iliou, T., & Anagnostopoulos, C. N. (2010b). Classification on speech emotion recognition-a comparative study. Animation, 4, 5.

    Google Scholar 

  • Iriondo, I., Planet, S., Alías, F., Socoró, J. C., & Martínez, E. (2007). Validation of an expressive speech corpus by mapping automatic classification to subjective evaluation. Computational and Ambient Intelligence, 646–653.

  • Ivanov, A., & Riccardi, G. (2012). Kolmogorov-Smirnov test for feature selection in emotion recognition from speech. In 2012 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5125–5128). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Javidi, M. M., & Roshan, E. F. (2013). Speech emotion recognition by using combinations of C5. 0, neural network (NN), and support vector machines (SVM) classification methods. International Journal of Applied Mathematics and Computer Science, 6, 191–200.

    Google Scholar 

  • Jeon, J. H., Xia, R., & Liu, Y. (2011). Sentence level emotion recognition based on decisions from subsentence segments. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4940–4943). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Jiang, J., Wu, Z., Xu, M., Jia, J., & Cai, L. (2012). Comparison of adaptation methods for GMM-SVM based speech emotion recognition. In 2012 IEEE spoken language technology workshop (SLT) (pp. 269–273). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Jin, Q., Li, C., Chen, S., & Wu, H. (2015). Speech emotion recognition with acoustic and lexical features. In 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4749–4753). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Kamińska, D., & Pelikant, A. (2012). Recognition of human emotion from a speech signal based on Plutchik’s model. International Journal of Electronics and Telecommunications, 58(2), 165–170.

    Article  Google Scholar 

  • Kandali, A. B., Routray, A., & Basu, T. K. (2008). Emotion recognition from Assamese speeches using MFCC features and GMM classifier. In TENCON 2008–2008 IEEE region 10 conference (pp. 1–5). Piscataway: IEEE.

    Google Scholar 

  • Khan, M., Goskula, T., Nasiruddin, M., & Quazi, R. (2011). Comparison between k-nn and svm method for speech emotion recognition. International Journal on Computer Science and Engineering, 3(2), 607–611.

    Google Scholar 

  • Khanna, P., & Kumar, M. S. (2011). Application of vector quantization in emotion recognition from human speech. In International conference on information intelligence, systems, technology and management (pp. 118–125). Berlin, Heidelberg: Springer.

    Chapter  Google Scholar 

  • Kim, E. H., Hyun, K. H., Kim, S. H., & Kwak, Y. K. (2009). Improved emotion recognition with a novel speaker-independent feature. IEEE/ASME Transactions on Mechatronics, 14(3), 317–325.

    Article  Google Scholar 

  • Kim, E. H., Hyun, K. H., & Kwak, Y. K. (2006). Improvement of emotion recognition from voice by separating of obstruents. In The 15th IEEE international symposium on robot and human interactive communication, 2006. ROMAN 2006. (pp. 564–568). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Kim, J. B., Park, J. S., & Oh, Y. H. (2011). On-line speaker adaptation based emotion recognition using incremental emotional information. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4948–4951). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Kim, J. B., Park, J. S., & Oh, Y. H. (2012). Speaker-characterized emotion recognition using online and iterative speaker adaptation. Cognitive Computation, 4(4), 398–408.

    Article  Google Scholar 

  • Kim, S., Georgiou, P. G., Lee, S., & Narayanan, S. (2007). Real-time emotion detection system using speech: Multi-modal fusion of different timescale features. In IEEE 9th workshop on multimedia signal processing, 2007. MMSP 2007 (pp. 48–51).

  • Kishore, K. K., & Satish, P. K. (2013). Emotion recognition in speech using MFCC and wavelet features. In 2013 IEEE 3rd international advance computing conference (IACC) (pp. 842–847). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Kitchenham, B. (2004). Procedures for performing systematic reviews. Keele, Keele University 33.

  • Koolagudi, S. G., & Krothapalli, R. S. (2011). Two stage emotion recognition based on speaking rate. International Journal of Speech Technology, 14(1), 35–48.

    Article  Google Scholar 

  • Koolagudi, S. G., & Krothapalli, S. R. (2012). Emotion recognition from speech using sub-syllabic and pitch synchronous spectral features. International Journal of Speech Technology, 15(4), 495–511.

    Article  Google Scholar 

  • Koolagudi, S. G., & Rao, K. S. (2012). Emotion recognition from speech using source, system, and prosodic features. International Journal of Speech Technology, 15(2), 265–289.

    Article  Google Scholar 

  • Koolagudi, S. G., Reddy, R., & Rao, K. S. (2010). Emotion recognition from speech signal using epoch parameters. In 2010 international conference on signal processing and communications (SPCOM) (pp. 1–5). Piscataway: IEEE.

    Google Scholar 

  • Kostoulas, T., Ganchev, T., Lazaridis, A., & Fakotakis, N. (2010) Enhancing Emotion recognition from speech through feature selection. In P. Sojka, A. Horák, I. Kopecek & K. Pala (Eds.) Text, speech and dialogue, lecture notes in artificial intelligence, Vol. 6231, pp. 338–344.

  • Kostoulas, T., Ganchev, T., Mporas, I., & Fakotakis, N. (2007) Detection of negative emotional states in real-world scenario. In Proceedings of 19th IEEE international conference on tools with artificial intelligence, pp 502–509.

  • Kotti, M., Paterno, F., & Kotropoulos, C. (2010). Speaker-independent negative emotion recognition. In 2010 2nd international workshop on cognitive information processing (CIP) (pp. 417–422). Piscataway: IEEE.

  • Le, D., Aldeneh, Z., & Provost, E. M. (2017). Discretized continuous speech emotion recognition with multi-task deep recurrent neural network. Interspeech, 2017.

  • Le, D., & Provost, E. M. (2013). Emotion recognition from spontaneous speech using hidden markov models with deep belief networks. In 2013 IEEE workshop on automatic speech recognition and understanding (ASRU) (pp. 216–221). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Lee, J., & Tashev, I. (2015). High-level feature representation using recurrent neural network for speech emotion recognition. In INTERSPEECH (pp. 1537–1540).

  • Lefter, I., Rothkrantz, L. J., Wiggers, P., & Van Leeuwen, D. A. (2010). Emotion recognition from speech by combining databases and fusion of classifiers. In Text, speech and dialogue (pp. 353–360). Berlin Heidelberg: Springer.

    Chapter  Google Scholar 

  • Li, L., Zhao, Y., Jiang, D., Zhang, Y., Wang, F., Gonzalez, I., … Sahli, H. (2013). Hybrid deep neural network–hidden markov model (DNN-HMM) based speech emotion recognition. In 2013 humaine association conference on affective computing and intelligent interaction (ACII) (pp. 312–317). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Li, Y., Chao, L., Liu, Y., Bao, W., & Tao, J. (2015) From simulated speech to natural speech, what are the robust features for emotion recognition? In International conference on affective computing and intelligent interaction (ACII) (pp. 368–373). Piscataway: IEEE

    Google Scholar 

  • Lim, W., Jang, D., & Lee, T. (2016). Speech emotion recognition using convolutional and recurrent neural networks. In Signal and information processing association annual summit and conference (APSIPA), 2016 Asia-Pacific (pp. 1–4). Piscataway: IEEE.

    Google Scholar 

  • Litman, D. J., & Forbes-Riley, K. (2006). Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors. Speech Communication, 48(5), 559–590.

    Article  Google Scholar 

  • Liu, J., Chen, C., Bu, J., You, M., & Tao, J. (2007). Speech emotion recognition based on a fusion of all-class and pairwise-class feature selection. Computational Science–ICCS 2007, 168–175.

  • Luengo, I., Navas, E., & Hernáez, I. (2010). Feature analysis and evaluation for automatic emotion identification in speech. IEEE Transactions on Multimedia, 12(6), 490–501.

    Article  Google Scholar 

  • Lugger, M., Janoir, M. E., & Yang, B. (2009). Combining classifiers with diverse feature sets for robust speaker independent emotion recognition. In Signal processing conference, 2009 17th European (1225–1229). Piscataway: IEEE.

    Google Scholar 

  • Lugger, M., & Yang, B. (2007). The relevance of voice quality features in speaker independent emotion recognition. In IEEE international conference on acoustics, speech and signal processing, 2007. ICASSP 2007. (Vol. 4, pp. IV–17). Piscataway: IEEE.

    Google Scholar 

  • Lugger, M., & Yang, B. (2007). An incremental analysis of different feature groups in speaker independent emotion recognition. In 16th Int. congress of phonetic sciences.

  • Mannepalli, K., Sastry, P. N., & Suman, M. (2016). A novel adaptive fractional deep belief networks for speaker emotion recognition. Alexandria Engineering Journal.

  • Mao, Q., Xue, W., Rao, Q., Zhang, F., & Zhan, Y. (2016). Domain adaptation for speech emotion recognition by sharing priors between related source and target classes. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2608–2612). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Mao, X., Chen, L., & Fu, L. (2009). Multi-level speech emotion recognition based on HMM and ANN. In 2009 WRI World congress on computer science and information engineering (Vol. 7, pp. 225–229). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Mao, X., Zhang, B., & Luo, Y. (2007). Speech emotion recognition based on a hybrid of HMM/ANN. In Proceedings of the 7th conference on 7th WSEAS international conference on applied informatics and communications (Vol. 7, pp. 367–370).

  • Mencattini, A., Martinelli, E., Ringeval, F., Schuller, B., & Di Natlae, C. (2017). Continuous estimation of emotions in speech by dynamic cooperative speaker models. In IEEE transactions on affective computing.

  • Milton, A., Roy, S. S., & Selvi, S. T. (2013). Svm scheme for speech emotion recognition using mfcc feature. International Journal of Computer Applications, 69(9).

  • Milton, A., & Selvi, S. T. (2014). Class-specific multiple classifiers scheme to recognize emotions from speech signals. Computer Speech & Language, 28(3), 727–742.

    Article  Google Scholar 

  • Mishra, H. K., & Sekhar, C. C. (2009). Variational Gaussian mixture models for speech emotion recognition. In Seventh international conference on advances in pattern recognition, 2009. ICAPR09. (pp. 183–186). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Morales-Perez, M., Echeverry-Correa, J., Orozco-Gutierrez, A., & Castellanos-Dominguez, G. (2008). Feature extraction of speech signals in emotion identification. In Engineering in medicine and biology society, 2008. EMBS 2008. 30th annual international conference of the IEEE (pp. 2590–2593). Piscataway: IEEE.

  • Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech communication, 49(2), 98–112.

    Article  Google Scholar 

  • Navas, E., Hernáez, I., & Luengo, I. (2006). An objective and subjective study of the role of semantics and prosodic features in building corpora for emotional TTS. EEE Transactions on Audio, Speech and Language Processing 14, 1117–1127.

    Article  Google Scholar 

  • Neiberg, D., & Elenius, K. (2008). Automatic recognition of anger in spontaneous speech. In 9th annual conference of the international speech communication association.

  • Nicolaou, M. A., Gunes, H., & Pantic, M. (2011). Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing, 2(2), 92–105.

    Article  Google Scholar 

  • Ntalampiras, S., & Fakotakis, N. (2012). Modeling the temporal evolution of acoustic parameters for speech emotion recognition. IEEE Transactions on Affective Computing, 3(1), 116–125.

    Article  Google Scholar 

  • Pan, Y., Shen, P., & Shen, L. (2012). Speech emotion recognition using support vector machine. International Journal of Smart Home, 6(2), 101–108.

    Google Scholar 

  • Pao, T. L., Chien, C. S., Chen, Y. T., Yeh, J. H., Cheng, Y. M., & Liao, W. Y. (2007). Combination of multiple classifiers for improving emotion recognition in Mandarin speech. In 3rd international conference on intelligent information hiding and multimedia signal processing, 2007. IIHMSP 2007 (Vol. 1, pp. 35–38). Piscataway: IEEE.

    Google Scholar 

  • Pao, T. L., Wang, C. H., & Li, Y. J. (2012). A study on the search of the most discriminative speech features in the speaker dependent speech emotion recognition. In 2012 fifth international symposium on parallel architectures, algorithms and programming (PAAP) (pp. 157–162). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Pathak, S., & Kulkarni, A. (2011). Recognizing emotions from speech. In 2011 3rd international conference on electronics computer technology (ICECT) (Vol. 4, pp. 107–109). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Philippou-Hübner, D., Vlasenko, B., Böck, R., & Wendemuth, A. (2012). The performance of the speaking rate parameter in emotion recognition from speech. In 2012 IEEE international conference on multimedia and expo (ICME) (pp. 248–253). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Picard, R. W., & Picard, R. (1997). Affective computing (252). Cambridge: MIT press.

    Google Scholar 

  • Pierre-Yves, O. (2003). The production and recognition of emotions in speech: features and algorithms. International Journal of Human-Computer Studies, 59(1), 157–183.

    Article  Google Scholar 

  • Planet, S., & Iriondo, I. (2012). Comparison between decision-level and feature-level fusion of acoustic and linguistic features for spontaneous emotion recognition. In 2012 7th Iberian conference on information systems and technologies (CISTI) (pp. 1–6). Piscataway: IEEE.

    Google Scholar 

  • Plutchik, R. (1991). The emotions. Lanham, MD: University Press of America.

    Google Scholar 

  • Pohjalainen, J., Fabien Ringeval, F., Zhang, Z., & Schuller, B. (2016). Spectral and cepstral audio noise reduction techniques in speech emotion recognition. In Proceedings of the 2016 ACM on multimedia conference (pp. 670–674). ACM.

  • Polzehl, T., Schmitt, A., Metze, F., & Wagner, M. (2011). Anger recognition in speech using acoustic and linguistic cues. Speech Communication, 53(9), 1198–1209.

    Article  Google Scholar 

  • Přibil, J., & Přibilová, A. (2013). Evaluation of influence of spectral and prosodic features on GMM classification of Czech and Slovak emotional speech. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1), 8.

    Article  Google Scholar 

  • Rao, K. S., Koolagudi, S. G., & Vempada, R. R. (2013). Emotion recognition from speech using global and local prosodic features. International Journal of Speech Technology, 16(2), 143–160.

    Article  Google Scholar 

  • Rao, K. S., Kumar, T. P., Anusha, K., Leela, B., Bhavana, I., & Gowtham, S. V. S. K. (2012). Emotion recognition from speech. International Journal of Computer Science and Information Technologies, 3(2), 3603–3607.

    Google Scholar 

  • Rehmam, B., Halim, Z., Abbas, G., & Muhammad, T. (2015). Artificial neural network-based speech recognition using Dwt analysis applied on isolated words from oriental languages. Malaysian Journal of Computer Science, 28(3), 242–262.

    Article  Google Scholar 

  • Ringeval, F., & Chetouani, M. (2008). Exploiting a vowel based approach for acted emotion recognition. In Verbal and nonverbal features of human-human and human-machine interaction, pp. 243–254.

  • Rodríguez, P. H., Hernández, J. B. A., Ballester, M. A. F., González, C. M. T., & Orozco-Arroyave, J. R. (2013). Global selection of features for nonlinear dynamics characterization of emotional speech. Cognitive Computation, 5(4), 517–525.

    Article  Google Scholar 

  • Rong, J., Li, G., & Chen, Y. P. P. (2009). Acoustic feature selection for automatic emotion recognition from speech. Information Processing & Management, 45(3), 315–328.

    Article  Google Scholar 

  • Sagha, H., Deng, J., Gavryukova, M., Han, J., & Schuller, B. (2016). Cross lingual speech emotion recognition using canonical correlation analysis on principal component subspace. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5800–5804). Piscataway: IEEE.

  • Sánchez-Gutiérrez, M. E., Albornoz, E. M., Martinez-Licona, F., Rufiner, H. L., & Goddard, J. (2014). Deep learning for emotional speech recognition. In Mexican conference on pattern recognition (pp. 311–320). Cham: Springer International Publishing.

    Google Scholar 

  • Scherer, S., Schwenker, F., & Palm, G. (2008). Emotion recognition from speech using multi-classifier systems and rbf-ensembles. In Speech, audio, image and biomedical signal processing using neural networks, pp. 49–70.

  • Scherer, S., Schwenker, F., & Palm, G. (2008). Emotion recognition from speech using multi-classifier systems and rbf-ensembles. In Speech, audio, image and biomedical signal processing using neural networks (pp. 49–70). Berlin Heidelberg: Springer.

    Chapter  Google Scholar 

  • Scherer, S., Schwenker, F., & Palm, G. (2008). Emotion recognition from speech using multi-classifier systems and rbf-ensembles. In Speech, audio, image and biomedical signal processing using neural networks, 49–70.

  • Scherer, S., Schwenker, F., & Palm, G. (2009). Classifier fusion for emotion recognition from speech. In Advanced intelligent environments (pp. 95–117). Springer US.

  • Schmitt, M., Ringeval, F., & Schuller, B. W. (2016). At the border of acoustics and linguistics: bag-of-audio-words for the recognition of emotions in speech. In INTERSPEECH (pp. 495–499).

  • Schuller, B., Seppi, D., Batliner, A., Maier, A., & Steidl, S. (2007). Towards more reality in the recognition of emotional speech. In 2007 IEEE international conference on acoustics, speech and signal processing-ICASSP’07 (Vol. 4, pp. IV–941). Piscataway: IEEE.

    Google Scholar 

  • Schuller, B., Vlasenko, B., Eyben, F., Wollmer, M., Stuhlsatz, A., Wendemuth, A., & Rigoll, G. (2010). Cross-corpus acoustic emotion recognition: Variances and strategies. IEEE Transactions on Affective Computing, 1(2), 119–131.

    Article  Google Scholar 

  • Schuller, B., Vlasenko, B., Minguez, R., Rigoll, G., & Wendemuth, A. (2007). Comparing one and two-stage acoustic modeling in the recognition of emotion in speech. In IEEE workshop on automatic speech recognition & understanding, 2007. ASRU (pp. 596–600). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Schuller, B. W. (2008). Speaker, noise, and acoustic space adaptation for emotion recognition in the automotive environment. In 2008 ITG conference on voice communication (SprachKommunikation) (pp. 1–4). VDE.

  • Schwenker, F., Scherer, S., Magdi, Y. M., & Palm, G. (2009). The GMM-SVM supervector approach for the recognition of the emotional status from speech. In International conference on artificial neural networks (pp. 894–903). Berlin, Heidelberg: Springer.

    Google Scholar 

  • Sedaaghi, M. H., Kotropoulos, C., & Ververidis, D. (2007). Using adaptive genetic algorithms to improve speech emotion recognition. In IEEE 9th workshop on multimedia signal processing, 2007. MMSP 2007. (pp. 461–464). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Seehapoch, T., & Wongthanavasu, S. (2013). Speech emotion recognition using support vector machines. In 2013 5th international conference on knowledge and smart technology (KST) (pp. 86–91). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Ser, W., Cen, L., & Yu, Z. L. (2008). A hybrid PNN-GMM classification scheme for speech emotion recognition. In 19th international conference on pattern recognition, 2008. ICPR 2008 (pp. 1–4). Piscataway: IEEE.

    Google Scholar 

  • Sethu, V., Ambikairajah, E., & Epps, J. (2007). Speaker normalisation for speech-based emotion detection. In 2007 15th international conference on digital signal processing (pp. 611–614). Piscataway: IEEE.

    Google Scholar 

  • Sethu, V., Ambikairajah, E., & Epps, J. (2008a). Phonetic and speaker variations in automatic emotion classification. In 9th annual conference of the international speech communication association.

  • Sethu, V., Ambikairajah, E., & Epps, J. (2008b). Empirical mode decomposition based weighted frequency feature for speech-based emotion classification. In IEEE international conference on acoustics, speech and signal processing, 2008. ICASSP 2008. (pp. 5017–5020). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Sethu, V., Ambikairajah, E., & Epps, J. (2009). Speaker dependency of spectral features and speech production cues for automatic emotion classification. In IEEE international conference on acoustics, speech and signal processing, 2009. ICASSP 2009. (pp. 4693–4696). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Sethu, V., Ambikairajah, E., & Epps, J. (2013). On the use of speech parameter contours for emotion recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1), 19.

    Article  Google Scholar 

  • Shah, F. (2009). Automatic emotion recognition from speech using artificial neural networks with gender-dependent databases. In International conference on advances in computing, control, & telecommunication technologies, 2009. ACT09. (pp. 162–164). Piscataway: IEEE.

    Google Scholar 

  • Shah, M., Miao, L., Chakrabarti, C., & Spanias, A. (2013). A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation. In 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2553–2557). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Shami, M., & Verhelst, W. (2007). An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech. Speech Communication, 49(3), 201–212.

    Article  Google Scholar 

  • Shaukat, A., & Chen, K. (2011). Emotional state recognition from speech via soft-competition on different acoustic representations. In The 2011 international joint conference on neural networks (IJCNN) (pp. 1910–1917). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Shaw, A., Vardhan, R. K., & Saxena, S. (2016). Emotion recognition and classification in speech using Artificial neural networks. International Journal of Computer Applications, 145(8).

  • Sheikhan, M., Bejani, M., & Gharavian, D. (2013). Modular neural-SVM scheme for speech emotion recognition using ANOVA feature selection method. Neural Computing and Applications, 23(1), 215–227.

    Article  Google Scholar 

  • Sheikhan, M., Gharavian, D., & Ashoftedel, F. (2012). Using DTW neural–based MFCC warping to improve emotional speech recognition. Neural Computing and Applications, 21(7), 1765–1773.

    Article  Google Scholar 

  • Shen, P., Changjun, Z., & Chen, X. (2011). Automatic speech emotion recognition using support vector machine. In 2011 international conference on electronic and mechanical engineering and information technology (EMEIT) (Vol. 2, pp. 621–625). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Sidorov, M., Ultes, S., & Schmitt, A. (2014). Emotions are a personal thing: Towards speaker-adaptive emotion recognition. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4803–4807). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Soltani, K., & Ainon, R. N. (2007). Speech emotion detection based on neural networks. In 9th international symposium on signal processing and its applications, 2007. ISSPA 2007. (pp. 1–3). Piscataway: IEEE.

    Google Scholar 

  • Song, P., Ou, S., Zheng, W., Jin, Y., & Zhao, L. (2016). Speech emotion recognition using transfer non-negative matrix factorization. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5180–5184). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Song, P., Zheng, W., Ou, S., Zhang, X., Jin, Y., Liu, J., & Yu, Y. (2016). Cross-corpus speech emotion recognition based on transfer non-negative matrix factorization. Speech Communication, 83, 34–41.

    Article  Google Scholar 

  • Steidl, S., Batliner, A., Nöth, E., & Hornegger, J. (2008). Quantification of segmentation and F0 errors and their effect on emotion recognition. In Text, speech and dialogue (pp. 525–534). Berlin/Heidelberg: Springer.

    Chapter  Google Scholar 

  • Sun, Y., & Wen, G. (2015). Emotion recognition using semi-supervised feature selection with speaker normalization. International Journal of Speech Technology, 18(3), 317–331.

    Article  Google Scholar 

  • Sun, Y., Wen, G., & Wang, J. (2015). Weighted spectral features based on local Hu moments for speech emotion recognition. Biomedical Signal Processing and Control, 18, 80–90.

    Article  Google Scholar 

  • Sun, Y., Zhou, Y., Zhao, Q., & Yan, Y. (2009). Acoustic feature optimization for emotion affected speech recognition. In International conference on information engineering and computer science, 2009. ICIECS 2009. (pp. 1–4). Piscataway: IEEE.

    Google Scholar 

  • Swain, M., Sahoo, S., Routray, A., Kabisatpathy, P., & Kundu, J. N. (2015). Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition. International Journal of Speech Technology, 18(3), 387–393.

    Article  Google Scholar 

  • Sztahó, D., Imre, V., & Vicsi, K. (2011). Automatic classification of emotions in spontaneous speech. Analysis of verbal and nonverbal communication and enactment. The Processing Issues, pp. 229–239.

  • Tabatabaei, T. S., Krishnan, S., & Guergachi, A. (2007). Emotion recognition using novel speech signal features. In IEEE international symposium on circuits and systems, 2007. ISCAS 2007 (pp. 345–348). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Tahon, M., & Devillers, L. (2015). Towards a small set of robust acoustic features for emotion recognition: IEEE/ACM transactions on challenges audio, speech, and language processing, 24(1), 16–28.

  • Tamulevicius, G., & Liogiene, T. (2015). Low-order multi-level features for speech emotions recognition. Baltic Journal of Modern Computing, 3(4), 234–247.

    Google Scholar 

  • Tarasov, A., & Delany, S. J. (2011). Benchmarking classification models for emotion recognition in natural speech: A multi-corporal study. In 2011 IEEE international conference on automatic face & gesture recognition and workshops (FG 2011) (pp. 841–846). Piscataway: IEEE.

    Google Scholar 

  • Ten Bosch, L. (2003). Emotions, speech and the ASR framework. Speech Communication, 40(1), 213–225.

    Article  MATH  Google Scholar 

  • Thapliyal, N., & Amoli, G. (2012). Speech based emotion recognition with gaussian mixture model. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(5), 65.

    Google Scholar 

  • Trigeorgis, G., Ringeval, F., Brueckner, R., Marchi, E., Nicolaou, M. A., Schuller, B., & Zafeiriou, S. (2016). Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), (pp. 5200–5204). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Truong, K., & Van Leeuwen, D. (2007). An ‘open-set’detection evaluation methodology for automatic emotion recognition in speech. In Workshop on paralinguistic speech-between models and data (pp. 5–10).

  • Tseng, M., Hu, Y., Han, W. W., & Bergen, B. (2005). “Searching for happiness” or” Full of Joy”? Source domain activation matters. In annual meeting of the Berkeley linguistics society (Vol. 31, No. 1, pp. 359–370).

  • Utane, A. S., & Nalbalwar, S. L. (2013). Emotion recognition through speech using gaussian mixture model and support vector machine. Emotion, 2, 8.

    Google Scholar 

  • Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162–1181.

    Article  Google Scholar 

  • Vlasenko, B., Philippou-Hübner, D., Prylipko, D., Böck, R., Siegert, I., & Wendemuth, A. (2011a). Vowels formants analysis allows straightforward detection of high arousal emotions. In 2011 IEEE international conference on multimedia and expo (ICME) (pp. 1–6). Piscataway: IEEE.

  • Vlasenko, B., Prylipko, D., Philippou-Hübner, D., & Wendemuth, A. (2011b). Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions. In 12th annual conference of the international speech communication association.

  • Vlasenko, B., Schuller, B., Wendemut, A., & Rigoll, G. (2007) Frame vs Turn-level: emotion recognition from speech considering static and dynamic processing. In Proceedings 2nd international conference on affective computing and intelligent interaction, pp 139–147.

  • Vogt, T., & André, E. (2006). Improving automatic emotion recognition from speech via gender differentiation. In Proceeding language resources and evaluation conference (LREC 2006), Genoa.

  • Vogt, T., & André, E. (2009). Exploring the benefits of discretization of acoustic features for speech emotion recognition. In 10th annual conference of the international speech communication association.

  • Vogt, T., & André, E. (2011). An evaluation of emotion units and feature types for real-time speech emotion recognition. KI-Künstliche Intelligenz, 25(3), 213–223.

    Article  Google Scholar 

  • Vondra, M., & Vích, R. (2009). Evaluation of speech emotion classification based on GMM and data fusion. In Cross-modal analysis of speech, gestures, gaze and facial expressions, pp. 98–105.

  • Wagner, J., Vogt, T., & André, E. (2007). A systematic comparison of different HMM designs for emotion recognition from acted and spontaneous speech. In international conference on affective computing and intelligent interaction (pp. 114–125). Springer, Berlin, Heidelberg.

  • Wang, F., Verhelst, W., & Sahli, H. (2011). Relevance vector machine based speech emotion recognition. In Affective computing and intelligent interaction, pp. 111–120.

  • Weninger, F., Ringeval, F., Marchi, E., & Schuller, B. W. (2016). Discriminatively trained recurrent neural networks for continuous dimensional emotion recognition from audio. In IJCAI (pp. 2196–2202).

  • Wenjing, H., Haifeng, L., & Chunyu, G. (2009). A hybrid speech emotion perception method of VQ-based feature processing and ANN recognition. In WRI global congress on intelligent systems, 2009 (GCIS ’09) (Vol. 2, pp. 145–149). Piscataway: IEEE.

  • Womack, B. D., & Hansen, J. H. (1999). N-channel hidden Markov models for combined stressed speech classification and recognition. IEEE Transactions on Speech and Audio Processing, 7(6), 668–677.

  • Wu, C. H., & Liang, W. B. (2011). Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels. IEEE Transactions on Affective Computing, 2, 10–21.

  • Wu, S., Falk, T. H., & Chan, W. Y. (2009). Automatic recognition of speech emotion using long-term spectro-temporal features. In 2009 16th international conference on digital signal processing (pp. 1–6). Piscataway: IEEE.

  • Wu, S., Falk, T. H., & Chan, W. Y. (2011). Automatic speech emotion recognition using modulation spectral features. Speech Communication, 53(5), 768–785.

  • Wu, T., Yang, Y., Wu, Z., & Li, D. (2006). MASC: A speech corpus in mandarin for emotion analysis and affective speaker recognition. In Speaker and language recognition workshop, 2006. IEEE Odyssey 2006 (pp. 1–5). Piscataway: IEEE.

  • Xiao, Z., Dellandréa, E., Chen, L., & Dou, W. (2009). Recognition of emotions in speech by a hierarchical approach. In 3rd international conference on affective computing and intelligent interaction and workshops, 2009 (ACII 2009) (pp. 1–8). Piscataway: IEEE.

  • Xiao, Z., Dellandréa, E., Dou, W., & Chen, L. (2006). Two-stage classification of emotional speech. In International conference on digital telecommunications, 2006 (ICDT ’06) (p. 32). Piscataway: IEEE.

  • Xiao, Z., Dellandréa, E., Dou, W., & Chen, L. (2007a). Automatic hierarchical classification of emotional speech. In 9th IEEE international symposium on multimedia workshops, 2007 (ISMW ’07) (pp. 291–296). Piscataway: IEEE.

  • Xiao, Z., Dellandréa, E., Dou, W., & Chen, L. (2007b). Hierarchical classification of emotional speech. IEEE Transactions on Multimedia, 37.

  • Yang, B., & Lugger, M. (2010). Emotion recognition from speech signals using new harmony features. Signal Processing, 90(5), 1415–1423.

  • Yang, N., Muraleedharan, R., Kohl, J., Demirkol, I., Heinzelman, W., & Sturge-Apple, M. (2012). Speech-based emotion classification using multiclass SVM with hybrid kernel and thresholding fusion. In 2012 IEEE spoken language technology workshop (SLT) (pp. 455–460). Piscataway: IEEE.

  • Ye, C., Liu, J., Chen, C., Song, M., & Bu, J. (2008). Speech emotion classification on a Riemannian manifold. In Advances in multimedia information processing—PCM 2008 (pp. 61–69).

  • Yeh, J. H., Pao, T. L., Lin, C. Y., Tsai, Y. W., & Chen, Y. T. (2011). Segment-based emotion recognition from continuous Mandarin Chinese speech. Computers in Human Behavior, 27(5), 1545–1552.

  • You, M., Chen, C., Bu, J., Liu, J., & Tao, J. (2006a). A hierarchical framework for speech emotion recognition. In 2006 IEEE international symposium on industrial electronics (Vol. 1, pp. 515–519). Piscataway: IEEE.

  • You, M., Chen, C., Bu, J., Liu, J., & Tao, J. (2006b). Emotional speech analysis on nonlinear manifold. In 18th international conference on pattern recognition, 2006 (ICPR 2006) (Vol. 3, pp. 91–94). Piscataway: IEEE.

  • Yun, S., & Yoo, C. D. (2012). Loss-scaled large-margin Gaussian mixture models for speech emotion classification. IEEE Transactions on Audio, Speech, and Language Processing, 20(2), 585–598.

  • Yüncü, E., Hacihabiboglu, H., & Bozsahin, C. (2014). Automatic speech emotion recognition using auditory models with binary decision tree and SVM. In 2014 22nd international conference on pattern recognition (ICPR) (pp. 773–778). Piscataway: IEEE.

  • Zbancioc, M., & Feraru, S. M. (2012). Emotion recognition of the SROL Romanian database using fuzzy KNN algorithm. In 10th international symposium on electronics and telecommunications (ISETC), 2012 (pp. 347–350). Piscataway: IEEE.

  • Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2009). A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 39–58.

  • Zha, C., Yang, P., Zhang, X., & Zhao, L. (2016). Spontaneous speech emotion recognition via multiple kernel learning. In 2016 eighth international conference on measuring technology and mechatronics automation (ICMTMA) (pp. 621–623). Piscataway: IEEE.

  • Zhang, S., Lei, B., Chen, A., Chen, C., & Chen, Y. (2010). Spoken emotion recognition using local Fisher discriminant analysis. In 2010 IEEE 10th international conference on signal processing (ICSP) (pp. 538–540). Piscataway: IEEE.

  • Zhang, S., & Zhao, Z. (2008). Feature selection filtering methods for emotion recognition in Chinese speech signal. In 9th international conference on signal processing, 2008 (ICSP 2008) (pp. 1699–1702). Piscataway: IEEE.

  • Zheng, W. Q., Yu, J. S., & Zou, Y. X. (2015). An experimental study of speech emotion recognition based on deep convolutional neural networks. In 2015 international conference on affective computing and intelligent interaction (ACII) (pp. 827–831). Piscataway: IEEE.

  • Zhou, J., Wang, G., Yang, Y., & Chen, P. (2006). Speech emotion recognition based on rough set and SVM. In 5th IEEE international conference on cognitive informatics, 2006 (ICCI 2006) (Vol. 1, pp. 53–61). Piscataway: IEEE.

  • Zhou, Y., Sun, Y., Yang, L., & Yan, Y. (2009). Applying articulatory features to speech emotion recognition. In International conference on research challenges in computer science, 2009 (ICRCCS ’09) (pp. 73–76). Piscataway: IEEE.

  • Zhu, L., Chen, L., Zhao, D., Zhou, J., & Zhang, W. (2017). Emotion recognition from Chinese speech for smart affective services using a combination of SVM and DBN. Sensors, 17(7), 1694.

  • Zong, Y., Zheng, W., Zhang, T., & Huang, X. (2016). Cross-corpus speech emotion recognition based on domain-adaptive least-squares regression. IEEE Signal Processing Letters, 23(5), 585–589.

Funding

This work was funded by the University of Malaya Research Grant (AFR (Frontier Science)) (Grant No. RG284-14AFR) and the Postgraduate Research Grant (PPP) (Grant No. PG220-2014B).

Author information

Correspondence to Mumtaz Begum Mustafa.

About this article

Cite this article

Mustafa, M.B., Yusoof, M.A.M., Don, Z.M. et al. Speech emotion recognition research: an analysis of research focus. Int J Speech Technol 21, 137–156 (2018). https://doi.org/10.1007/s10772-018-9493-x
