
Speech emotion recognition research: an analysis of research focus

Published in: International Journal of Speech Technology

Abstract

This article analyses research in speech emotion recognition ("SER") from 2006 to 2017 in order to identify the current focus of research and the areas in which research is lacking. Searching on selected keywords, we extracted and analysed 260 articles from well-known online databases. The analysis indicates that SER is an active field of research, with dozens of articles published each year in journals and conference proceedings. The majority of articles concentrate on three critical aspects of SER, namely (1) databases, (2) suitable speech features, and (3) classification techniques that maximize the recognition accuracy of SER systems. Having carried out an association analysis of these critical aspects and how they influence the performance of SER systems, we found that certain combinations of databases, speech features, and classifiers influence the recognition accuracy of an SER system. Based on our review, we also suggest aspects of SER that could be taken into consideration in future work.
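The association analysis mentioned above can be pictured as classic market-basket rule mining over article metadata: each surveyed article contributes one "transaction" of categorical items (database used, feature set, classifier, and a banded recognition accuracy), and the mined rules expose which combinations co-occur with high accuracy. The sketch below is a minimal, self-contained Python illustration of that idea, not the authors' code; the database, feature, and classifier names are common in the SER literature, but every pairing, accuracy band, and threshold is an invented placeholder.

    from itertools import combinations

    # Each surveyed article is reduced to one "transaction" of categorical
    # items. These rows are invented placeholders, NOT data from the survey.
    articles = [
        {"db=EMO-DB", "feat=MFCC", "clf=SVM", "acc=high"},
        {"db=EMO-DB", "feat=prosodic", "clf=GMM", "acc=medium"},
        {"db=IEMOCAP", "feat=MFCC", "clf=DNN", "acc=high"},
        {"db=EMO-DB", "feat=MFCC", "clf=SVM", "acc=high"},
        {"db=IEMOCAP", "feat=prosodic", "clf=HMM", "acc=medium"},
        {"db=IEMOCAP", "feat=MFCC", "clf=SVM", "acc=high"},
    ]

    def support(itemset, transactions):
        # Fraction of transactions containing every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    def mine_rules(transactions, target="acc=high", min_support=0.2, min_conf=0.8):
        # Enumerate antecedents of up to three items that predict `target`
        # with sufficient support and confidence (an Apriori-style filter).
        items = sorted(set().union(*transactions) - {target})
        rules = []
        for k in (1, 2, 3):
            for antecedent in map(frozenset, combinations(items, k)):
                sup_ant = support(antecedent, transactions)
                if sup_ant == 0:
                    continue
                sup_rule = support(antecedent | {target}, transactions)
                confidence = sup_rule / sup_ant
                if sup_rule >= min_support and confidence >= min_conf:
                    rules.append((set(antecedent), target, sup_rule, confidence))
        return rules

    for antecedent, target, sup, conf in mine_rules(articles):
        print(f"{antecedent} -> {target} (support={sup:.2f}, confidence={conf:.2f})")

Running the sketch prints rules such as {db=EMO-DB, feat=MFCC} -> acc=high together with their support and confidence; on a real corpus of 260 articles, the same support/confidence filtering is what separates incidental pairings from combinations of database, features, and classifier that consistently accompany high recognition accuracy.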



References

  • Abdelwahab, M., & Busso, C. (2017). Incremental adaptation using active learning for acoustic emotion Recognition. In International conference on acoustics, speech and signal processing.

  • Alam, M. J., Attabi, Y., Dumouchel, P., Kenny, P., & O’Shaughnessy, D. D. (2013). Amplitude modulation features for emotion recognition from speech. In INTERSPEECH (pp. 2420–2424).

  • Albornoz, E. M., Crolla, M. B., & Milone, D. H. (2008). Recognition of emotions in speech. In Proceedings of XXXIV CLEI, Santa Fe Argentina, pp. 1120–1129.

  • Albornoz, E. M., Milone, D. H., & Rufiner, H. L. (2011). Spoken emotion recognition using hierarchical classifiers. Computer Speech & Language, 25(3), 556–570.

    Article  Google Scholar 

  • Altun, H., & Polat, G. (2009). Boosting selection of speech related features to improve performance of multi-class SVMs in emotion detection. Expert Systems with Applications, 36(4), 8197–8203.

    Article  Google Scholar 

  • Álvarez, A., Cearreta, I., López, J. M., Arruti, A., Lazkano, E., Sierra, B., & Garay, N. (2007). A comparison using different speech parameters in the automatic emotion recognition using Feature Subset Selection based on Evolutionary Algorithms. In International conference on text, speech and dialogue (pp. 423–430). Berlin: Springer.

    Chapter  Google Scholar 

  • Anagnostopoulos, C. N., Iliou, T., & Giannoukos, I. (2012). Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011. Artificial Intelligence Review, 43(2), 155–177.

    Article  Google Scholar 

  • Ananthakrishnan, S., Vembu, A. N., & Prasad, R. (2011). Model-based parametric features for emotion recognition from speech. In 2011 IEEE workshop on automatic speech recognition and understanding (ASRU), (pp. 529–534). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Arias, J. P., Busso, C., & Yoma, N. B. (2013). Energy and F0 contour modeling with functional data analysis for emotional speech detection. In INTERSPEECH (pp. 2871–2875).

  • Arias, J. P., Busso, C., & Yoma, N. B. (2014). Shape-based modeling of the fundamental frequency contour for emotion detection in speech. Computer Speech & Language, 28(1), 278–294.

    Article  Google Scholar 

  • Atassi, H., & Esposito, A. (2008). A speaker independent approach to the classification of emotional vocal expressions. In 20th IEEE international conference on tools with artificial intelligence, 2008. ICTAI08. (Vol. 2, pp. 147–152). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Atassi, H., Smekal, Z., & Esposito, A. (2012). Emotion recognition from spontaneous Slavic speech. In 2012 IEEE 3rd international conference on cognitive infocommunications (CogInfoCom) (pp. 389–394). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Athanaselis, T., Bakamidis, S., Dologlou, I., Cowie, R., Douglas-Cowie, E., & Cox, C. (2005). ASR for emotional speech: Clarifying the issues and enhancing performance. Neural Networks, 18(4), 437–444.

    Article  Google Scholar 

  • Attabi, Y., & Dumouchel, P. (2012). Emotion recognition from speech: WOC-NN and class-interaction. In 2012 11th international conference on information science, signal processing and their applications (ISSPA) (pp. 126–131). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Attabi, Y., & Dumouchel, P. (2013). Anchor models for emotion recognition from speech. IEEE Transactions on Affective Computing, 4(3), 280–290.

    Article  Google Scholar 

  • Bahreini, K., Nadolski, R., & Westera, W. (2016). Towards real-time speech emotion recognition for affective e-learning. Education and Information Technologies, 21(5), 1367–1386.

    Article  Google Scholar 

  • Balti, H., & Elmaghraby, A. S. (2013). Speech emotion detection using time dependent self organizing maps. In 2013 IEEE international symposium on signal processing and information technology (ISSPIT) (pp. 000470–000478). Piscataway: IEEE.

    Google Scholar 

  • Barra Chicote, R., Fernández Martínez, F., Lutfi, L., Binti, S., Lucas Cuesta, J. M., Macías Guarasa, J., … Pardo Muñoz, J. M. (2009). Acoustic emotion recognition using dynamic Bayesian networks and multi-space distributions. ISCA.

  • Batliner, A., Schuller, B., Seppi, D., Steidl, S., Devillers, L., Vidrascu, L., & Amir, N. (2011). The automatic recognition of emotions in speech. In Emotion-oriented systems (pp. 71–99). Berlin Heidelberg: Springer.

    Chapter  Google Scholar 

  • Batliner, A., Steidl, S., Schuller, B., Seppi, D., Laskowski, K., Vogt, T., … Aharonson, V. (2006). Combining efforts for improving automatic classification of emotional user states. Proc. IS-LTC 240–245.

  • Batliner, A., Steidl, S., Schuller, B., Seppi, D., Vogt, T., Devillers, L., … Aharonson, V. (2007). The impact of F0 extraction errors on the classification of prominence and emotion. Proc. ICPhS 2201–2204.

  • Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., & Rose, R. (2007). Automatic speech recognition and speech variability: A review. Speech Communication, 49(10), 763–786.

    Article  Google Scholar 

  • Bertero, D., & Fung, P. (2017). A first look into a Convolutional Neural Network for speech emotion detection. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5115–5119). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Bhaykar, M., Yadav, J., & Rao, K. S. (2013). Speaker dependent, speaker independent and cross language emotion recognition from speech using GMM and HMM. 2013 national conference on communications (NCC) (pp. 1–5). Piscataway: IEEE.

    Google Scholar 

  • Bitouk, D., Nenkova, A., & Verma, R. (2009). Improving emotion recognition using class-level spectral features. In INTERSPEECH (pp. 2023–2026).

  • Böck, R., Hübner, D., & Wendemuth, A. (2010). Determining optimal signal features and parameters for hmm-based emotion classification. In 15th IEEE mediterranean electrotechnical conference MELECON 2010–2010 (pp. 1586–1590). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Bojanić, M., Crnojević, V., & Delić, V. (2012). Application of neural networks in emotional speech recognition. In 2012 11th symposium on neural network applications in electrical engineering (NEUREL) (pp. 223–226). Piscataway: IEEE.

    Google Scholar 

  • Bozkurt, E., Erzin, E., Erdem, C. E., & Erdem, A. T. (2010). Use of line spectral frequencies for emotion recognition from speech. In 2010 20th international conference on pattern recognition (ICPR) (pp. 3708–3711). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Bozkurt, E., Erzin, E., Erdem, C. E., & Erdem, A. T. (2011). Formant position based weighted spectral features for emotion recognition. Speech Communication, 53(9), 1186–1197.

    Article  Google Scholar 

  • Bozkurt, E., Erzin, E., Eroğlu Erdem, Ç, & Erdem, T. (2009). Improving automatic emotion recognition from speech signals. In 10th annual conference of the international speech communication association 2009 (INTERSPEECH 2009). International Speech Communications Association.

  • Brester, C., Semenkin, E., & Sidorov, M. (2016). Multi-objective heuristic feature selection for speech-based multilingual emotion recognition. Journal of Artificial Intelligence and Soft Computing Research, 6(4), 243–253.

    Article  Google Scholar 

  • Brooks, C. A., Thompson, C., & Kovanović, V. (2016). Introduction to data mining for educational researchers. In Proceedings of the 6th international conference on learning analytics & knowledge (pp. 505–506). ACM.

  • Busso, C., Lee, S., & Narayanan, S. (2009). Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Transactions on Audio, Speech, and Language Processing, 17(4), 582–596.

    Article  Google Scholar 

  • Busso, C., Mariooryad, S., Metallinou, A., & Narayanan, S. (2013). Iterative feature normalization scheme for automatic emotion detection from speech. IEEE Transactions on Affective Computing, 4(4), 386–397.

    Article  Google Scholar 

  • Busso, C., Metallinou, A., & Narayanan, S. S. (2011). Iterative feature normalization for emotional speech detection. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5692–5695). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Calvo, R. A., & D’Mello, S. (2010). Affect detection: An interdisciplinary review of models, methods, and their applications. IEEE Transactions on Affective Computing, 1(1), 18–37.

    Article  Google Scholar 

  • Casale, S., Russo, A., Scebba, G., & Serrano, S. (2008). Speech emotion classification using machine learning algorithms. In 2008 IEEE international conference on semantic computing (pp. 158–165). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Casale, S., Russo, A., & Serrano, S. (2010). Analysis of robustness of attributes selection applied to speech emotion recognition. In 2010 18th European signal processing conference (pp. 1174–1178). Piscataway: IEEE.

    Google Scholar 

  • Chakraborty, R., Pandharipande, M., & Kopparapu, S. K. (2016). Knowledge-based framework for intelligent emotion recognition in spontaneous speech. Procedia Computer Science, 96, 587–596.

    Article  Google Scholar 

  • Chandaka, S., Chatterjee, A., & Munshi, S. (2009). Support vector machines employing cross-correlation for emotional speech recognition. Measurement, 42(4), 611–618.

    Article  Google Scholar 

  • Chandrakala, S., & Sekhar, C. C. (2009). Combination of generative models and SVM based classifier for speech emotion recognition. In International joint conference on neural networks, 2009. IJCNN 2009 (pp. 497–502). Piscataway: IEEE.

    Google Scholar 

  • Chavhan, Y., Dhore, M. L., & Yesaware, P. (2010). Speech emotion recognition using support vector machine. International Journal of Computer Applications, 1(20), 6–9.

    Article  Google Scholar 

  • Chavhan, Y. D., Yelure, B. S., & Tayade, K. N. (2015). Speech emotion recognition using RBF kernel of LIBSVM. In 2015 2nd international conference on electronics and communication systems (ICECS) (pp. 1132–1135). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Chen, L., Mao, X., Wei, P., Xue, Y., & Ishizuka, M. (2012). Mandarin emotion recognition combining acoustic and emotional point information. Applied Intelligence, 37(4), 602–612.

    Article  Google Scholar 

  • Chenchah, F., & Lachiri, Z. (2014). Speech emotion recognition in acted and spontaneous context. Procedia Computer Science, 39, 139–145.

    Article  Google Scholar 

  • Cheng, X., & Duan, Q. (2012). Speech emotion recognition using gaussian mixture model. In The 2nd international conference on computer application and system modeling.

  • Chiou, B. C., & Chen, C. P. (2013). Feature space dimension reduction in speech emotion recognition using support vector machine. In Signal and information processing association annual summit and conference (APSIPA), 2013 Asia-Pacific (pp. 1–6). Piscataway: IEEE.

    Google Scholar 

  • Christina, I. J., & Milton, A. (2012). Analysis of all pole model to recognize emotions from speech signal. In 2012 international conference on computing, electronics and electrical technologies (ICCEET) (pp. 723–728). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., & Schroder, M. (2000) FEELTRACE: an instrument for recording perceived emotion in real time. In Proceedings of ISCA speech and emotion workshop, pp 19–24.

  • Cummins, N., Amiriparian, S., Hagerer, G., Batliner, A., Steidl, S., & Schuller, B. (2017). An Image-based deep spectrum feature representation for the recognition of emotional speech. In Proceedings of the 25th ACM international conference on multimedia, MM. Piscataway: IEEE.

    Google Scholar 

  • D’Mello, S., & Kory, J. (2012). Consistent but modest: a meta-analysis on unimodal and multimodal affect detection accuracies from 30 studies. In Proceedings of the 14th ACM international conference on multimodal interaction (pp. 31–38). ACM.

  • Dai, K., Fell, H. J., & MacAuslan, J. (2008). Recognizing emotion in speech using neural networks. Telehealth and Assistive Technologies, 31, 38–43.

    Google Scholar 

  • Delic, V., Bojanic, M., Gnjatovic, M., Secujski, M., & Jovicic, S. T. (2012). Discrimination capability of prosodic and spectral features for emotional speech recognition. Elektronika ir Elektrotechnika, 18(9), 51–54.

    Article  Google Scholar 

  • Deng, J., Han, W., & Schuller, B. (2012). Confidence measures for speech emotion recognition: A start. In Proceedings of speech communication; 10. ITG symposium (pp. 1–4). VDE.

  • Deng, J., Xu, X., Zhang, Z., Frühholz, S., Grandjean, D., & Schuller, B. (2017). Fisher kernels on phase-based features for speech emotion recognition. In Dialogues with social robots (pp. 195–203). Springer: Singapore.

    Chapter  Google Scholar 

  • Deng, J., Zhang, Z., Eyben, F., & Schuller, B. (2014). Autoencoder-based unsupervised domain adaptation for speech emotion recognition. IEEE Signal Processing Letters, 21(9), 1068–1072.

    Article  Google Scholar 

  • Deng, J., Zhang, Z., Marchi, E., & Schuller, B. (2013). Sparse autoencoder-based feature transfer learning for speech emotion recognition. In 2013 humaine association conference on affective computing and intelligent interaction (ACII) (pp. 511–516). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Devillers, L., & Vidrascu, L. (2006). Real-life emotions detection with lexical and paralinguistic cues on human-human call center dialogs. In 9th international conference on spoken language processing.

  • Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: towards a new generation of databases. Speech Communication, 40, 33–60.

    Article  MATH  Google Scholar 

  • Douglas-Cowie, E., Cowie, R., Sneddon, I., Cox, C., Lowry, O., McRorie, M., Martin, J. C., Devillers, L., Abrilan, S., Batliner, A., Amir, N., & Karpouzis, K. (2007) The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data. In Proceedings of international conference affective computing and intelligent interaction, pp 488–500.

  • Ekman, P. (1957). A methodological discussion of non-verbal behavior. Journal of Psychology, 43, 141–149.

    Article  Google Scholar 

  • Ekman, P. (1972). Universals and cultural differences in facial expression of emotion. In J. Cole (Ed.), Nebraska symposium on motivation (pp. 207–283). Lincoln, NE: University of Nebraska Press.

    Google Scholar 

  • Ekman, P. (1999). Basic emotions. In T. Dalgleish & M. Power (Eds.), Handbook of cognition and emotion. Chichester: Wiley.

    Google Scholar 

  • El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3), 572–587.

    Article  MATH  Google Scholar 

  • Elbarougy, R., & Akagi, M. (2012). Speech emotion recognition system based on a dimensional approach using a three-layered model. In Signal & information processing association annual summit and conference (APSIPA ASC), 2012 Asia-Pacific (pp. 1–9). Piscataway: IEEE.

    Google Scholar 

  • Elbarougy, R., & Akagi, M. (2013). Cross-lingual speech emotion recognition system based on a three-layer model for human perception. In Signal and information processing association annual summit and conference (APSIPA), 2013 Asia-Pacific (pp. 1–10). Piscataway: IEEE.

    Google Scholar 

  • Erdem, C. E., Bozkurt, E., Erzin, E., & Erdem, A. T. (2010). RANSAC-based training data selection for emotion recognition from spontaneous speech. In Proceedings of the 3rd international workshop on affective interaction in natural environments (pp. 9–14). ACM.

  • Esmaileyan, Z., & Marvi, H. (2014). Recognition of emotion in speech using variogram based features. Malaysian Journal of Computer Science, 27(3), 156–170.

    Google Scholar 

  • Espinosa, H. P., García, C. A. R., & Pineda, L. V. (2010). Features selection for primitives estimation on emotional speech. In 2010 IEEE international conference on acoustics speech and signal processing (ICASSP) (pp. 5138–5141). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Fayek, H. M., Lech, M., & Cavedon, L. (2016). On the correlation and transferability of features between automatic speech recognition and speech emotion recognition. In INTERSPEECH (pp. 3618–3622).

  • Feraru, M., & Zbancioc, M. (2013). Speech emotion recognition for SROL database using weighted KNN algorithm. In 2013 international conference on electronics, computers and artificial intelligence (ECAI) (pp. 1–4). Piscataway: IEEE.

    Google Scholar 

  • Fernandez, R., & Picard, R. (2011). Recognizing affect from speech prosody using hierarchical graphical models. Speech Communication, 53(9), 1088–1103.

    Article  Google Scholar 

  • Firoz Shah, A., Vimal, K. V. R., Raji, S. A., Jayakumar, A., & Babu, A. P. (2009) Speaker independent automatic emotion recognition from speech: a comparison of MFCCs and discrete wavelet transforms. In Proceedings of international conference on advances in recent technologies in communication and computing, pp 528–531.

  • Fu, L., Mao, X., & Chen, L. (2008a). Relative speech emotion recognition based artificial neural network. In Pacific-Asia workshop on computational intelligence and industrial application, 2008. PACIIA’08. (Vol. 2, pp. 140–144). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Fu, L., Mao, X., & Chen, L. (2008b). Speaker independent emotion recognition based on SVM/HMMs fusion system. In International conference on audio, language and image processing, 2008. ICALIP 2008 (pp. 61–65). Piscataway: IEEE.

    Google Scholar 

  • Gamage, K. W., Sethu, V., & Ambikairajah, E. (2017). Salience based lexical features for emotion recognition. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5830–5834). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Garg, V., Kumar, H., & Sinha, R. (2013). Speech based emotion recognition based on hierarchical decision tree with SVM, BLG and SVR classifiers. In 2013 national conference on communications (NCC) (pp. 1–5). Piscataway: IEEE.

    Google Scholar 

  • Gaurav, M. (2008). Performance analysis of spectral and prosodic features and their fusion for emotion recognition in speech. In Spoken language technology workshop, 2008. SLT 2008 (pp. 313–316). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Georgogiannis, A., & Digalakis, V. (2012). Speech emotion recognition using non-linear teager energy based features in noisy environments. In 2012 proceedings of the 20th European signal processing conference (EUSIPCO) (pp. 2045–2049). Piscataway: IEEE.

    Google Scholar 

  • Gharavian, D., Sheikhan, M., & Ashoftedel, F. (2013). Emotion recognition improvement using normalized formant supplementary features by hybrid of DTW-MLP-GMM model. Neural Computing and Applications, 22(6), 1181–1191.

    Article  Google Scholar 

  • Gharavian, D., Sheikhan, M., & Janipour, M. (2010). Pitch in emotional speech and emotional speech recognition using pitch frequency. Majlesi Journal of Electrical Engineering, 4(1).

  • Gharavian, D., Sheikhan, M., Nazerieh, A., & Garoucy, S. (2012). Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network. Neural Computing and Applications, 21(8), 2115–2126.

    Article  Google Scholar 

  • Gharsellaoui, S., Selouani, S. A., & Dahmane, A. O. (2015). Automatic emotion recognition using auditory and prosodic indicative features. In 2015 IEEE 28th Canadian conference on electrical and computer engineering (CCECE) (pp. 1265–1270). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Giannoulis, P., & Potamianos, G. (2012). A hierarchical approach with feature selection for emotion recognition from speech. In LREC (pp. 1203–1206).

  • Glüge, S., Böck, R., & Wendemuth, A. (2011). Segmented-memory recurrent neural networks versus hidden markov models in emotion recognition from speech. In IJCCI (NCTA) (pp. 308–315).

  • Grimm, M., Kroschel, K., Mower, E., & Narayanan, S. (2007a). Primitives-based evaluation and estimation of emotions in speech. Speech Communication, 49(10), 787–800.

    Article  Google Scholar 

  • Grimm, M., Kroschel, K., & Narayanan, S. (2007b). Support vector regression for automatic recognition of spontaneous emotions in speech. In IEEE international conference on acoustics, speech and signal processing, 2007. ICASSP 2007. (Vol. 4, pp. IV–1085). Piscataway: IEEE.

    Google Scholar 

  • Hamidi, M., & Mansoorizade, M. (2012). Emotion recognition from persian speech with neural network. International Journal of Artificial Intelligence & Applications, 3(5), 107.

    Article  Google Scholar 

  • Han, J., Zhang, Z., Ringeval, F., & Schuller, B. (2017). Prediction-based learning for continuous emotion recognition in speech. In 42nd IEEE international conference on acoustics, speech, and signal processing, ICASSP 2017.

  • Han, K., Yu, D., & Tashev, I. (2014). Speech emotion recognition using deep neural network and extreme learning machine. In 15th annual conference of the international speech communication association.

  • Harimi, A., Fakhr, H. S., & Bakhshi, A. (2016). Recognition of emotion using reconstructed phase space of speech. Malaysian Journal of Computer Science, 29(4), 262–271.

    Article  Google Scholar 

  • Hassan, A., & Damper, R. I. (2009). Emotion recognition from speech using extended feature selection and a simple classifier. In 10th annual conference of the international speech communication association.

  • He, L., Lech, M., Maddage, N. C., & Allen, N. B. (2011). Study of empirical mode decomposition and spectral analysis for stress and emotion classification in natural speech. Biomedical Signal Processing and Control, 6(2), 139–146.

    Article  Google Scholar 

  • Henríquez, P., Alonso, J. B., Ferrer, M. A., Travieso, C. M., & Orozco-Arroyave, J. R. (2014). Nonlinear dynamics characterization of emotional speech. Neurocomputing, 132, 126–135.

    Article  Google Scholar 

  • Hu, H., Xu, M. X., & Wu, W. (2007). Fusion of global statistical and segmental spectral features for speech emotion recognition. In INTERSPEECH (pp. 2269–2272).

  • Hu, H., Xu, M. X., & Wu, W. (2007). GMM supervector based SVM with spectral features for speech emotion recognition. In IEEE international conference on acoustics, speech and signal processing, 2007. ICASSP 2007. (Vol. 4, pp. IV–413). Piscataway: IEEE.

    Google Scholar 

  • Huang, R., & Ma, C. (2006). Toward a speaker-independent real-time affect detection system. In 18th international conference on pattern recognition, 2006. ICPR 2006. (Vol. 1, pp. 1204–1207). Piscataway: IEEE.

    Google Scholar 

  • Huang, Y., Wu, A., Zhang, G., & Li, Y. (2016). Speech emotion recognition based on deep belief networks and wavelet packet cepstral coefficients. International Journal of Simulation: Systems, Science and Technology, 17(28), 28–31.

    Google Scholar 

  • Huang, Z., Dong, M., Mao, Q., & Zhan, Y. (2014). Speech emotion recognition using CNN. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 801–804). ACM.

  • Hussain, L., Shafi, I., Saeed, S., Abbas, A., Awan, I. A., Nadeem, S. A., … Rahman, B. (2017). A radial base neural network approach for emotion recognition in human speech. IJCSNS, 17(8), 52.

    Google Scholar 

  • Iliev, A. I., & Scordilis, M. S. (2008). Emotion recognition in speech using inter-sentence Glottal statistics. In 15th international conference on systems, signals and image processing, 2008. IWSSIP 2008. (pp. 465–468). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Iliev, A. I., Scordilis, M. S., Papa, J. P., & Falcão, A. X. (2010). Spoken emotion recognition through optimum-path forest classification using glottal features. Computer Speech & Language, 24(3), 445–460.

    Article  Google Scholar 

  • Iliou, T., & Anagnostopoulos, C. N. (2009). Comparison of different classifiers for emotion recognition. In 13th panhellenic conference on informatics, 2009. PCI’09. (pp. 102–106). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Iliou, T., & Anagnostopoulos, C. N. (2010a). SVM-MLP-PNN classifiers on speech emotion recognition field—A comparative study. In 2010 fifth international conference on digital telecommunications (ICDT) (pp. 1–6). Piscataway: IEEE.

    Google Scholar 

  • Iliou, T., & Anagnostopoulos, C. N. (2010b). Classification on speech emotion recognition-a comparative study. Animation, 4, 5.

    Google Scholar 

  • Iriondo, I., Planet, S., Alías, F., Socoró, J. C., & Martínez, E. (2007). Validation of an expressive speech corpus by mapping automatic classification to subjective evaluation. Computational and Ambient Intelligence, 646–653.

  • Ivanov, A., & Riccardi, G. (2012). Kolmogorov-Smirnov test for feature selection in emotion recognition from speech. In 2012 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5125–5128). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Javidi, M. M., & Roshan, E. F. (2013). Speech emotion recognition by using combinations of C5. 0, neural network (NN), and support vector machines (SVM) classification methods. International Journal of Applied Mathematics and Computer Science, 6, 191–200.

    Google Scholar 

  • Jeon, J. H., Xia, R., & Liu, Y. (2011). Sentence level emotion recognition based on decisions from subsentence segments. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4940–4943). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Jiang, J., Wu, Z., Xu, M., Jia, J., & Cai, L. (2012). Comparison of adaptation methods for GMM-SVM based speech emotion recognition. In 2012 IEEE spoken language technology workshop (SLT) (pp. 269–273). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Jin, Q., Li, C., Chen, S., & Wu, H. (2015). Speech emotion recognition with acoustic and lexical features. In 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4749–4753). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Kamińska, D., & Pelikant, A. (2012). Recognition of human emotion from a speech signal based on Plutchik’s model. International Journal of Electronics and Telecommunications, 58(2), 165–170.

    Article  Google Scholar 

  • Kandali, A. B., Routray, A., & Basu, T. K. (2008). Emotion recognition from Assamese speeches using MFCC features and GMM classifier. In TENCON 2008–2008 IEEE region 10 conference (pp. 1–5). Piscataway: IEEE.

    Google Scholar 

  • Khan, M., Goskula, T., Nasiruddin, M., & Quazi, R. (2011). Comparison between k-nn and svm method for speech emotion recognition. International Journal on Computer Science and Engineering, 3(2), 607–611.

    Google Scholar 

  • Khanna, P., & Kumar, M. S. (2011). Application of vector quantization in emotion recognition from human speech. In International conference on information intelligence, systems, technology and management (pp. 118–125). Berlin, Heidelberg: Springer.

    Chapter  Google Scholar 

  • Kim, E. H., Hyun, K. H., Kim, S. H., & Kwak, Y. K. (2009). Improved emotion recognition with a novel speaker-independent feature. IEEE/ASME Transactions on Mechatronics, 14(3), 317–325.

    Article  Google Scholar 

  • Kim, E. H., Hyun, K. H., & Kwak, Y. K. (2006). Improvement of emotion recognition from voice by separating of obstruents. In The 15th IEEE international symposium on robot and human interactive communication, 2006. ROMAN 2006. (pp. 564–568). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Kim, J. B., Park, J. S., & Oh, Y. H. (2011). On-line speaker adaptation based emotion recognition using incremental emotional information. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4948–4951). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Kim, J. B., Park, J. S., & Oh, Y. H. (2012). Speaker-characterized emotion recognition using online and iterative speaker adaptation. Cognitive Computation, 4(4), 398–408.

    Article  Google Scholar 

  • Kim, S., Georgiou, P. G., Lee, S., & Narayanan, S. (2007). Real-time emotion detection system using speech: Multi-modal fusion of different timescale features. In IEEE 9th workshop on multimedia signal processing, 2007. MMSP 2007 (pp. 48–51).

  • Kishore, K. K., & Satish, P. K. (2013). Emotion recognition in speech using MFCC and wavelet features. In 2013 IEEE 3rd international advance computing conference (IACC) (pp. 842–847). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Kitchenham, B. (2004). Procedures for performing systematic reviews. Keele, Keele University 33.

  • Koolagudi, S. G., & Krothapalli, R. S. (2011). Two stage emotion recognition based on speaking rate. International Journal of Speech Technology, 14(1), 35–48.

    Article  Google Scholar 

  • Koolagudi, S. G., & Krothapalli, S. R. (2012). Emotion recognition from speech using sub-syllabic and pitch synchronous spectral features. International Journal of Speech Technology, 15(4), 495–511.

    Article  Google Scholar 

  • Koolagudi, S. G., & Rao, K. S. (2012). Emotion recognition from speech using source, system, and prosodic features. International Journal of Speech Technology, 15(2), 265–289.

    Article  Google Scholar 

  • Koolagudi, S. G., Reddy, R., & Rao, K. S. (2010). Emotion recognition from speech signal using epoch parameters. In 2010 international conference on signal processing and communications (SPCOM) (pp. 1–5). Piscataway: IEEE.

    Google Scholar 

  • Kostoulas, T., Ganchev, T., Lazaridis, A., & Fakotakis, N. (2010) Enhancing Emotion recognition from speech through feature selection. In P. Sojka, A. Horák, I. Kopecek & K. Pala (Eds.) Text, speech and dialogue, lecture notes in artificial intelligence, Vol. 6231, pp. 338–344.

  • Kostoulas, T., Ganchev, T., Mporas, I., & Fakotakis, N. (2007) Detection of negative emotional states in real-world scenario. In Proceedings of 19th IEEE international conference on tools with artificial intelligence, pp 502–509.

  • Kotti, M., Paterno, F., & Kotropoulos, C. (2010). Speaker-independent negative emotion recognition. In 2010 2nd international workshop on cognitive information processing (CIP) (pp. 417–422). Piscataway: IEEE.

  • Le, D., Aldeneh, Z., & Provost, E. M. (2017). Discretized continuous speech emotion recognition with multi-task deep recurrent neural network. Interspeech, 2017.

  • Le, D., & Provost, E. M. (2013). Emotion recognition from spontaneous speech using hidden markov models with deep belief networks. In 2013 IEEE workshop on automatic speech recognition and understanding (ASRU) (pp. 216–221). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Lee, J., & Tashev, I. (2015). High-level feature representation using recurrent neural network for speech emotion recognition. In INTERSPEECH (pp. 1537–1540).

  • Lefter, I., Rothkrantz, L. J., Wiggers, P., & Van Leeuwen, D. A. (2010). Emotion recognition from speech by combining databases and fusion of classifiers. In Text, speech and dialogue (pp. 353–360). Berlin Heidelberg: Springer.

    Chapter  Google Scholar 

  • Li, L., Zhao, Y., Jiang, D., Zhang, Y., Wang, F., Gonzalez, I., … Sahli, H. (2013). Hybrid deep neural network–hidden markov model (DNN-HMM) based speech emotion recognition. In 2013 humaine association conference on affective computing and intelligent interaction (ACII) (pp. 312–317). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Li, Y., Chao, L., Liu, Y., Bao, W., & Tao, J. (2015) From simulated speech to natural speech, what are the robust features for emotion recognition? In International conference on affective computing and intelligent interaction (ACII) (pp. 368–373). Piscataway: IEEE

    Google Scholar 

  • Lim, W., Jang, D., & Lee, T. (2016). Speech emotion recognition using convolutional and recurrent neural networks. In Signal and information processing association annual summit and conference (APSIPA), 2016 Asia-Pacific (pp. 1–4). Piscataway: IEEE.

    Google Scholar 

  • Litman, D. J., & Forbes-Riley, K. (2006). Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors. Speech Communication, 48(5), 559–590.

    Article  Google Scholar 

  • Liu, J., Chen, C., Bu, J., You, M., & Tao, J. (2007). Speech emotion recognition based on a fusion of all-class and pairwise-class feature selection. Computational Science–ICCS 2007, 168–175.

  • Luengo, I., Navas, E., & Hernáez, I. (2010). Feature analysis and evaluation for automatic emotion identification in speech. IEEE Transactions on Multimedia, 12(6), 490–501.

    Article  Google Scholar 

  • Lugger, M., Janoir, M. E., & Yang, B. (2009). Combining classifiers with diverse feature sets for robust speaker independent emotion recognition. In Signal processing conference, 2009 17th European (1225–1229). Piscataway: IEEE.

    Google Scholar 

  • Lugger, M., & Yang, B. (2007). The relevance of voice quality features in speaker independent emotion recognition. In IEEE international conference on acoustics, speech and signal processing, 2007. ICASSP 2007. (Vol. 4, pp. IV–17). Piscataway: IEEE.

    Google Scholar 

  • Lugger, M., & Yang, B. (2007). An incremental analysis of different feature groups in speaker independent emotion recognition. In 16th Int. congress of phonetic sciences.

  • Mannepalli, K., Sastry, P. N., & Suman, M. (2016). A novel adaptive fractional deep belief networks for speaker emotion recognition. Alexandria Engineering Journal.

  • Mao, Q., Xue, W., Rao, Q., Zhang, F., & Zhan, Y. (2016). Domain adaptation for speech emotion recognition by sharing priors between related source and target classes. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2608–2612). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Mao, X., Chen, L., & Fu, L. (2009). Multi-level speech emotion recognition based on HMM and ANN. In 2009 WRI World congress on computer science and information engineering (Vol. 7, pp. 225–229). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Mao, X., Zhang, B., & Luo, Y. (2007). Speech emotion recognition based on a hybrid of HMM/ANN. In Proceedings of the 7th conference on 7th WSEAS international conference on applied informatics and communications (Vol. 7, pp. 367–370).

  • Mencattini, A., Martinelli, E., Ringeval, F., Schuller, B., & Di Natlae, C. (2017). Continuous estimation of emotions in speech by dynamic cooperative speaker models. In IEEE transactions on affective computing.

  • Milton, A., Roy, S. S., & Selvi, S. T. (2013). Svm scheme for speech emotion recognition using mfcc feature. International Journal of Computer Applications, 69(9).

  • Milton, A., & Selvi, S. T. (2014). Class-specific multiple classifiers scheme to recognize emotions from speech signals. Computer Speech & Language, 28(3), 727–742.

    Article  Google Scholar 

  • Mishra, H. K., & Sekhar, C. C. (2009). Variational Gaussian mixture models for speech emotion recognition. In Seventh international conference on advances in pattern recognition, 2009. ICAPR09. (pp. 183–186). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Morales-Perez, M., Echeverry-Correa, J., Orozco-Gutierrez, A., & Castellanos-Dominguez, G. (2008). Feature extraction of speech signals in emotion identification. In Engineering in medicine and biology society, 2008. EMBS 2008. 30th annual international conference of the IEEE (pp. 2590–2593). Piscataway: IEEE.

  • Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech communication, 49(2), 98–112.

    Article  Google Scholar 

  • Navas, E., Hernáez, I., & Luengo, I. (2006). An objective and subjective study of the role of semantics and prosodic features in building corpora for emotional TTS. EEE Transactions on Audio, Speech and Language Processing 14, 1117–1127.

    Article  Google Scholar 

  • Neiberg, D., & Elenius, K. (2008). Automatic recognition of anger in spontaneous speech. In 9th annual conference of the international speech communication association.

  • Nicolaou, M. A., Gunes, H., & Pantic, M. (2011). Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing, 2(2), 92–105.

    Article  Google Scholar 

  • Ntalampiras, S., & Fakotakis, N. (2012). Modeling the temporal evolution of acoustic parameters for speech emotion recognition. IEEE Transactions on Affective Computing, 3(1), 116–125.

    Article  Google Scholar 

  • Pan, Y., Shen, P., & Shen, L. (2012). Speech emotion recognition using support vector machine. International Journal of Smart Home, 6(2), 101–108.

    Google Scholar 

  • Pao, T. L., Chien, C. S., Chen, Y. T., Yeh, J. H., Cheng, Y. M., & Liao, W. Y. (2007). Combination of multiple classifiers for improving emotion recognition in Mandarin speech. In 3rd international conference on intelligent information hiding and multimedia signal processing, 2007. IIHMSP 2007 (Vol. 1, pp. 35–38). Piscataway: IEEE.

    Google Scholar 

  • Pao, T. L., Wang, C. H., & Li, Y. J. (2012). A study on the search of the most discriminative speech features in the speaker dependent speech emotion recognition. In 2012 fifth international symposium on parallel architectures, algorithms and programming (PAAP) (pp. 157–162). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Pathak, S., & Kulkarni, A. (2011). Recognizing emotions from speech. In 2011 3rd international conference on electronics computer technology (ICECT) (Vol. 4, pp. 107–109). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Philippou-Hübner, D., Vlasenko, B., Böck, R., & Wendemuth, A. (2012). The performance of the speaking rate parameter in emotion recognition from speech. In 2012 IEEE international conference on multimedia and expo (ICME) (pp. 248–253). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Picard, R. W., & Picard, R. (1997). Affective computing (252). Cambridge: MIT press.

    Google Scholar 

  • Pierre-Yves, O. (2003). The production and recognition of emotions in speech: features and algorithms. International Journal of Human-Computer Studies, 59(1), 157–183.

    Article  Google Scholar 

  • Planet, S., & Iriondo, I. (2012). Comparison between decision-level and feature-level fusion of acoustic and linguistic features for spontaneous emotion recognition. In 2012 7th Iberian conference on information systems and technologies (CISTI) (pp. 1–6). Piscataway: IEEE.

    Google Scholar 

  • Plutchik, R. (1991). The emotions. Lanham, MD: University Press of America.

    Google Scholar 

  • Pohjalainen, J., Fabien Ringeval, F., Zhang, Z., & Schuller, B. (2016). Spectral and cepstral audio noise reduction techniques in speech emotion recognition. In Proceedings of the 2016 ACM on multimedia conference (pp. 670–674). ACM.

  • Polzehl, T., Schmitt, A., Metze, F., & Wagner, M. (2011). Anger recognition in speech using acoustic and linguistic cues. Speech Communication, 53(9), 1198–1209.

    Article  Google Scholar 

  • Přibil, J., & Přibilová, A. (2013). Evaluation of influence of spectral and prosodic features on GMM classification of Czech and Slovak emotional speech. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1), 8.

    Article  Google Scholar 

  • Rao, K. S., Koolagudi, S. G., & Vempada, R. R. (2013). Emotion recognition from speech using global and local prosodic features. International Journal of Speech Technology, 16(2), 143–160.

    Article  Google Scholar 

  • Rao, K. S., Kumar, T. P., Anusha, K., Leela, B., Bhavana, I., & Gowtham, S. V. S. K. (2012). Emotion recognition from speech. International Journal of Computer Science and Information Technologies, 3(2), 3603–3607.

    Google Scholar 

  • Rehmam, B., Halim, Z., Abbas, G., & Muhammad, T. (2015). Artificial neural network-based speech recognition using Dwt analysis applied on isolated words from oriental languages. Malaysian Journal of Computer Science, 28(3), 242–262.

    Article  Google Scholar 

  • Ringeval, F., & Chetouani, M. (2008). Exploiting a vowel based approach for acted emotion recognition. In Verbal and nonverbal features of human-human and human-machine interaction, pp. 243–254.

  • Rodríguez, P. H., Hernández, J. B. A., Ballester, M. A. F., González, C. M. T., & Orozco-Arroyave, J. R. (2013). Global selection of features for nonlinear dynamics characterization of emotional speech. Cognitive Computation, 5(4), 517–525.

    Article  Google Scholar 

  • Rong, J., Li, G., & Chen, Y. P. P. (2009). Acoustic feature selection for automatic emotion recognition from speech. Information Processing & Management, 45(3), 315–328.

    Article  Google Scholar 

  • Sagha, H., Deng, J., Gavryukova, M., Han, J., & Schuller, B. (2016). Cross lingual speech emotion recognition using canonical correlation analysis on principal component subspace. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5800–5804). Piscataway: IEEE.

  • Sánchez-Gutiérrez, M. E., Albornoz, E. M., Martinez-Licona, F., Rufiner, H. L., & Goddard, J. (2014). Deep learning for emotional speech recognition. In Mexican conference on pattern recognition (pp. 311–320). Cham: Springer International Publishing.

    Google Scholar 

  • Scherer, S., Schwenker, F., & Palm, G. (2008). Emotion recognition from speech using multi-classifier systems and rbf-ensembles. In Speech, audio, image and biomedical signal processing using neural networks, pp. 49–70.

  • Scherer, S., Schwenker, F., & Palm, G. (2008). Emotion recognition from speech using multi-classifier systems and rbf-ensembles. In Speech, audio, image and biomedical signal processing using neural networks (pp. 49–70). Berlin Heidelberg: Springer.

    Chapter  Google Scholar 

  • Scherer, S., Schwenker, F., & Palm, G. (2008). Emotion recognition from speech using multi-classifier systems and rbf-ensembles. In Speech, audio, image and biomedical signal processing using neural networks, 49–70.

  • Scherer, S., Schwenker, F., & Palm, G. (2009). Classifier fusion for emotion recognition from speech. In Advanced intelligent environments (pp. 95–117). Springer US.

  • Schmitt, M., Ringeval, F., & Schuller, B. W. (2016). At the border of acoustics and linguistics: bag-of-audio-words for the recognition of emotions in speech. In INTERSPEECH (pp. 495–499).

  • Schuller, B., Seppi, D., Batliner, A., Maier, A., & Steidl, S. (2007). Towards more reality in the recognition of emotional speech. In 2007 IEEE international conference on acoustics, speech and signal processing-ICASSP’07 (Vol. 4, pp. IV–941). Piscataway: IEEE.

    Google Scholar 

  • Schuller, B., Vlasenko, B., Eyben, F., Wollmer, M., Stuhlsatz, A., Wendemuth, A., & Rigoll, G. (2010). Cross-corpus acoustic emotion recognition: Variances and strategies. IEEE Transactions on Affective Computing, 1(2), 119–131.

    Article  Google Scholar 

  • Schuller, B., Vlasenko, B., Minguez, R., Rigoll, G., & Wendemuth, A. (2007). Comparing one and two-stage acoustic modeling in the recognition of emotion in speech. In IEEE workshop on automatic speech recognition & understanding, 2007. ASRU (pp. 596–600). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Schuller, B. W. (2008). Speaker, noise, and acoustic space adaptation for emotion recognition in the automotive environment. In 2008 ITG conference on voice communication (SprachKommunikation) (pp. 1–4). VDE.

  • Schwenker, F., Scherer, S., Magdi, Y. M., & Palm, G. (2009). The GMM-SVM supervector approach for the recognition of the emotional status from speech. In International conference on artificial neural networks (pp. 894–903). Berlin, Heidelberg: Springer.

    Google Scholar 

  • Sedaaghi, M. H., Kotropoulos, C., & Ververidis, D. (2007). Using adaptive genetic algorithms to improve speech emotion recognition. In IEEE 9th workshop on multimedia signal processing, 2007. MMSP 2007. (pp. 461–464). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Seehapoch, T., & Wongthanavasu, S. (2013). Speech emotion recognition using support vector machines. In 2013 5th international conference on knowledge and smart technology (KST) (pp. 86–91). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Ser, W., Cen, L., & Yu, Z. L. (2008). A hybrid PNN-GMM classification scheme for speech emotion recognition. In 19th international conference on pattern recognition, 2008. ICPR 2008 (pp. 1–4). Piscataway: IEEE.

    Google Scholar 

  • Sethu, V., Ambikairajah, E., & Epps, J. (2007). Speaker normalisation for speech-based emotion detection. In 2007 15th international conference on digital signal processing (pp. 611–614). Piscataway: IEEE.

    Google Scholar 

  • Sethu, V., Ambikairajah, E., & Epps, J. (2008a). Phonetic and speaker variations in automatic emotion classification. In 9th annual conference of the international speech communication association.

  • Sethu, V., Ambikairajah, E., & Epps, J. (2008b). Empirical mode decomposition based weighted frequency feature for speech-based emotion classification. In IEEE international conference on acoustics, speech and signal processing, 2008. ICASSP 2008. (pp. 5017–5020). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Sethu, V., Ambikairajah, E., & Epps, J. (2009). Speaker dependency of spectral features and speech production cues for automatic emotion classification. In IEEE international conference on acoustics, speech and signal processing, 2009. ICASSP 2009. (pp. 4693–4696). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Sethu, V., Ambikairajah, E., & Epps, J. (2013). On the use of speech parameter contours for emotion recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1), 19.

    Article  Google Scholar 

  • Shah, F. (2009). Automatic emotion recognition from speech using artificial neural networks with gender-dependent databases. In International conference on advances in computing, control, & telecommunication technologies, 2009. ACT09. (pp. 162–164). Piscataway: IEEE.

    Google Scholar 

  • Shah, M., Miao, L., Chakrabarti, C., & Spanias, A. (2013). A speech emotion recognition framework based on latent Dirichlet allocation: Algorithm and FPGA implementation. In 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2553–2557). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Shami, M., & Verhelst, W. (2007). An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech. Speech Communication, 49(3), 201–212.

    Article  Google Scholar 

  • Shaukat, A., & Chen, K. (2011). Emotional state recognition from speech via soft-competition on different acoustic representations. In The 2011 international joint conference on neural networks (IJCNN) (pp. 1910–1917). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Shaw, A., Vardhan, R. K., & Saxena, S. (2016). Emotion recognition and classification in speech using Artificial neural networks. International Journal of Computer Applications, 145(8).

  • Sheikhan, M., Bejani, M., & Gharavian, D. (2013). Modular neural-SVM scheme for speech emotion recognition using ANOVA feature selection method. Neural Computing and Applications, 23(1), 215–227.

    Article  Google Scholar 

  • Sheikhan, M., Gharavian, D., & Ashoftedel, F. (2012). Using DTW neural–based MFCC warping to improve emotional speech recognition. Neural Computing and Applications, 21(7), 1765–1773.

    Article  Google Scholar 

  • Shen, P., Changjun, Z., & Chen, X. (2011). Automatic speech emotion recognition using support vector machine. In 2011 international conference on electronic and mechanical engineering and information technology (EMEIT) (Vol. 2, pp. 621–625). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Sidorov, M., Ultes, S., & Schmitt, A. (2014). Emotions are a personal thing: Towards speaker-adaptive emotion recognition. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4803–4807). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Soltani, K., & Ainon, R. N. (2007). Speech emotion detection based on neural networks. In 9th international symposium on signal processing and its applications, 2007. ISSPA 2007. (pp. 1–3). Piscataway: IEEE.

    Google Scholar 

  • Song, P., Ou, S., Zheng, W., Jin, Y., & Zhao, L. (2016). Speech emotion recognition using transfer non-negative matrix factorization. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5180–5184). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Song, P., Zheng, W., Ou, S., Zhang, X., Jin, Y., Liu, J., & Yu, Y. (2016). Cross-corpus speech emotion recognition based on transfer non-negative matrix factorization. Speech Communication, 83, 34–41.

    Article  Google Scholar 

  • Steidl, S., Batliner, A., Nöth, E., & Hornegger, J. (2008). Quantification of segmentation and F0 errors and their effect on emotion recognition. In Text, speech and dialogue (pp. 525–534). Berlin/Heidelberg: Springer.

    Chapter  Google Scholar 

  • Sun, Y., & Wen, G. (2015). Emotion recognition using semi-supervised feature selection with speaker normalization. International Journal of Speech Technology, 18(3), 317–331.

    Article  Google Scholar 

  • Sun, Y., Wen, G., & Wang, J. (2015). Weighted spectral features based on local Hu moments for speech emotion recognition. Biomedical Signal Processing and Control, 18, 80–90.

    Article  Google Scholar 

  • Sun, Y., Zhou, Y., Zhao, Q., & Yan, Y. (2009). Acoustic feature optimization for emotion affected speech recognition. In International conference on information engineering and computer science, 2009. ICIECS 2009. (pp. 1–4). Piscataway: IEEE.

    Google Scholar 

  • Swain, M., Sahoo, S., Routray, A., Kabisatpathy, P., & Kundu, J. N. (2015). Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition. International Journal of Speech Technology, 18(3), 387–393.

    Article  Google Scholar 

  • Sztahó, D., Imre, V., & Vicsi, K. (2011). Automatic classification of emotions in spontaneous speech. Analysis of verbal and nonverbal communication and enactment. The Processing Issues, pp. 229–239.

  • Tabatabaei, T. S., Krishnan, S., & Guergachi, A. (2007). Emotion recognition using novel speech signal features. In IEEE international symposium on circuits and systems, 2007. ISCAS 2007 (pp. 345–348). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Tahon, M., & Devillers, L. (2015). Towards a small set of robust acoustic features for emotion recognition: IEEE/ACM transactions on challenges audio, speech, and language processing, 24(1), 16–28.

  • Tamulevicius, G., & Liogiene, T. (2015). Low-order multi-level features for speech emotions recognition. Baltic Journal of Modern Computing, 3(4), 234–247.

    Google Scholar 

  • Tarasov, A., & Delany, S. J. (2011). Benchmarking classification models for emotion recognition in natural speech: A multi-corporal study. In 2011 IEEE international conference on automatic face & gesture recognition and workshops (FG 2011) (pp. 841–846). Piscataway: IEEE.

    Google Scholar 

  • Ten Bosch, L. (2003). Emotions, speech and the ASR framework. Speech Communication, 40(1), 213–225.

    Article  MATH  Google Scholar 

  • Thapliyal, N., & Amoli, G. (2012). Speech based emotion recognition with gaussian mixture model. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(5), 65.

    Google Scholar 

  • Trigeorgis, G., Ringeval, F., Brueckner, R., Marchi, E., Nicolaou, M. A., Schuller, B., & Zafeiriou, S. (2016). Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), (pp. 5200–5204). Piscataway: IEEE.

    Chapter  Google Scholar 

  • Truong, K., & Van Leeuwen, D. (2007). An ‘open-set’detection evaluation methodology for automatic emotion recognition in speech. In Workshop on paralinguistic speech-between models and data (pp. 5–10).

  • Tseng, M., Hu, Y., Han, W. W., & Bergen, B. (2005). “Searching for happiness” or” Full of Joy”? Source domain activation matters. In annual meeting of the Berkeley linguistics society (Vol. 31, No. 1, pp. 359–370).

  • Utane, A. S., & Nalbalwar, S. L. (2013). Emotion recognition through speech using gaussian mixture model and support vector machine. Emotion, 2, 8.

    Google Scholar 

  • Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162–1181.

    Article  Google Scholar 

  • Vlasenko, B., Philippou-Hübner, D., Prylipko, D., Böck, R., Siegert, I., & Wendemuth, A. (2011a). Vowels formants analysis allows straightforward detection of high arousal emotions. In 2011 IEEE international conference on multimedia and expo (ICME) (pp. 1–6). Piscataway: IEEE.

  • Vlasenko, B., Prylipko, D., Philippou-Hübner, D., & Wendemuth, A. (2011b). Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions. In 12th annual conference of the international speech communication association.

  • Vlasenko, B., Schuller, B., Wendemut, A., & Rigoll, G. (2007) Frame vs Turn-level: emotion recognition from speech considering static and dynamic processing. In Proceedings 2nd international conference on affective computing and intelligent interaction, pp 139–147.

  • Vogt, T., & André, E. (2006). Improving automatic emotion recognition from speech via gender differentiation. In Proceeding language resources and evaluation conference (LREC 2006), Genoa.

  • Vogt, T., & André, E. (2009). Exploring the benefits of discretization of acoustic features for speech emotion recognition. In 10th annual conference of the international speech communication association.

  • Vogt, T., & André, E. (2011). An evaluation of emotion units and feature types for real-time speech emotion recognition. KI-Künstliche Intelligenz, 25(3), 213–223.

    Article  Google Scholar 

  • Vondra, M., & Vích, R. (2009). Evaluation of speech emotion classification based on GMM and data fusion. In Cross-modal analysis of speech, gestures, gaze and facial expressions, pp. 98–105.

  • Wagner, J., Vogt, T., & André, E. (2007). A systematic comparison of different HMM designs for emotion recognition from acted and spontaneous speech. In international conference on affective computing and intelligent interaction (pp. 114–125). Springer, Berlin, Heidelberg.

  • Wang, F., Verhelst, W., & Sahli, H. (2011). Relevance vector machine based speech emotion recognition. In Affective computing and intelligent interaction, pp. 111–120.

  • Weninger, F., Ringeval, F., Marchi, E., & Schuller, B. W. (2016). Discriminatively trained recurrent neural networks for continuous dimensional emotion recognition from audio. In IJCAI (pp. 2196–2202).

  • Wenjing, H., Haifeng, L., & Chunyu, G. (2009). A hybrid speech emotion perception method of VQ-based feature processing and ANN recognition. In WRI global congress on intelligent systems, 2009 (GCIS ’09) (Vol. 2, pp. 145–149). Piscataway: IEEE.

  • Womack, B. D., & Hansen, J. H. (1999). N-channel hidden Markov models for combined stressed speech classification and recognition. IEEE Transactions on Speech and Audio Processing, 7(6), 668–677.

  • Wu, C. H., & Liang, W. B. (2011). Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels. IEEE Transactions on Affective Computing, 2, 10–21.

  • Wu, S., Falk, T. H., & Chan, W. Y. (2009). Automatic recognition of speech emotion using long-term spectro-temporal features. In 2009 16th international conference on digital signal processing (pp. 1–6). Piscataway: IEEE.

  • Wu, S., Falk, T. H., & Chan, W. Y. (2011). Automatic speech emotion recognition using modulation spectral features. Speech Communication, 53(5), 768–785.

  • Wu, T., Yang, Y., Wu, Z., & Li, D. (2006). MASC: A speech corpus in mandarin for emotion analysis and affective speaker recognition. In Speaker and language recognition workshop, 2006. IEEE Odyssey 2006 (pp. 1–5). Piscataway: IEEE.

  • Xiao, Z., Dellandréa, E., Chen, L., & Dou, W. (2009). Recognition of emotions in speech by a hierarchical approach. In 3rd international conference on affective computing and intelligent interaction and workshops, 2009 (ACII 2009) (pp. 1–8). Piscataway: IEEE.

  • Xiao, Z., Dellandréa, E., Dou, W., & Chen, L. (2006). Two-stage classification of emotional speech. In International conference on digital telecommunications, 2006 (ICDT ’06) (p. 32). Piscataway: IEEE.

  • Xiao, Z., Dellandréa, E., Dou, W., & Chen, L. (2007a). Automatic hierarchical classification of emotional speech. In 9th IEEE international symposium on multimedia workshops, 2007 (ISMW ’07) (pp. 291–296). Piscataway: IEEE.

  • Xiao, Z., Dellandréa, E., Dou, W., & Chen, L. (2007b). Hierarchical classification of emotional speech. IEEE Transactions on Multimedia, 37.

  • Yang, B., & Lugger, M. (2010). Emotion recognition from speech signals using new harmony features. Signal Processing, 90(5), 1415–1423.

  • Yang, N., Muraleedharan, R., Kohl, J., Demirkol, I., Heinzelman, W., & Sturge-Apple, M. (2012). Speech-based emotion classification using multiclass SVM with hybrid kernel and thresholding fusion. In 2012 IEEE spoken language technology workshop (SLT) (pp. 455–460). Piscataway: IEEE.

  • Ye, C., Liu, J., Chen, C., Song, M., & Bu, J. (2008). Speech emotion classification on a Riemannian manifold. In Advances in multimedia information processing—PCM 2008 (pp. 61–69).

  • Yeh, J. H., Pao, T. L., Lin, C. Y., Tsai, Y. W., & Chen, Y. T. (2011). Segment-based emotion recognition from continuous Mandarin Chinese speech. Computers in Human Behavior, 27(5), 1545–1552.

  • You, M., Chen, C., Bu, J., Liu, J., & Tao, J. (2006a). A hierarchical framework for speech emotion recognition. In 2006 IEEE international symposium on industrial electronics (Vol. 1, pp. 515–519). Piscataway: IEEE.

  • You, M., Chen, C., Bu, J., Liu, J., & Tao, J. (2006b). Emotional speech analysis on nonlinear manifold. In 18th international conference on pattern recognition, 2006 (ICPR 2006) (Vol. 3, pp. 91–94). Piscataway: IEEE.

  • Yun, S., & Yoo, C. D. (2012). Loss-scaled large-margin Gaussian mixture models for speech emotion classification. IEEE Transactions on Audio, Speech, and Language Processing, 20(2), 585–598.

  • Yüncü, E., Hacihabiboglu, H., & Bozsahin, C. (2014). Automatic speech emotion recognition using auditory models with binary decision tree and SVM. In 2014 22nd international conference on pattern recognition (ICPR) (pp. 773–778). Piscataway: IEEE.

  • Zbancioc, M., & Feraru, S. M. (2012). Emotion recognition of the SROL Romanian database using fuzzy KNN algorithm. In 10th international symposium on electronics and telecommunications (ISETC), 2012 (pp. 347–350). Piscataway: IEEE.

  • Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2009). A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 39–58.

  • Zha, C., Yang, P., Zhang, X., & Zhao, L. (2016). Spontaneous speech emotion recognition via multiple kernel learning. In 2016 eighth international conference on measuring technology and mechatronics automation (ICMTMA) (pp. 621–623). Piscataway: IEEE.

  • Zhang, S., Lei, B., Chen, A., Chen, C., & Chen, Y. (2010). Spoken emotion recognition using local Fisher discriminant analysis. In 2010 IEEE 10th international conference on signal processing (ICSP) (pp. 538–540). Piscataway: IEEE.

  • Zhang, S., & Zhao, Z. (2008). Feature selection filtering methods for emotion recognition in Chinese speech signal. In 9th international conference on signal processing, 2008 (ICSP 2008) (pp. 1699–1702). Piscataway: IEEE.

  • Zheng, W. Q., Yu, J. S., & Zou, Y. X. (2015). An experimental study of speech emotion recognition based on deep convolutional neural networks. In 2015 international conference on affective computing and intelligent interaction (ACII) (pp. 827–831). Piscataway: IEEE.

  • Zhou, J., Wang, G., Yang, Y., & Chen, P. (2006). Speech emotion recognition based on rough set and SVM. In 5th IEEE international conference on cognitive informatics, 2006 (ICCI 2006) (Vol. 1, pp. 53–61). Piscataway: IEEE.

  • Zhou, Y., Sun, Y., Yang, L., & Yan, Y. (2009). Applying articulatory features to speech emotion recognition. In International conference on research challenges in computer science, 2009 (ICRCCS ’09) (pp. 73–76). Piscataway: IEEE.

  • Zhu, L., Chen, L., Zhao, D., Zhou, J., & Zhang, W. (2017). Emotion recognition from Chinese speech for smart affective services using a combination of SVM and DBN. Sensors, 17(7), 1694.

  • Zong, Y., Zheng, W., Zhang, T., & Huang, X. (2016). Cross-corpus speech emotion recognition based on domain-adaptive least-squares regression. IEEE Signal Processing Letters, 23(5), 585–589.

Funding

This work was funded by the University of Malaya Research Grant (AFR (Frontier Science)) (Grant No. RG284-14AFR) and the Postgraduate Research Grant (PPP) (Grant No. PG220-2014B).

Author information

Correspondence to Mumtaz Begum Mustafa.

About this article

Cite this article

Mustafa, M.B., Yusoof, M.A.M., Don, Z.M. et al. Speech emotion recognition research: an analysis of research focus. Int J Speech Technol 21, 137–156 (2018). https://doi.org/10.1007/s10772-018-9493-x
