Abstract
Speech is one of the most common and natural ways in which humans express themselves. Speech Emotion Recognition (SER) systems can be defined as an assortment of methods that process and classify speech signals to detect the emotions they convey. More broadly, automatic emotion recognition is the identification of human emotions from signals such as speech, facial expressions and text. Collecting and labelling such signals is often tedious and requires expert knowledge. This paper reviews open-source speech emotion datasets in various languages and surveys recent literature on speech emotion recognition that employs a range of machine learning approaches to improve classification accuracy. It aims to identify and synthesize contemporary literature on SER systems with different methodologies and design components, thus giving researchers an up-to-date understanding of the field of SER.
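The "process and classify" description above can be illustrated with a minimal sketch of a generic SER pipeline. It is not the method of any surveyed paper. It makes several assumptions: a 16 kHz sample rate, 25 ms frames with a 10 ms hop, two simple frame-level features (short-time energy and zero-crossing rate) pooled into an utterance-level vector, and a nearest-centroid classifier. Synthetic sine tones stand in for labelled utterances, and the labels `calm` and `aroused` are hypothetical. Practical systems typically use richer features, such as MFCCs or log-Mel spectrograms, together with learned classifiers such as CNNs or LSTMs.

```python
import math
from statistics import mean, pstdev

# Assumed framing: 400 samples (25 ms) per frame, 160-sample (10 ms) hop at 16 kHz.
def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def frame_features(frame):
    """Short-time energy and zero-crossing rate of a single frame."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy, crossings / (len(frame) - 1)

def utterance_features(signal):
    """Pool frame-level features into one vector: mean and std of each feature."""
    feats = [frame_features(f) for f in frame_signal(signal)]
    energies = [e for e, _ in feats]
    zcrs = [z for _, z in feats]
    return [mean(energies), pstdev(energies), mean(zcrs), pstdev(zcrs)]

class NearestCentroid:
    """Assign each utterance to the emotion label whose mean feature vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))

def tone(freq, amp, sr=16000, dur=0.5):
    """Hypothetical stand-in for a recorded utterance: a pure sine tone."""
    return [amp * math.sin(2 * math.pi * freq * n / sr) for n in range(int(sr * dur))]

# Usage: train on four synthetic "utterances", then classify an unseen one.
train = [(150, 0.2, "calm"), (180, 0.25, "calm"),
         (350, 0.8, "aroused"), (400, 0.9, "aroused")]
X = [utterance_features(tone(f, a)) for f, a, _ in train]
y = [label for _, _, label in train]
clf = NearestCentroid().fit(X, y)
print(clf.predict(utterance_features(tone(160, 0.22))))
```

In this sketch, louder and higher-pitched tones produce larger energy and zero-crossing values, so they fall nearer the `aroused` centroid. Feature extraction and classification are kept separate so that either stage can be swapped out, for example MFCCs in place of energy and zero-crossing rate, or a neural network in place of the centroid rule.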
Data Availability
The data used to support the findings of this study are included within the article.
Funding
This work received no funding from any governmental or non-governmental funding agency.
Ethics declarations
Conflict of Interest
The authors declare that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Anthony, A.A., Patil, C.M. Speech Emotion Recognition Systems: A Comprehensive Review on Different Methodologies. Wireless Pers Commun 130, 515–525 (2023). https://doi.org/10.1007/s11277-023-10296-5