Abstract
This paper implements and analyses the performance of Convolutional Neural Networks (CNNs) in detecting and labelling emotion in speech, based on the features used to describe the speech. CNNs are often associated with natural language processing, and this paper compares the results of a CNN model on two datasets containing speech in different languages. It thus assesses the suitability of CNNs as language-agnostic speech-based emotion recognition models, and reports the accuracies obtained with different feature sets while varying hyperparameters such as the batch size. The emotions considered are happiness, sadness, anger, fear and neutrality. The features experimented with are Mel-frequency cepstral coefficients (MFCCs), pitch and the log of filterbank energy (LFBE). The datasets considered are the Indian Institute of Technology Kharagpur (IIT-KGP) Simulated Emotion Hindi Speech Corpus (SEHSC) and the Berlin Database of Emotional Speech. Improving speech-based emotion recognition systems would enable them to complement visual and textual systems in understanding people's emotional state more accurately. This could be highly useful in advertising, in reading review sentiment, in the analysis of interviews and speeches, and even in the mental-healthcare industry.
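As a rough illustration of two of the feature types named in the abstract, the sketch below computes log filterbank energies (LFBE) and MFCCs for a single frame using only the Python standard library. It is not the authors' pipeline: the function names, the 16 kHz sample rate, the 256-sample frame, the 20 mel filters, the 13 cepstral coefficients and the synthetic 440 Hz test tone are all assumptions chosen for the example.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Convert a mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising slope
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def log_filterbank_energies(frame, sample_rate, num_filters=20):
    """LFBE: log of the energy captured by each mel filter."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spectrum)) for row in bank]
    return [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)


def mfcc_from_lfbe(lfbe, num_ceps=13):
    """MFCCs as an unnormalised DCT-II of the log filterbank energies."""
    m = len(lfbe)
    return [sum(e * math.cos(math.pi * k * (j + 0.5) / m)
                for j, e in enumerate(lfbe))
            for k in range(num_ceps)]


# Synthetic input (an assumption): one 256-sample frame of a 440 Hz tone.
SAMPLE_RATE = 16000
frame = [math.sin(2.0 * math.pi * 440.0 * i / SAMPLE_RATE) for i in range(256)]
lfbe = log_filterbank_energies(frame, SAMPLE_RATE)
mfcc = mfcc_from_lfbe(lfbe)
```

In practice each utterance would be split into many overlapping frames, and the per-frame vectors stacked into a two-dimensional array that a CNN can consume. A library implementation would normally replace the naive DFT used here.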
Aarya Arun, Indu Rallabhandi, Swathi and Ananya Nair should be considered co-first authors.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Arun, A., Rallabhandi, I., Swathi, Nair, A., Jayashree, R. (2022). Emotion Recognition in Speech Using Convolutional Neural Networks. In: Jacob, I.J., Kolandapalayam Shanmugam, S., Bestak, R. (eds) Data Intelligence and Cognitive Informatics. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-16-6460-1_9
DOI: https://doi.org/10.1007/978-981-16-6460-1_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6459-5
Online ISBN: 978-981-16-6460-1
eBook Packages: Intelligent Technologies and Robotics (R0)