Emotion Recognition in Speech Using Convolutional Neural Networks

Arun, Aarya; Rallabhandi, Indu; Swathi; Nair, Ananya; Jayashree, R.

doi:10.1007/978-981-16-6460-1_9

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

This paper implements and analyses the performance of Convolutional Neural Networks (CNNs) in detecting and labelling emotion in speech, based on the features used to describe the speech. CNNs are often associated with natural language processing, and this paper compares the results of a CNN model on two datasets whose speech is in different languages. It thus assesses the suitability of CNNs as language-agnostic speech-based emotion recognition models and reports the accuracies obtained with different feature sets and with varying hyperparameters such as the batch size. The emotions considered are happiness, sadness, anger, fear and neutrality. The features experimented with are Mel-frequency cepstral coefficients (MFCCs), pitch and the log of filterbank energy (LFBE). The datasets considered are the Indian Institute of Technology Kharagpur (IIT-KGP) Simulated Emotion Hindi Speech Corpus (SEHSC) and the Berlin Database of Emotional Speech. Improving speech-based emotion recognition systems would enable them to complement visual and textual systems in better understanding people's emotional state. This could be highly useful in advertising, in reading review sentiment, in the analysis of interviews and speeches, and in the mental-healthcare industry.
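The three feature types named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the abstract gives no extraction settings, so the 25 ms frames, 10 ms hop, 26 mel filters, 13 cepstral coefficients, the autocorrelation pitch estimator and all function names below are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale: (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - centre)
    return fbank

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def log_filterbank_energies(signal, sample_rate, n_filters=26, n_fft=512,
                            frame_ms=25, hop_ms=10):
    """LFBE: log of mel filterbank energies per frame (assumed settings)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    frames = frame_signal(signal, frame_len, hop_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(energies + 1e-10)        # small offset avoids log(0)

def mfcc(lfbe, n_coeffs=13):
    """MFCCs as the first n_coeffs terms of an unnormalised DCT-II of the LFBE."""
    n_filters = lfbe.shape[1]
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return lfbe @ basis.T

def pitch_autocorr(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Pitch of one frame from the strongest autocorrelation peak (assumed method)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sample_rate / fmax), int(sample_rate / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sample_rate / lag
```

Each function returns one row per frame (or one value per frame for pitch), so the outputs could be stacked into a frames-by-features matrix of the kind a CNN takes as input. How the chapter itself combines the features is not stated in the abstract.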

Aarya Arun, Indu Rallabhandi, Swathi and Ananya Nair should be considered co-first authors.

Author information

Authors and Affiliations

PES University, Bangalore, India
Aarya Arun, Indu Rallabhandi, Swathi, Ananya Nair & R. Jayashree

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, GITAM University, Bangalore, India
I. Jeena Jacob
Assistant Professor of Computer Science, Department of Mathematics and Computer Science, Ashland University, Ashland, OH, USA
Selvanayaki Kolandapalayam Shanmugam
Department of Telecommunication Engineering, Czech Technical University in Prague, Prague, Czech Republic
Robert Bestak

About this paper

Cite this paper

Arun, A., Rallabhandi, I., Swathi, Nair, A., Jayashree, R. (2022). Emotion Recognition in Speech Using Convolutional Neural Networks. In: Jacob, I.J., Kolandapalayam Shanmugam, S., Bestak, R. (eds) Data Intelligence and Cognitive Informatics. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-16-6460-1_9

DOI: https://doi.org/10.1007/978-981-16-6460-1_9
Published: 01 February 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6459-5
Online ISBN: 978-981-16-6460-1
eBook Packages: Intelligent Technologies and Robotics (R0)