Emotion Recognition in Speech Using Convolutional Neural Networks

  • Conference paper
Data Intelligence and Cognitive Informatics

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

This paper implements and analyses the performance of Convolutional Neural Networks (CNNs) in detecting and labelling emotion in speech, as a function of the features used to describe the speech. CNNs are often associated with natural language processing, and this paper compares the results of a CNN model on two datasets containing speech in different languages. It thus assesses the suitability of CNNs as language-agnostic speech-based emotion recognition models, and reports the accuracies obtained with different feature sets and with varying hyperparameters such as the batch size. The emotions considered are happiness, sadness, anger, fear and neutrality. The features experimented with are Mel-frequency cepstral coefficients (MFCCs), pitch and log filterbank energies (LFBE). The datasets considered are the Indian Institute of Technology Kharagpur (IIT-KGP) Simulated Emotion Hindi Speech Corpus (SEHSC) and the Berlin Database of Emotional Speech. Improving speech-based emotion recognition systems would enable them to complement visual and textual systems in building a fuller picture of people's emotional state. This could be useful in advertising, in reading review sentiment, in the analysis of interviews and speeches, and in the mental-healthcare industry.
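
As a rough illustration of one of the features named in the abstract, log filterbank energies (LFBE) can be sketched in NumPy as follows. This is not the authors' implementation: the 16 kHz sample rate, 25 ms frame length, 10 ms hop, 512-point FFT and 26 mel filters are assumed values chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(hz):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - centre)
    return fbank


def log_filterbank_energies(signal, sample_rate=16000, frame_len=400,
                            hop=160, n_fft=512, n_filters=26):
    """Return an array of shape (n_frames, n_filters).

    Assumes len(signal) >= frame_len; all defaults are illustrative.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Build overlapping frames by fancy indexing, then apply a Hamming window.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (frames are zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # Small constant avoids log(0) for silent frames or empty filters.
    return np.log(energies + 1e-10)


# Example: one second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
features = log_filterbank_energies(np.sin(2 * np.pi * 440.0 * t))
```

Each row of the returned array describes one frame, so stacking rows over time gives a two-dimensional input of the kind a CNN can convolve over. MFCCs are conventionally obtained by applying a discrete cosine transform to these log energies.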

Aarya Arun, Indu Rallabhandi, Swathi and Ananya Nair should be considered co-first authors.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Arun, A., Rallabhandi, I., Swathi, Nair, A., Jayashree, R. (2022). Emotion Recognition in Speech Using Convolutional Neural Networks. In: Jacob, I.J., Kolandapalayam Shanmugam, S., Bestak, R. (eds) Data Intelligence and Cognitive Informatics. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-16-6460-1_9
