Using Deep Learning to Recognize Emotions Through Speech Analysis

Mitra, Arion; Biswas, Ankita; Ghosh, Ananya; Ghosh, Ahona; Majumdar, Souptik Kumar; Dastidar, Jayati Ghosh

doi:10.1007/978-3-031-12419-8_9

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 231)

Abstract

Emotion recognition is the identification of emotions such as happiness, anger, and sadness, usually from verbal communication and facial expressions. Beyond distinguishing a wide spectrum of moods, recognizing individual emotions can help track the mental health of as many people as possible, for societal well-being. Within the positive category, such a system can detect specific emotions like happiness, satisfaction, or excitement, depending on how it is configured. Our sentiment recognition system, which identifies emotions including anger, happiness, depression, and neutrality, rests on two principles: analysing audio content and identifying the emotion associated with it. The application takes audio input, extracts Mel-Frequency Cepstral Coefficients (MFCCs) from it, compares them with the coefficients of files in an existing audio database depicting various human sentiments, and outputs as text the emotion expressed by the user. Input gathered during testing was processed to extract meaningful spectral coefficients, which were stored in a database for comparison with future audio samples. The application extracts the coefficients of an external audio sample and matches them against those present in the database. The MFCC algorithm is used because the spectral coefficients it yields are well suited to feature matching and discard static and background noise, if present. We compare the performance of our models using four classification metrics and present the confusion matrix for better understanding.
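The extract-and-match pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the abstract does not give frame sizes, filter counts, the distance measure, or the classifier, so the parameter defaults, the frame-averaged feature vector, the Euclidean nearest-match rule, and every function name below are assumptions. Synthetic sine tones with placeholder labels stand in for recorded speech, and a real system would more likely rely on an audio library than on this pure-Python DFT.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame (naive DFT, fine for a demo)."""
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(w))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(w))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising edge
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def dct2(values, n_coeffs):
    """First n_coeffs terms of an unnormalised DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n)
                for i, v in enumerate(values)) for c in range(n_coeffs)]

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one MFCC vector per clip by averaging the per-frame coefficients."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    per_frame = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Floor the filter energies so the logarithm is always defined.
        log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                 for row in bank]
        per_frame.append(dct2(log_e, n_coeffs))
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]

def closest_label(features, database):
    """Label of the stored vector nearest to `features` (Euclidean distance)."""
    return min(database, key=lambda label: math.dist(features, database[label]))

def tone(freq, sample_rate=8000, n=1024):
    """Synthetic stand-in for a recorded clip."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

# Hypothetical "database": one stored vector per placeholder label.
database = {"label_low": mfcc(tone(300), 8000),
            "label_high": mfcc(tone(2500), 8000)}
query = mfcc(tone(320), 8000)
best = closest_label(query, database)
print(best)
```

Averaging the frames gives every clip a fixed-length vector, which keeps the comparison simple; a sequence model such as an LSTM would instead consume the per-frame coefficients directly.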

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Calcutta, Calcutta, India
Arion Mitra, Ankita Biswas & Ananya Ghosh
Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, West Bengal, India
Ahona Ghosh
Deloitte USI, Bengaluru, India
Souptik Kumar Majumdar
Department of Computer Science, St. Xavier’s College, Kolkata, India
Jayati Ghosh Dastidar

Corresponding author

Correspondence to Ahona Ghosh.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, National Institute Of Technology Silchar, Cachar, Assam, India
Anupam Biswas
National Institute of Technology Bhopal, Bhopal, Madhya Pradesh, India
Vijay Bhaskar Semwal
Department of Computer Science, PDPM, Indian Institute of Information Technology, Jabalpur, India
Durgesh Singh

About this chapter

Cite this chapter

Mitra, A., Biswas, A., Ghosh, A., Ghosh, A., Majumdar, S.K., Dastidar, J.G. (2023). Using Deep Learning to Recognize Emotions Through Speech Analysis. In: Biswas, A., Semwal, V.B., Singh, D. (eds) Artificial Intelligence for Societal Issues. Intelligent Systems Reference Library, vol 231. Springer, Cham. https://doi.org/10.1007/978-3-031-12419-8_9

DOI: https://doi.org/10.1007/978-3-031-12419-8_9
Published: 20 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-12418-1
Online ISBN: 978-3-031-12419-8
eBook Packages: Intelligent Technologies and Robotics (R0)
