Using Deep Learning to Recognize Emotions Through Speech Analysis

Artificial Intelligence for Societal Issues

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 231)

Abstract

Emotion recognition is the identification of emotions such as happiness, anger, and sadness, usually from verbal communication and facial expressions. Beyond classifying a broad spectrum of moods, recognizing distinct emotions can help track the mental health of as many people as possible, for societal well-being. Within the positive category, the system detects specific emotions such as happiness, satisfaction, or excitement, depending on how it is configured. Our sentiment recognition system, which identifies emotions including anger, happiness, depression, and neutrality, rests on two main principles: analyzing the audio content and identifying the emotion associated with it. The application takes audio input, applies the Mel-Frequency Cepstral Coefficients (MFCC) algorithm to it, compares the resulting coefficients with those of an existing database of audio files depicting various human sentiments, and outputs as text the emotion expressed by the user. Input gathered during testing was processed to extract meaningful spectral coefficients, which were stored in a database for comparison with future audio samples. The application extracts the coefficients of an external audio sample and matches them against those in the database. The MFCC algorithm extracts spectral coefficients that are well suited to feature matching, while discarding static and background noise, if present. We carried out a comparative analysis of our models' performance using four classification metrics, and we present the confusion matrix for better understanding.

Author information

Corresponding author

Correspondence to Ahona Ghosh.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Mitra, A., Biswas, A., Ghosh, A., Ghosh, A., Majumdar, S.K., Dastidar, J.G. (2023). Using Deep Learning to Recognize Emotions Through Speech Analysis. In: Biswas, A., Semwal, V.B., Singh, D. (eds) Artificial Intelligence for Societal Issues. Intelligent Systems Reference Library, vol 231. Springer, Cham. https://doi.org/10.1007/978-3-031-12419-8_9
