Abstract
Patients in hospitals frequently exhibit psychological issues such as sadness, pessimism, eccentricity, and anxiety, yet hospitals normally lack the tools and facilities to continuously monitor patients' psychological health. It is therefore desirable to identify depression in patients early so that it can be managed by promptly providing better therapy. This is made possible by advances in machine learning for image processing, with notable applications in emotion recognition from facial expressions. In this paper, we propose two methods for predicting emotions: facial expression recognition and voice analysis. For facial expression recognition, we use two approaches: Gabor filters for feature extraction combined with a support vector machine (SVM) for classification, and a convolutional neural network (CNN). For voice analysis, we extract mel-frequency cepstral coefficients (MFCCs) from speech data and, based on those features, predict the emotion of the speech with a CNN model. Experimental results show that the proposed emotion recognition methods achieve high accuracy and thus could potentially be deployed in real-world applications.
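As an illustrative sketch only (not the authors' implementation, whose code is linked below), the Gabor-filter feature extraction step described above can be prototyped with NumPy alone: a small bank of oriented Gabor kernels filters the face image, and pooled filter responses form the feature vector that would then be passed to an SVM classifier. All parameter values (`sigma`, `lambd`, `gamma`, the four orientations) are placeholder choices, not values from the paper.

```python
import numpy as np

def gabor_kernel(ksize=21, sigma=4.0, theta=0.0, lambd=10.0, gamma=0.5, psi=0.0):
    """Real-valued Gabor kernel: a cosine carrier under a Gaussian envelope."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates by the orientation angle theta
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lambd + psi)
    return envelope * carrier

def gabor_features(image, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Filter the image with a small Gabor bank and pool mean/std of responses."""
    image = np.asarray(image, dtype=float)
    feats = []
    for theta in thetas:
        k = gabor_kernel(theta=theta)
        # Circular convolution via the frequency domain, for brevity
        padded = np.zeros_like(image)
        padded[: k.shape[0], : k.shape[1]] = k
        response = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))
        feats.extend([response.mean(), response.std()])
    return np.asarray(feats)

# Example: a 4-orientation bank yields an 8-dimensional feature vector per image
face = np.random.default_rng(0).random((64, 64))
features = gabor_features(face)
print(features.shape)  # (8,)
```

In practice one would use more orientations and spatial frequencies, and spatial pooling over image regions rather than whole-image statistics, before training the SVM on the resulting feature vectors.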
Availability of data and materials
Data is available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analysed in this study. The data can be found here: [https://www.kasrl.org/jaffe_download.html] (accessed on 12th March 2022), [https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio] (accessed on 13th April 2022), [https://www.kaggle.com/datasets/barelydedicated/savee-database] (accessed on 13th April 2022), [https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess] (accessed on 15th April 2022), and [https://www.kaggle.com/datasets/ejlok1/cremad] (accessed on 17th April 2022).
Code Availability
The code is available in the following GitHub repository: https://github.com/AayushiChaudhari5694/EmotionRecognition_Image_Speech.git.
Funding
This research received no external funding.
Author information
Authors and Affiliations
Contributions
AC data curation, investigation, writing—original draft, implementation. CB methodology, project administration, writing—review and editing. TTN writing—review and editing. NP data curation, investigation, implementation, testing. KP methodology, writing—review and editing, implementation. KS methodology, writing—review and editing, implementation.
Corresponding author
Ethics declarations
Conflict of Interest
The authors declare no conflict of interest.
Ethical Approval
Not applicable.
Consent to Participate
All authors have read and agreed to participate in the publication of the manuscript.
Consent for Publication
All authors have read and agreed to publish the latest version of the manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Enabling Innovative Computational Intelligence Technologies for IOT” guest edited by Omer Rana, Rajiv Misra, Alexander Pfeiffer, Luigi Troiano and Nishtha Kesswani.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chaudhari, A., Bhatt, C., Nguyen, T.T. et al. Emotion Recognition System via Facial Expressions and Speech Using Machine Learning and Deep Learning Techniques. SN COMPUT. SCI. 4, 363 (2023). https://doi.org/10.1007/s42979-022-01633-9