Real-time emotional health detection using fine-tuned transfer networks with multimodal fusion

Sharma, Aditi; Sharma, Kapil; Kumar, Akshi

doi:10.1007/s00521-022-06913-2

Real-time emotional health detection using fine-tuned transfer networks with multimodal fusion

S.I. : Neural Computing for IOT based Intelligent Healthcare Systems
Published: 22 January 2022

Volume 35, pages 22935–22948, (2023)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

1763 Accesses
19 Citations
Explore all metrics

Abstract

Recognizing and regulating human emotion or a wave of riding emotions are a vital life skill as it can play an important role in how a person thinks, behaves and acts. Accurate real-time emotion detection can revolutionize the human–computer interaction industry and has the potential to provide a proactive approach to mental health care. Several untapped sources of data, including social media data (psycholinguistic markers), multimodal data (audio and video signals) combined with the sensor-based psychophysiological and brain signals, help to comprehend the affective states and emotional experiences. In this work, we propose a model that utilizes three modalities, i.e., visual (facial expression and body gestures), audio (speech) and text (spoken content), to classify emotion into discrete categories based on Ekman’s model with an additional category for ‘neutral’ state. Transfer learning has been used with multistage fine-tuning for each modality instead of training on a single dataset to make the model generalizable. The use of multiple modalities allows integration of heterogeneous data from different sources effectively. The results of the three modalities are combined at the decision-level using weighted fusion technique. The proposed EmoHD model compares favorably to the state-of-the-art technique on two benchmark datasets MELD and IEMOCAP.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 4

Multimodal Emotion Recognition Using Transfer Learning on Audio and Text Data

Harnessing emotions for depression detection

Article 09 September 2021

Elder emotion classification through multimodal fusion of intermediate layers and cross-modal transfer learning

Article 18 January 2022

Availability of data and materials

Publicly accessible data have been used by the authors.

Code availability

Can be available on request.

References

Picard RW, Vyzas E, Healey J (2001) Toward machine emotional intelligence: analysis of affective physiological state. IEEE Trans Pattern Anal Mach Intell 23(10):1175–1191
Article Google Scholar
Zhang S, Zhang S, Huang T, Gao W, Tian Q (2017) Learning affective features with a hybrid deep model for audio–visual emotion recognition. IEEE Trans Circuits Syst Video Technol 28(10):3030–3043
Article Google Scholar
Kumar A, Sharma K, Sharma A (2021) Hierarchical deep neural network for mental stress state detection using IoT based biomarkers. Pattern Recogn Lett 145:81–87
Article Google Scholar
Gunes H, Pantic M (2010) Automatic, dimensional and continuous emotion recognition. Int J Synthet Emot (IJSE) 1(1):68–99
Article Google Scholar
Szabóová M, Sarnovský M, Maslej Krešňáková V, Machová K (2020) Emotion analysis in human-robot interaction. Electronics 9(11):1761
Article Google Scholar
Rabiei M, Gasparetto A (2014) A system for feature classification of emotions based on speech analysis; applications to human-robot interaction. In: 2014 second RSI/ISM international conference on robotics and mechatronics (ICRoM), pp 795–800. IEEE.
García-Magariño I, Chittaro L, Plaza I (2018) Bodily sensation maps: exploring a new direction for detecting emotions from user self-reported data. Int J Hum Comput Stud 113:32–47
Article Google Scholar
Zhang L, Walter S, Ma X, Werner P, Al-Hamadi A, Traue HC, Gruss S (2016) “BioVid Emo DB”: A multimodal database for emotion analyses validated by subjective ratings. In: 2016 IEEE symposium series on computational intelligence (SSCI) pp 1–6. IEEE.
Bahreini K, Nadolski R, Westera W (2016) Towards multimodal emotion recognition in e-learning environments. Interact Learn Environ 24(3):590–605
Article Google Scholar
Ashwin TS, Jose J, Raghu G, Reddy GRM (2015) An e-learning system with multifacial emotion recognition using supervised machine learning. In: 2015 IEEE seventh international conference on technology for education (T4E), pp 23–26. IEEE.
Ayvaz U, Gürüler H, Devrim MO (2017) Use of facial emotion recognition in e-learning systems. Iнфopмaцiйнi тexнoлoгiï i зacoби нaвчaння, (60, вип. 4), 95–104
Zeng H, Shu X, Wang Y, Wang Y, Zhang L, Pong TC, Qu H (2020) EmotionCues: emotion-oriented visual summarization of classroom videos. IEEE Trans Vis Comput Gr
Tu G, Fu Y, Li B, Gao J, Jiang YG, Xue X (2019) A multi-task neural approach for emotion attribution, classification, and summarization. IEEE Trans Multimedia 22(1):148–159
Article Google Scholar
Hossain MS, Muhammad G (2017) Emotion-aware connected healthcare big data towards 5G. IEEE Internet Things J 5(4):2399–2406
Article Google Scholar
Weitz K, Hassan T, Schmid U, Garbas J (2018) Towards explaining deep learning networks to distinguish facial expressions of pain and emotions. In: Forum Bildverarbeitung, pp 197–208
Saravia E, Liu HCT, Huang YH, Wu J, Chen YS (2018) Carer: contextualized affect representations for emotion recognition. In: Proceedings of the 2018 conference on empirical methods in natural language processing, pp 3687–3697
Ekman P, Friesen W (1977) Facial action coding system: a technique for the measurement of facial movement. Consulting Psychologists Press Stanford University, Palo Alto
Google Scholar
Datcu D, Rothkrantz L (2008) Semantic audio-visual data fusion for automatic emotion recognition. Euromedia’2008
De Silva LC, Miyasato T, Nakatsu R (1997) Facial emotion recognition using multi-modal information. In: Information, communications and signal processing, 1997. ICICS., Proceedings of 1997 International Conference on, vol 1. IEEE, 1997, pp 397–401
Datcu D, Rothkrantz LJ (2011) Emotion recognition using bimodal data fusion. In: Proceedings of the 12th international conference on computer systems and technologies. ACM, 2011, pp 122–128
Schuller B (2011) Recognizing affect from linguistic information in 3d continuous space. IEEE Trans Affect Comput 2(4):192–205
Article Google Scholar
Metallinou A, Lee S, Narayanan S (2008) Audio-visual emotion recognition using gaussian mixture models for face and voice. In: Tenth IEEE international symposium on multimedia, 2008. ISM 2008. IEEE, 2008, pp 250–257
Eyben F, Wollmer M, Graves A, Schuller B, Douglas-Cowie E, Cowie R (2010) On-line emotion recognition in a 3-d activation-valence-time continuum using acoustic and linguistic cues. J Multimodal User Interfaces 3(1–2):7–19
Article Google Scholar
Rosas V, Mihalcea R, Morency L-P (1977) Multimodal sentiment analysis of spanish online videos. In: IEEE intelligent systems, vol 28, no. 3, pp. 0038–45, 2013. P. Ekman and W. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Stanford University, Palo Alto, 1977.
Rozgic V, Ananthakrishnan S, Saleem S, Kumar R, Prasad R (2012) Speech language & multimedia technol., raytheon bbn technol., Cambridge, Ma, Usa. In: Signal & information processing association annual summit and conference (APSIPA ASC), 2012 Asia-Pacific. IEEE, 2012, pp 1–4
Soleymani M, Pantic M, Pun T (2011) Multimodal emotion recognition in response to videos. IEEE Trans Affect Comput 3(2):211–223
Article Google Scholar
Tzirakis P, Trigeorgis G, Nicolaou MA, Schuller BW, Zafeiriou S (2017) End-to-end multimodal emotion recognition using deep neural networks. IEEE J Sel Top Signal Process 11(8):1301–1309
Article Google Scholar
Ranganathan H, Chakraborty S, Panchanathan S (2016) Multimodal emotion recognition using deep learning architectures. In: 2016 IEEE winter conference on applications of computer vision (WACV), pp 1–9. IEEE
Poria S, Chaturvedi I, Cambria E, Hussain A (2016) Convolutional MKL based multimodal emotion recognition and sentiment analysis. In: 2016 IEEE 16th international conference on data mining (ICDM), pp 439–448. IEEE
Nguyen D, Nguyen K, Sridharan S, Ghasemi A, Dean D, Fookes C (2017) Deep spatio-temporal features for multimodal emotion recognition. In: 2017 IEEE winter conference on applications of computer vision (WACV), pp 1215–1223. IEEE
Poria S, Hazarika D, Majumder N, Naik G, Cambria E, Mihalcea R (2018) Meld: a multimodal multi-party dataset for emotion recognition in conversations. arXiv preprint https://arxiv.org/abs/1810.02508.
Mittal T, Bhattacharya U, Chandra R, Bera A, Manocha D (2020) M3ER: multiplicative multimodal emotion recognition using facial, textual, and speech cues. In: AAAI, pp 1359–1367
Delbrouck JB, Tits N, Dupont S (2020) Modulated fusion using transformer for linguistic-acoustic emotion recognition. arXiv preprint https://arxiv.org/abs/2010.02057
Hagar AF, Abbas HM, Khalil MI (2019) Emotion recognition in videos for low-memory systems using deep-learning. In: 2019 14th international conference on computer engineering and systems (ICCES), pp 16–21. IEEE
Iskhakova A, Wolf D, Meshcheryakov R (2020) Automated destructive behavior state detection on the 1D CNN-based voice analysis. In: International conference on speech and computer, pp 184–193. Springer, Cham
Xie J, Xu X, Shu L (2018) WT feature based emotion recognition from multi-channel physiological signals with decision fusion. In: 2018 first asian conference on affective computing and intelligent interaction (ACII Asia), pp 1–6. IEEE
Gideon J, Khorram S, Aldeneh Z, Dimitriadis D, Provost EM (2017) Progressive neural networks for transfer learning in emotion recognition. arXiv preprint https://arxiv.org/abs/1706.03256.
Ouyang, X., Kawaai, S., Goh, E. G. H., Shen, S., Ding, W., Ming, H., & Huang, D. Y. (2017, November). Audio-visual emotion recognition using deep transfer learning and multiple temporal models. In Proceedings of the 19th ACM International Conference on Multimodal Interaction (pp. 577–582).
Kumar A, Sharma K, Sharma A (2021) Genetically optimized fuzzy C-means data clustering of IoMT-based biomarkers for fast affective state recognition in intelligent edge analytics. Applied Soft Computing, 107525
Tavallali P, et al. (2021) An EM-based optimization of synthetic reduced nearest neighbor model towards multiple modalities representation with human interpretability, multimedia tools and applications
Dresvyanskiy D, Ryumina E, Kaya H, Markitantov M, Karpov A, Minker W (2020) An audio-video deep and transfer learning framework for multimodal emotion recognition in the wild. arXiv preprint https://arxiv.org/abs/2010.03692
Siriwardhana S, Reis A, Weerasekera R, Nanayakkara S (2020) Jointly fine-tuning "BERT-like" self supervised models to improve multimodal speech emotion recognition. arXiv preprint https://arxiv.org/abs/2008.06682
Ekman P (1999) Basic emotions. Handb Cognit Emot 98(45–60):16
Google Scholar
Abbas A, Abdelsamea MM, Gaber MM (2020) Detrac: Transfer learning of class decomposed medical images in convolutional neural networks. IEEE Access 8:74901–74913
Article Google Scholar
Huh M, Agrawal P, Efros AA (2016) What makes ImageNet good for transfer learning?. arXiv preprint https://arxiv.org/abs/1608.08614
Busso C, Bulut M, Lee CC, Kazemzadeh A, Mower E, Kim S, Narayanan SS (2008) IEMOCAP: interactive emotional dyadic motion capture database. Lang Resour Eval 42(4):335–359
Article Google Scholar
Li W, Abtahi F, Zhu Z (2015) A deep feature based multi-kernel learning approach for video emotion recognition. In: Proceedings of the 2015 ACM on international conference on multimodal interaction, pp 483–490
Wu Z, Shen C, Van Den Hengel A (2019) Wider or deeper: Revisiting the resnet model for visual recognition. Pattern Recogn 90:119–133
Article Google Scholar
Poria S, Cambria E, Bajpai R, Hussain A (2017) A review of affective computing: from unimodal analysis to multimodal fusion. Inf Fusion 37:98–125
Article Google Scholar
Kumar A, Sharma A, Arora A (2019) Anxious depression prediction in real-time social data. In: International conference on advances in engineering science management & technology (ICAESMT)-2019, Uttaranchal University, Dehradun, India
Hossain MS, Muhammad G (2019) Emotion recognition using deep learning approach from audio–visual emotional big data. Information Fusion 49:69–78
Article Google Scholar
Li W, Tsangouri C, Abtahi F, Zhu Z (2018) A recursive framework for expression recognition: from web images to deep models to game dataset. Mach Vis Appl 29(3):489–502
Article Google Scholar
Acheampong FA, Nunoo-Mensah H, Chen W (2021) Transformer models for text-based emotion detection: a review of BERT-based approaches. Artif Intell Rev, 1–41
Hazarika D, Poria S, Zimmermann R, Mihalcea R (2021) Conversational transfer learning for emotion recognition. Inf Fusion 65:1–12
Article Google Scholar

Download references

Funding

No Funding has been received.

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Delhi Technological University, New Delhi, India
Aditi Sharma
Department of Information Technology, Delhi Technological University, New Delhi, India
Kapil Sharma
Department of Information Technology, Netaji Subhas University of Technology, New Delhi, India
Akshi Kumar

Authors

Aditi Sharma
View author publications
You can also search for this author in PubMed Google Scholar
Kapil Sharma
View author publications
You can also search for this author in PubMed Google Scholar
Akshi Kumar
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

All the authors have equally contributed to the manuscript preparation.

Corresponding author

Correspondence to Akshi Kumar.

Ethics declarations

Conflicts of interest

The authors certify that there is no conflict of interest in the subject matter discussed in this manuscript.

Ethics approval

The work conducted is not plagiarized. No one has been harmed in this work.

Consent to participate

All the authors have given consent to submit the manuscript.

Consent for publication

Authors provide their consent for the publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Sharma, A., Sharma, K. & Kumar, A. Real-time emotional health detection using fine-tuned transfer networks with multimodal fusion. Neural Comput & Applic 35, 22935–22948 (2023). https://doi.org/10.1007/s00521-022-06913-2

Download citation

Received: 05 May 2021
Accepted: 04 January 2022
Published: 22 January 2022
Issue Date: November 2023
DOI: https://doi.org/10.1007/s00521-022-06913-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Real-time emotional health detection using fine-tuned transfer networks with multimodal fusion

Abstract

Access this article

Similar content being viewed by others

Multimodal Emotion Recognition Using Transfer Learning on Audio and Text Data

Harnessing emotions for depression detection

Elder emotion classification through multimodal fusion of intermediate layers and cross-modal transfer learning

Availability of data and materials

Code availability

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflicts of interest

Ethics approval

Consent to participate

Consent for publication

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Real-time emotional health detection using fine-tuned transfer networks with multimodal fusion

Abstract

Access this article

Similar content being viewed by others

Multimodal Emotion Recognition Using Transfer Learning on Audio and Text Data

Harnessing emotions for depression detection

Elder emotion classification through multimodal fusion of intermediate layers and cross-modal transfer learning

Availability of data and materials

Code availability

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflicts of interest

Ethics approval

Consent to participate

Consent for publication

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation