
Multi-modal fusion learning through biosignal, audio, and visual content for detection of mental stress

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Mental stress is a significant risk factor for several illnesses and can negatively affect a person’s quality of life, including their work and personal relationships. Traditional methods of detecting mental stress, such as interviews and questionnaires, may not capture individuals’ instantaneous emotional responses. In this study, experience sampling was used to analyze participants’ immediate affective responses, providing a more comprehensive and dynamic understanding of their experiences. The WorkStress3D dataset was compiled from 20 participants across three distinct modalities. Over an average of one week, approximately 175 h of data per subject were collected, containing physiological signals such as blood volume pulse (BVP), electrodermal activity (EDA), and body temperature, as well as facial expressions and audio. We present a novel fusion model that uses a double early-fusion approach to combine data from the multiple modalities. The model’s F1 score of 0.94 with a loss of 0.18 is very encouraging, showing that it can accurately identify and classify varying degrees of stress. We also investigate transfer learning to improve the efficacy of our stress detection system; despite our efforts, it did not surpass the fusion model, reaching an accuracy of 0.93 with a loss of 0.17 and illustrating the difficulty of adapting pre-trained models to stress analysis. These results emphasize the significance of multi-modal fusion in stress detection and the importance of selecting a model architecture suited to the task. The proposed fusion model demonstrates its potential for accurate and robust classification of stress. This research contributes to the field of stress analysis and to the development of effective models for stress detection.
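The core idea of early (feature-level) fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' actual architecture; the feature dimensions, modality names, and the linear read-out below are illustrative assumptions only, showing how per-window features from the three modalities would be concatenated before classification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-window features for each modality (dimensions are
# illustrative, not the paper's actual feature sets).
n_windows = 8
bio = rng.normal(size=(n_windows, 16))    # e.g. BVP/EDA/temperature statistics
audio = rng.normal(size=(n_windows, 32))  # e.g. spectral features
video = rng.normal(size=(n_windows, 64))  # e.g. facial-expression embeddings


def early_fuse(*modalities):
    """Feature-level (early) fusion: concatenate per-window feature vectors."""
    return np.concatenate(modalities, axis=1)


fused = early_fuse(bio, audio, video)     # shape: (n_windows, 16 + 32 + 64)

# A single linear read-out over the fused representation stands in for the
# downstream stress classifier trained on these features.
w = rng.normal(size=fused.shape[1])
scores = fused @ w                        # one stress score per window
```

In a late-fusion design, by contrast, each modality would be classified separately and only the per-modality decisions combined; early fusion lets the classifier learn cross-modal interactions directly from the joint feature vector.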


Data availability

The WorkStress3D dataset generated during and/or analyzed during the current study is available in the Mendeley repository https://data.mendeley.com/datasets/t93xcwm75r/5.


Acknowledgements

This work was supported by the Scientific Research Projects Coordination Unit of Istanbul Kultur University with project number: IKU-BAP2012. Within the scope of the study, data were collected with the permission of the ethics committee of Istanbul Kultur University, with the decision dated 20.05.2020 and numbered 2020.29.

Author information

Correspondence to Fatma Patlar Akbulut.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Dogan, G., Akbulut, F.P. Multi-modal fusion learning through biosignal, audio, and visual content for detection of mental stress. Neural Comput & Applic 35, 24435–24454 (2023). https://doi.org/10.1007/s00521-023-09036-4

