Multi-modal fusion learning through biosignal, audio, and visual content for detection of mental stress

Dogan, Gulin; Akbulut, Fatma Patlar

doi:10.1007/s00521-023-09036-4

Original Article
Published: 03 October 2023

Volume 35, pages 24435–24454, (2023)
Neural Computing and Applications

Abstract

Mental stress is a significant risk factor for several illnesses and can degrade a person’s quality of life, including work and personal relationships. Traditional detection methods based on interviews and questionnaires may not capture individuals’ instantaneous emotional responses. In this study, experience sampling was used to analyze participants’ immediate affective responses, which gives a more comprehensive and dynamic picture of their experiences. The WorkStress3D dataset was compiled from 20 participants across three distinct modalities. Over an average of one week, 175 h of data containing physiological signals such as blood volume pulse (BVP), electrodermal activity (EDA), and body temperature, as well as facial expressions and auditory data, were collected from a single subject. We present a novel fusion model that uses double-early fusion approaches to combine data from multiple modalities. The model achieves an F1 score of 0.94 with a loss of 0.18, indicating that it can accurately identify and classify varying degrees of stress. We also investigate transfer learning as a way to improve the stress detection system. It did not outperform the fusion model: transfer learning reached an accuracy of 0.93 and a loss of 0.17, illustrating the difficulty of adapting pre-trained models to the task of stress analysis. These results emphasize the significance of multi-modal fusion in stress detection and the importance of selecting the most suitable model architecture for the task. The proposed fusion model shows potential for accurate and robust stress classification. This research contributes to the field of stress analysis and to the development of effective models for stress detection.
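
As a rough illustration of two ideas the abstract relies on, the sketch below shows plain early (feature-level) fusion, where per-modality feature vectors are concatenated before classification, and how an F1 score is computed from predictions. It is not the authors' double-early fusion model, whose details the abstract does not give. The function names, the feature values, the three-level label scheme, and the choice of macro averaging are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def early_fusion(*modalities: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate per-modality feature vectors."""
    fused: List[float] = []
    for features in modalities:
        fused.extend(features)
    return fused


def f1_per_class(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Per-label F1 = 2*TP / (2*TP + FP + FN)."""
    scores: Dict[int, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-label F1 scores (macro averaging is an assumption)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical features for one time window; the numbers are made up.
bio = [0.61, 0.12, 33.4]      # e.g. BVP, EDA, and temperature summaries
audio = [0.25, 0.80]          # e.g. two audio descriptors
visual = [0.05, 0.70, 0.25]   # e.g. facial-expression probabilities
fused = early_fusion(bio, audio, visual)

# Hypothetical labels: 0 = low, 1 = medium, 2 = high stress.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
score = macro_f1(y_true, y_pred)
print(len(fused), round(score, 3))
```

In this toy setup the fused vector is what a single classifier would receive. A late-fusion design would instead train one classifier per modality and combine their outputs.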

Data availability

The WorkStress3D dataset generated and/or analyzed during the current study is available in the Mendeley repository https://data.mendeley.com/datasets/t93xcwm75r/5.

Acknowledgements

This work was supported by the Scientific Research Projects Coordination Unit of Istanbul Kultur University with project number: IKU-BAP2012. Within the scope of the study, data were collected with the permission of the ethics committee of Istanbul Kultur University, with the decision dated 20.05.2020 and numbered 2020.29.

Author information

Gulin Dogan and Fatma Patlar Akbulut have contributed equally to this work.

Authors and Affiliations

Department of Computer Engineering, Istanbul Kultur University, Istanbul, 34158, Turkey
Gulin Dogan
Department of Software Engineering, Istanbul Kultur University, Istanbul, 34158, Turkey
Fatma Patlar Akbulut

Corresponding author

Correspondence to Fatma Patlar Akbulut.

Ethics declarations

Conflict of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dogan, G., Akbulut, F.P. Multi-modal fusion learning through biosignal, audio, and visual content for detection of mental stress. Neural Comput & Applic 35, 24435–24454 (2023). https://doi.org/10.1007/s00521-023-09036-4

Received: 06 December 2022
Accepted: 06 September 2023
Published: 03 October 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00521-023-09036-4

Keywords
