Abstract
Depression severely affects physical and mental health. It arises from changes in mood, loss of interest, and stress, and can lead to self-harm and suicide. Analyzing depression is therefore important for reducing suicidal acts. In recent years, automatic depression evaluation has been developed using computer vision technology. Several models have been investigated for depression analysis, but they are limited to video and audio data. This paper proposes a hybrid Artificial Intelligence (AI) based multi-modal depression analysis approach in which the severity of depression is estimated from multi-modal data: video, audio, and text descriptors. First, the proposed approach estimates the Patient Health Questionnaire (PHQ) depression scale with a hybrid framework, the Residual Network based Deep Neural Network (D-ResNet), which computes the PHQ-8 score from video and audio features. Next, a Paragraph Vector based Kernel Extreme Learning Machine (PV-KELM) is developed to infer the mental and physical states of individuals related to the psychoanalytic features of depression; it recognizes the presence or absence of the measured psychoanalytic symptoms. Finally, the PHQ-8 score estimated by D-ResNet and the psychoanalytic symptoms recognized by PV-KELM are fed together into an ensemble classifier. The ensemble uses three classifiers, namely Support Vector Machine (SVM), Naive Bayes (NB), and Decision Tree (DT), to classify whether the individual is depressed. The proposed approach is implemented in Python, and the experiments are carried out on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) interview depression dataset.
With the proposed approach, the accuracy, precision, recall, F-measure, RMSE, MAE, JSD, and contextual similarity obtained are 0.89, 0.86, 0.86, 0.86, 0.373, 0.35, 0.355, and 0.689, respectively. The proposed approach is compared with state-of-the-art approaches, and the results show its efficiency.
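The final classification stage and the reported classification metrics can be illustrated with a short sketch. The Python code below is a minimal, hypothetical illustration and not the authors' implementation. It assumes that the three classifiers (SVM, NB, DT) each emit a binary label and that their outputs are combined by simple majority voting, a fusion rule the abstract does not specify. The function names `majority_vote` and `binary_metrics` and all toy labels are invented for illustration.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most classifiers (hard voting)."""
    return Counter(votes).most_common(1)[0][0]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F-measure for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy data (invented): 1 = depressed, 0 = not depressed.
# Each list stands in for the per-subject decisions of one classifier.
svm_pred = [1, 0, 1, 1, 0, 0]
nb_pred = [1, 0, 0, 1, 0, 1]
dt_pred = [0, 0, 1, 1, 1, 0]
y_true = [1, 0, 1, 1, 0, 1]

# With three binary voters a tie cannot occur, so the vote is always decisive.
ensemble_pred = [majority_vote(v) for v in zip(svm_pred, nb_pred, dt_pred)]
metrics = binary_metrics(y_true, ensemble_pred)

print(ensemble_pred)
print(metrics)
```

Hard voting is only one possible fusion rule; weighted or probability-based voting would fit the same interface, with `majority_vote` replaced accordingly.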
Data availability
Data sharing is not applicable to this article.
Author information
Contributions
All authors contributed equally to this work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
All the authors involved have agreed to participate in this submitted article.
Consent to publish
All the authors involved in this manuscript give full consent for publication of this submitted article.
About this article
Cite this article
T J, S.J., Jacob, I.J. & Mandava, A.K. D-ResNet-PVKELM: deep neural network and paragraph vector based kernel extreme machine learning model for multimodal depression analysis. Multimed Tools Appl 82, 25973–26004 (2023). https://doi.org/10.1007/s11042-023-14351-y