
Deep learning for Depression Recognition from Speech

Mobile Networks and Applications


Depression has drawn wide attention in recent years: it causes persistent low mood and can even lead to suicide, with serious adverse consequences. In this paper, a multi-information joint decision model is established by means of emotion recognition. The model analyzes representative data from the subjects to assist in diagnosing whether they have depression. The main work is as follows. Building on an exploration of the speech characteristics of people with depressive disorder, this paper conducts an in-depth study of speech-assisted depression diagnosis using the speech data in the DAIC-WOZ dataset. First, the speech is preprocessed, including pre-emphasis, framing and windowing, endpoint detection, and noise reduction. Second, openSMILE is used to extract features from the speech signals, and the speech characteristics that these features reflect are analyzed in depth. Third, feature selection is carried out based on the influence of individual speech features and feature combinations on depression diagnosis. Fourth, principal component analysis is used to reduce the dimensionality of the features. Finally, a convolutional neural network is used for modeling; testing and result analysis show that voice-based depression diagnosis reaches an accuracy as high as 87%.
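The preprocessing and dimensionality-reduction steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the pre-emphasis coefficient (0.97), the frame and hop lengths (25 ms and 10 ms at 16 kHz), the synthetic input signal, and the toy per-frame features are all assumptions chosen for illustration, and the function names are hypothetical. Endpoint detection, noise reduction, openSMILE feature extraction, and the convolutional network are omitted.

```python
import numpy as np


def pre_emphasis(signal, alpha=0.97):
    """Apply the high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])


def frame_and_window(signal, frame_len=400, hop_len=160):
    """Split a signal into overlapping frames and apply a Hamming window."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i of idx holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)


def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# A synthetic one-second signal at 16 kHz stands in for a real recording.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
frames = frame_and_window(pre_emphasis(audio))

# Toy per-frame features (log energy, zero-crossing count, a few raw samples).
# These are placeholders, not openSMILE output.
log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
zero_cross = np.sum(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
features = np.column_stack([log_energy, zero_cross, frames[:, :8]])

reduced = pca_reduce(features, n_components=3)
```

In this sketch each row of `reduced` is one frame described by three principal components. In a pipeline like the one the abstract outlines, the rows would instead hold the selected openSMILE features before they are passed to the classifier.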







This work was partially supported by research funding from Jinhua Advanced Research Institute (G202209 and G202207).


Author information

Authors and Affiliations



Tian Han conceived the idea for this paper. Zhu Zhang established the model, analyzed the results, and wrote the manuscript. Jing Xu extracted and selected the features.

Corresponding author

Correspondence to Zhang Zhu.

Ethics declarations

Ethics approval

Not applicable. The data in this paper come from public datasets, and the work involves no other ethical concerns.

Consent for Publication


Conflict of Interests

The authors report no potential conflicts of interest or competing interests.

Additional information

Availability of data and materials

The data in this paper will be made available on reasonable request.

Consent to participate


Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Han Tian and Xu Jing contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tian, H., Zhu, Z. & Jing, X. Deep learning for Depression Recognition from Speech. Mobile Netw Appl (2023).
