
Detection of major depressive disorder using vocal acoustic analysis and machine learning—an exploratory study

  • Original Article
  • Research on Biomedical Engineering



Diagnosis and treatment in psychiatry still depend heavily on patient reports and clinician judgment, which makes them prone to memory and subjectivity biases. As in other medical fields, where objective biomarkers are available, there is increasing interest in developing such tools for psychiatry. To this end, vocal acoustic parameters have recently been studied as possible objective biomarkers, as an alternative to invasive and costly methods. Patients with mental disorders such as major depressive disorder (MDD) may present with speech alterations, typically described as uninteresting, monotonous, and spiritless speech and a low voice.


Thirty-three individuals (11 males) over 18 years old were selected: 22 previously diagnosed with MDD and 11 healthy controls. Their speech was recorded in naturalistic settings: during a routine medical evaluation for the psychiatric patients, and in different environments for the healthy controls. Third-party voices were removed. The recordings were submitted to a vocal feature extraction algorithm and then to different machine learning classification techniques.
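The abstract does not say which acoustic features the extraction algorithm computes. As a rough illustration only, the sketch below derives two common frame-level descriptors, short-time energy and zero-crossing rate, from a synthetic tone. The function name `frame_features`, the frame and hop sizes, and the choice of features are all assumptions made for this example and are not taken from the study.

```python
import math

def frame_features(samples, frame_len, hop):
    """Return one (energy, zero_crossing_rate) pair per frame.

    Hypothetical helper: these two features are stand-ins, not the
    feature set used in the study.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        zcr = crossings / (frame_len - 1)
        features.append((energy, zcr))
    return features

# Synthetic input (assumption): one second of a 100 Hz sine at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 100 * n / sample_rate)
          for n in range(sample_rate)]

# 50 ms frames with a 25 ms hop (illustrative values).
feats = frame_features(signal, frame_len=400, hop=200)
```

Each frame's feature pair would then form part of the input vector handed to a classifier; real speech pipelines typically use a much richer feature set.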


The results showed that random tree models with 100 trees provided the best classification performance. For the detection of MDD, they achieved a mean accuracy of 87.5575% ± 1.9490, and a mean kappa index, sensitivity, and specificity of 0.7508 ± 0.0319, 0.9149 ± 0.0204, and 0.8354 ± 0.0254, respectively.
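For readers unfamiliar with the four reported metrics, the sketch below shows how accuracy, Cohen's kappa, sensitivity, and specificity follow from a binary confusion matrix. The counts in the usage example are invented for illustration and are not the study's data; `binary_metrics` and `BinaryMetrics` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class BinaryMetrics:
    accuracy: float
    kappa: float
    sensitivity: float
    specificity: float

def binary_metrics(tp, fn, fp, tn):
    """Compute metrics from confusion-matrix counts (positive class = MDD)."""
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement between predictions and labels.
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fn) / total) * ((tp + fp) / total)
    p_neg = ((fp + tn) / total) * ((fn + tn) / total)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    # Sensitivity: share of true positives among actual positives.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of true negatives among actual negatives.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return BinaryMetrics(accuracy, kappa, sensitivity, specificity)

# Made-up counts, chosen only to give round numbers.
m = binary_metrics(tp=40, fn=10, fp=5, tn=45)
print(m)
```

With these made-up counts, accuracy is 0.85, kappa is 0.70, sensitivity is 0.80, and specificity is 0.90. Kappa is lower than accuracy because it discounts the agreement a classifier would reach by chance alone.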


The use of machine learning classifiers with vocal acoustic features appears to be very promising for the detection of major depressive disorder in this exploratory study, but further experiments with a larger sample will be necessary to validate our findings.





The authors are grateful to the Brazilian research agency CNPq for the partial financial support of this research.


This study was partially funded by the Brazilian research agency CNPq.

Author information

Corresponding author

Correspondence to Wellington Pinheiro dos Santos.

Ethics declarations

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Conflict of interest

The authors declare that they have no conflict of interest.


About this article


Cite this article

Espinola, C.W., Gomes, J.C., Pereira, J.M.S. et al. Detection of major depressive disorder using vocal acoustic analysis and machine learning—an exploratory study. Res. Biomed. Eng. 37, 53–64 (2021).
