A CNN-Based Method for Depression Detecting Form Audio

  • Conference paper
Digital Health and Medical Analytics (DHA 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1412)

Abstract

Depression, a major mental illness, affects lives all over the world. It affects not only patients’ mood but also their physical and mental health, and may lead to a lack of enthusiasm for daily life, a low mental state, anxiety, irritability, anger and even suicidal tendencies. As the need to detect depression automatically with machine learning algorithms grows, this paper proposes an automatic depression detection method based on audio files and a convolutional neural network (CNN). First, we delete the long silent sections of each audio file and splice the remaining segments into a new file. Next, each file is labelled to indicate whether the participant is healthy or not. Then, Mel-frequency cepstral coefficients (MFCCs), a standard feature of speech signals, are extracted as a feature matrix that represents the particular characteristics of each participant’s voice. Finally, the features are fed into the CNN model for training and evaluation. Results on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset show an overall prediction accuracy of 0.85 and an average probability of correct prediction for a single file of 0.82.
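The first two preprocessing steps described in the abstract (removing long silences and attaching a label) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how silence is detected, so the frame-wise RMS energy test, the function names `remove_long_silences` and `prepare_example`, the parameters `frame_ms`, `energy_threshold` and `max_silence_ms`, and the label convention (1 = depressed, 0 = healthy) are all assumptions made for the example.

```python
import math


def remove_long_silences(samples, sample_rate, frame_ms=20,
                         energy_threshold=0.01, max_silence_ms=200):
    """Drop silent runs longer than max_silence_ms and splice the rest.

    All thresholds are illustrative assumptions, not values from the paper.
    `samples` is a list of floats in [-1.0, 1.0].
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    max_silent_frames = max_silence_ms // frame_ms

    kept = []        # spliced output signal
    silent_run = []  # frames of the silent run currently being tracked

    def flush_silent_run():
        # Short pauses are part of natural speech, so keep them;
        # runs longer than the limit are discarded entirely.
        if len(silent_run) <= max_silent_frames:
            for frame in silent_run:
                kept.extend(frame)
        silent_run.clear()

    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < energy_threshold:
            silent_run.append(frame)
        else:
            flush_silent_run()
            kept.extend(frame)

    flush_silent_run()  # handle silence at the very end of the file
    return kept


def prepare_example(samples, sample_rate, depressed):
    """Return (cleaned_signal, label); label 1 = depressed is an assumed convention."""
    return remove_long_silences(samples, sample_rate), 1 if depressed else 0
```

In a full pipeline, the cleaned signal would then go to an MFCC extractor and the resulting feature matrix to the CNN; those stages are left out here because the abstract gives no details about frame sizes, coefficient counts or network layout.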

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhao, S., Li, Q., Li, C., Li, Y., Lu, K. (2021). A CNN-Based Method for Depression Detecting Form Audio. In: Wang, Y., Wang, W.Y.C., Yan, Z., Zhang, D. (eds) Digital Health and Medical Analytics. DHA 2020. Communications in Computer and Information Science, vol 1412. Springer, Singapore. https://doi.org/10.1007/978-981-16-3631-8_1

  • DOI: https://doi.org/10.1007/978-981-16-3631-8_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-3630-1

  • Online ISBN: 978-981-16-3631-8

  • eBook Packages: Computer Science (R0)
