A CNN-Based Method for Depression Detecting Form Audio

Zhao, Shuangshuang; Li, Qingqing; Li, Chenbin; Li, Yu; Lu, Ke

doi:10.1007/978-981-16-3631-8_1

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1412)

Included in the following conference series:

International Conference on Digital Health and Medical Analytics

Abstract

Depression, a major mental illness, affects lives all over the world. It harms not only patients' mood but also their physical and mental health, and it may lead to a lack of enthusiasm for daily life, low spirits, anxiety, irritability, anger and even suicidal tendencies. As the need to detect depression automatically with machine learning algorithms grows, this paper proposes an automatic depression detection method based on audio files and a convolutional neural network (CNN). First, we delete the long silent sections of each audio file and splice the remaining segments into a new file. Next, we label each file to indicate whether the participant is healthy or not. Then, Mel-frequency cepstral coefficients (MFCCs), a standard speech-signal feature, are extracted as a feature matrix that represents the characteristics of each participant's voice. Finally, the features are fed into the CNN model for training and evaluation. Results on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset show an overall prediction accuracy of 0.85 and an average probability of correct prediction for a single file of 0.82.
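
The silence-removal and splicing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how silence is detected, so the frame-energy threshold, the function name `remove_long_silences`, and all parameter defaults (20 ms frames, a threshold of 5% of the peak frame RMS, 500 ms minimum silence) are assumptions. The spliced signal would then go to an MFCC extractor (for example, a library routine such as `librosa.feature.mfcc`) before being passed to the CNN.

```python
import numpy as np


def remove_long_silences(signal, sample_rate, frame_ms=20,
                         energy_ratio=0.05, min_silence_ms=500):
    """Drop silent runs lasting at least min_silence_ms and splice the rest.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len if frame_len > 0 else 0
    if n_frames == 0:
        return signal.copy()

    # Split the signal into non-overlapping frames and measure RMS energy.
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))

    # A frame counts as silent if its RMS is far below the loudest frame.
    silent = rms < energy_ratio * rms.max()

    # Only silent runs of at least min_frames frames are removed;
    # short pauses inside speech are kept.
    min_frames = max(1, int(np.ceil(min_silence_ms / frame_ms)))
    keep = np.ones(n_frames, dtype=bool)
    run_start = None
    for i in range(n_frames + 1):
        is_silent = i < n_frames and silent[i]
        if is_silent and run_start is None:
            run_start = i
        elif not is_silent and run_start is not None:
            if i - run_start >= min_frames:
                keep[run_start:i] = False
            run_start = None

    # Splice the kept frames together, then append any leftover samples
    # that did not fill a whole frame.
    kept = frames[keep].reshape(-1)
    tail = signal[n_frames * frame_len:]
    return np.concatenate([kept, tail])
```

Keeping short pauses while removing only long silent runs mirrors the abstract's wording ("long silent sections"). A real pipeline might instead use a voice-activity detector or the interview transcripts to locate participant speech.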

Author information

Authors and Affiliations

Anhui University of Technology, Maanshan, 243032, China
Shuangshuang Zhao, Qingqing Li, Chenbin Li, Yu Li & Ke Lu

Editor information

Editors and Affiliations

The University of Sheffield, Sheffield, UK
Yichuan Wang
University of Waikato, Hamilton, New Zealand
William Yu Chung Wang
Beijing Institute of Technology, Beijing, China
Zhijun Yan
University of North Carolina at Charlotte, Charlotte, NC, USA
Dongsong Zhang

About this paper

Cite this paper

Zhao, S., Li, Q., Li, C., Li, Y., Lu, K. (2021). A CNN-Based Method for Depression Detecting Form Audio. In: Wang, Y., Wang, W.Y.C., Yan, Z., Zhang, D. (eds) Digital Health and Medical Analytics. DHA 2020. Communications in Computer and Information Science, vol 1412. Springer, Singapore. https://doi.org/10.1007/978-981-16-3631-8_1

DOI: https://doi.org/10.1007/978-981-16-3631-8_1
Published: 04 July 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3630-1
Online ISBN: 978-981-16-3631-8
eBook Packages: Computer Science, Computer Science (R0)
