Abstract
Depression, a major mental illness, affects lives all over the world. It affects not only patients' mood but also their physical and mental health, and it may lead to a lack of enthusiasm for daily life, low spirits, anxiety, irritability, anger, and even suicidal tendencies. As the need to detect depression automatically with machine learning algorithms grows, this paper proposes an automatic depression detection method based on audio files and a convolutional neural network (CNN). First, we delete the long silent sections of each audio file and splice the remainder into a new file. Next, we attach to each file a label indicating whether the participant is healthy. Then, Mel-frequency cepstral coefficients (MFCCs), features of the speech signal, are extracted as a feature matrix that represents the characteristics of each participant's voice. Finally, the features are fed into the CNN for model training and evaluation. Results on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset show an overall prediction accuracy of 0.85 and an average probability of correct prediction for a single file of 0.82.
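The preprocessing steps described in the abstract (silence removal, splicing, and MFCC extraction) can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' implementation: the function names `remove_silence` and `mfcc`, the 16 kHz sample rate, the 25 ms frames with a 10 ms hop, the 26 mel filters, the 13 coefficients, and the energy threshold are all assumptions chosen for the example. The sketch also drops every low-energy frame rather than only long silent sections, which is a simplification.

```python
import numpy as np

def remove_silence(signal, frame_len=400, threshold_ratio=0.1):
    """Drop frames whose RMS energy is below a fraction of the loudest
    frame's RMS, then splice the remaining frames together (assumed rule)."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    keep = rms > threshold_ratio * rms.max()
    return frames[keep].reshape(-1)

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix for a 1-D signal
    that is at least frame_len samples long."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # Unnormalized DCT-II over the filter axis decorrelates the log energies.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_energy @ basis.T

# Synthetic stand-in for an interview recording: tone, silence, tone.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220.0 * t)
recording = np.concatenate([tone, np.zeros(sr), tone])

voiced = remove_silence(recording)
features = mfcc(voiced, sample_rate=sr)
print(voiced.shape, features.shape)
```

The resulting matrix (frames by coefficients) is the kind of input a CNN could consume, for example after cutting it into fixed-size windows. The network itself is left out here because the abstract does not specify its architecture.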
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhao, S., Li, Q., Li, C., Li, Y., Lu, K. (2021). A CNN-Based Method for Depression Detecting Form Audio. In: Wang, Y., Wang, W.Y.C., Yan, Z., Zhang, D. (eds) Digital Health and Medical Analytics. DHA 2020. Communications in Computer and Information Science, vol 1412. Springer, Singapore. https://doi.org/10.1007/978-981-16-3631-8_1
DOI: https://doi.org/10.1007/978-981-16-3631-8_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-3630-1
Online ISBN: 978-981-16-3631-8
eBook Packages: Computer Science, Computer Science (R0)