Depression Detection by Person’s Voice

Zavorina, Evgeniya; Makarov, Ilya

doi:10.1007/978-3-031-16500-9_21

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13217))

Included in the following conference series:

International Conference on Analysis of Images, Social Networks and Texts

395 Accesses

Abstract

In this work, a machine learning algorithm is proposed to detect depression. The Transformer encoder network is considered and compared with top baseline approaches. Low-level features are extracted from audio recordings and then are augmented to overcome the problem of the small size of available dataset. The Transformer network achieves recognition accuracy of 73.51% on DAIC-WOZ database, which compare favourably to the accuracy of 65.85% and 66.35% obtained by traditional approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 64.99; Price excludes VAT (USA)

Softcover Book: USD 84.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://github.com/IliaZenkov/transformer-cnn-emotion-recognition.

References

Al Hanai, T., Ghassemi, M.M., Glass, J.R.: Detecting depression with audio/text sequence modeling of interviews. In: Interspeech, pp. 1716–1720 (2018)
Google Scholar
Amodei, D., et al.: Deep speech 2: end-to-end speech recognition in English and mandarin. In: International Conference on Machine Learning, pp. 173–182. PMLR (2016)
Google Scholar
Ananyeva, M., Makarov, I., Pendiukhov, M.: GSM: inductive learning on dynamic graph embeddings. In: Bychkov, I., Kalyagin, V.A., Pardalos, P.M., Prokopyev, O. (eds.) NET 2018. SPMS, vol. 315, pp. 85–99. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37157-9_6
Chapter Google Scholar
American Psychiatric Association et al.: Diagnostic and Statistical Manual of Mental Disorders: DSM-5. Arlington (2013)
Google Scholar
Averchenkova, A., et al.: Collaborator recommender system. In: Bychkov, I., Kalyagin, V.A., Pardalos, P.M., Prokopyev, O. (eds.) NET 2018. SPMS, vol. 315, pp. 101–119. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37157-9_7
Chapter Google Scholar
Bahdanau, D., Chorowski, J., Serdyuk, D., Brakel, P., Bengio, Y.: End-to-end attention-based large vocabulary speech recognition. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4945–4949. IEEE (2016)
Google Scholar
Bhargava, M., Rose, R.: Architectures for deep neural network based acoustic models defined over windowed speech waveforms. In: Sixteenth Annual Conference of the International Speech Communication Association (2015)
Google Scholar
Chan, W., Jaitly, N., Le, Q., Vinyals, O.: Listen, attend and spell: a neural network for large vocabulary conversational speech recognition. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960–4964. IEEE (2016)
Google Scholar
Cohn, J.F., et al.: Detecting depression from facial actions and vocal prosody. In: 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, pp. 1–7. IEEE (2009)
Google Scholar
Dong, L., Xu, S., Xu, B.: Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5884–5888. IEEE (2018)
Google Scholar
France, D.J., Shiavi, R.G., Silverman, S., Silverman, M., Wilkes, M.: Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Trans. Biomed. Eng. 47(7), 829–837 (2000)
Article Google Scholar
Gratch, J., et al.: The distress analysis interview corpus of human and computer interviews. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), pp. 3123–3128 (2014)
Google Scholar
Haque, A., Guo, M., Miner, A.S., Fei-Fei, L.: Measuring depression symptom severity from spoken language and 3d facial expressions. arXiv preprint arXiv:1811.08592 (2018)
Keren, G., Schuller, B.: Convolutional RNN: an enhanced model for extracting features from sequential data. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 3412–3419. IEEE (2016)
Google Scholar
Lee, J., Tashev, I.: High-level feature representation using recurrent neural network for speech emotion recognition. In: Interspeech 2015 (2015)
Google Scholar
Li, S., Raj, D., Lu, X., Shen, P., Kawahara, T., Kawai, H.: Improving transformer-based speech recognition systems with compressed structure and speech attributes augmentation. In: Interspeech, pp. 4400–4404 (2019)
Google Scholar
Low, L.S.A., Maddage, N.C., Lech, M., Sheeber, L., Allen, N.: Influence of acoustic low-level descriptors in the detection of clinical depression in adolescents. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5154–5157. IEEE (2010)
Google Scholar
Makarov, I., Borisenko, G.: Depth inpainting via vision transformer. In: 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pp. 286–291. IEEE (2021)
Google Scholar
Makarov, I., Gerasimova, O.: Link prediction regression for weighted co-authorship networks. In: Rojas, I., Joya, G., Catala, A. (eds.) IWANN 2019, Part II. LNCS, vol. 11507, pp. 667–677. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20518-8_55
Chapter Google Scholar
Makarov, I., Gerasimova, O.: Predicting collaborations in co-authorship network. In: 2019 14th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), pp. 1–6. IEEE (2019)
Google Scholar
Makarov, I., Gerasimova, O., Sulimov, P., Zhukov, L.E.: Co-authorship network embedding and recommending collaborators via network embedding. In: van der Aalst, W.M.P., et al. (eds.) AIST 2018. LNCS, vol. 11179, pp. 32–38. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-11027-7_4
Chapter Google Scholar
Makarov, I., Gerasimova, O., Sulimov, P., Zhukov, L.E.: Dual network embedding for representing research interests in the link prediction problem on co-authorship networks. PeerJ Comput. Sci. 5, e172 (2019)
Google Scholar
Makarov, I., Kiselev, D., Nikitinsky, N., Subelj, L.: Survey on graph embeddings and their applications to machine learning problems on graphs. PeerJ Comput. Sci. 7, e357 (2021)
Google Scholar
Makarov, I., Korovina, K., Kiselev, D.: JONNEE: joint network nodes and edges embedding. IEEE Access 9, 144646–144659 (2021)
Article Google Scholar
Makarov, I., Makarov, M., Kiselev, D.: Fusion of text and graph information for machine learning problems on networks. PeerJ Comput. Sci. 7, e526 (2021)
Google Scholar
Mao, Q., Dong, M., Huang, Z., Zhan, Y.: Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Trans. Multimed. 16(8), 2203–2213 (2014)
Article Google Scholar
Moore, E., Clements, M., Peifer, J., Weisser, L.: Analysis of prosodic variation in speech for clinical depression. In: Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE Cat. No. 03CH37439), vol. 3, pp. 2925–2928. IEEE (2003)
Google Scholar
Moore, E., II., Clements, M.A., Peifer, J.W., Weisser, L.: Critical analysis of the impact of glottal features in the classification of clinical depression in speech. IEEE Trans. Biomed. Eng. 55(1), 96–107 (2007)
Article Google Scholar
Mundt, J.C., Snyder, P.J., Cannizzaro, M.S., Chappie, K., Geralts, D.S.: Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology. J. Neurolinguistics 20(1), 50–64 (2007)
Article Google Scholar
Muzammel, M., Salam, H., Othmani, A.: End-to-end multimodal clinical depression recognition using deep neural networks: a comparative analysis. Comput. Methods Prog. Biomed. 211, 106433 (2021)
Google Scholar
Othmani, A., Kadoch, D., Bentounes, K., Rejaibi, E., Alfred, R., Hadid, A.: Towards robust deep neural networks for affect and depression recognition from speech. In: Del Bimbo, A., et al. (eds.) ICPR 2021. LNCS, vol. 12662, pp. 5–19. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68790-8_1
Chapter Google Scholar
Ozdas, A., Shiavi, R.G., Silverman, S.E., Silverman, M.K., Wilkes, D.M.: Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk. IEEE Trans. Biomed. Eng. 51(9), 1530–1540 (2004)
Article Google Scholar
Pareja, A., et al.: EvolveGCN: evolving graph convolutional networks for dynamic graphs. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 4, pp. 5363–5370 (2020)
Google Scholar
Pham, V.T., et al.: Independent language modeling architecture for end-to-end ASR. arXiv preprint arXiv:1912.00863 (2019)
Prendergast, M.: Understanding Depression. Penguin Group Australia (2006)
Google Scholar
Ringeval, F., et al.: AVEC 2017: real-life depression, and affect recognition workshop and challenge. In: Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, pp. 3–9 (2017)
Google Scholar
Rustem, M.K., Makarov, I., Zhukov, L.E.: Predicting psychology attributes of a social network user. In: Proceedings of the Fourth Workshop on Experimental Economics and Machine Learning (EEML 2017), Dresden, Germany, 17–18 September 2017, pp. 1–7. CEUR WP (2017)
Google Scholar
Sainath, T.N., Vinyals, O., Senior, A., Sak, H.: Convolutional, long short-term memory, fully connected deep neural networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4580–4584. IEEE (2015)
Google Scholar
Satt, A., Rozenberg, S., Hoory, R.: Efficient emotion recognition from speech using deep learning on spectrograms. In: Interspeech, pp. 1089–1093 (2017)
Google Scholar
Seo, Y., Defferrard, M., Vandergheynst, P., Bresson, X.: Structured sequence modeling with graph convolutional recurrent networks. In: Cheng, L., Leung, A.C.S., Ozawa, S. (eds.) ICONIP 2018, Part I. LNCS, vol. 11301, pp. 362–373. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04167-0_33
Chapter Google Scholar
Shirian, A., Guha, T.: Compact graph architecture for speech emotion recognition. In: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), ICASSP 2021, pp. 6284–6288. IEEE (2021)
Google Scholar
Tikhomirova, K., Makarov, I.: Community detection based on the nodes role in a network: the telegram platform case. In: van der Aalst, W.M.P., et al. (eds.) AIST 2020. LNCS, vol. 12602, pp. 294–302. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72610-2_22
Chapter Google Scholar
Trigeorgis, G., et al.: Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5200–5204. IEEE (2016)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Google Scholar
Wang, H., Liu, Y., Zhen, X., Tu, X.: Depression speech recognition with a three-dimensional convolutional network. Front. Hum. Neurosci. 15 (2021)
Google Scholar
Wang, P.S., et al.: Use of mental health services for anxiety, mood, and substance disorders in 17 countries in the who world mental health surveys. Lancet 370(9590), 841–850 (2007)
Article Google Scholar
Yang, L., Sahli, H., Xia, X., Pei, E., Oveneke, M.C., Jiang, D.: Hybrid depression classification and estimation from audio video and text information. In: Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, pp. 45–51 (2017)
Google Scholar
Zlochower, A.J., Cohn, J.F.: Vocal timing in face-to-face interaction of clinically depressed and nondepressed mothers and their 4-month-old infants. Infant Behav. Dev. 19(3), 371–374 (1996)
Article Google Scholar

Download references

Acknowledgement

The work of Ilya Makarov was supported by the Russian Science Foundation under grant 22-11-00323 and performed at HSE University, Moscow, Russia.

Author information

Authors and Affiliations

HSE University, Moscow, Russia
Evgeniya Zavorina & Ilya Makarov
Artificial Intelligence Research Institute (AIRI), Moscow, Russia
Ilya Makarov

Authors

Evgeniya Zavorina
View author publications
You can also search for this author in PubMed Google Scholar
Ilya Makarov
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ilya Makarov .

Editor information

Editors and Affiliations

Skolkovo Institute of Science and Technology, Moscow, Russia
Evgeny Burnaev
National Research University Higher School of Economics, Moscow, Russia
Dmitry I. Ignatov
Skolkovo Institute of Science and Technology, Moscow, Russia
Sergei Ivanov
Krasovskii Institute of Mathematics and Mechanics of Russian Academy of Sciences, Yekaterinburg, Russia
Michael Khachay
National Research University Higher School of Economics, St. Petersburg, Russia
Olessia Koltsova
University of Oslo, Oslo, Norway
Andrei Kutuzov
National Research University Higher School of Economics, Moscow, Russia
Sergei O. Kuznetsov
Lomonosov Moscow State University, Moscow, Russia
Natalia Loukachevitch
LORIA, Campus Scientifique, Vandœuvre lès Nancy, France
Amedeo Napoli
Skolkovo Institute of Science and Technology, Moscow, Russia
Alexander Panchenko
University of Florida, Gainesville, USA
Panos M. Pardalos
Aalto University, Espoo, Finland
Jari Saramäki
National Research University Higher School of Economics, Nizhny Novgorod, Russia
Andrey V. Savchenko
Yandex LLC, Moscow, Russia
Evgenii Tsymbalov
Kazan Federal University, Kazan, Russia
Elena Tutubalina

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zavorina, E., Makarov, I. (2022). Depression Detection by Person’s Voice. In: Burnaev, E., et al. Analysis of Images, Social Networks and Texts. AIST 2021. Lecture Notes in Computer Science, vol 13217. Springer, Cham. https://doi.org/10.1007/978-3-031-16500-9_21

Download citation

DOI: https://doi.org/10.1007/978-3-031-16500-9_21
Published: 02 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16499-6
Online ISBN: 978-3-031-16500-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Depression Detection by Person’s Voice