
Generalization of Deep Acoustic and NLP Models for Large-Scale Depression Screening

Chapter in Biomedical Sensing and Analysis

Abstract

Depression is a costly and underdiagnosed global health concern, and there is a great need for improved patient screening. Speech technology offers promise for remote screening, but must perform robustly across patient and environmental variables. This chapter describes two deep learning models that achieve excellent performance in this regard. An acoustic model uses transfer learning from an automatic speech recognition (ASR) task. A natural language processing (NLP) model uses transfer learning from a language modeling task. Both models are studied using data from over 10,000 unique users who interacted with human-machine applications using conversational speech. Results for binary classification on a large test set show AUC performance of 0.79 and 0.83 for the acoustic and NLP models, respectively. RMSE for a regression task is 4.70 for the acoustic model and 4.27 for the NLP model. Further analysis of performance as a function of test subset characteristics indicates that the models are generally robust over speaker and session variables. It is concluded that both acoustic and NLP-based models have potential for use in generalized automated depression screening.
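The chapter's headline numbers (AUC for binary classification, RMSE for the severity-regression task) follow standard definitions and can be computed directly from model outputs. Below is a minimal sketch in Python using made-up scores for four sessions; the data, and the framing of the regression targets as questionnaire totals, are illustrative assumptions, not values from the study.

```python
import math

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic: the
    probability that a randomly chosen positive (depressed) session is
    scored above a randomly chosen negative one; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 * (p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rmse(targets, preds):
    """Root-mean-square error between true and predicted severity scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets))

# Illustrative (made-up) outputs: binary depression labels and model scores
labels = [1, 0, 1, 0]
scores = [0.9, 0.6, 0.5, 0.4]
print(auc(labels, scores))  # 0.75: 3 of 4 positive/negative pairs ranked correctly

# Illustrative severity regression: true vs. predicted questionnaire totals
print(round(rmse([20, 5, 14, 3], [16, 8, 12, 6]), 2))  # 3.08
```

The pairwise-ranking form of AUC makes clear why it is a natural screening metric: it measures how reliably the model orders depressed sessions above non-depressed ones, independent of any decision threshold.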



Acknowledgments

This paper and the research behind it would not have been possible without the exceptional support of the Ellipsis Health team. We especially thank our clinical team: Mike Aratow, Farshid Haque, Tahmida Nazreen, and Melissa McCool. We thank Mainul Mondal and Susan Solinsky for their continuing support. Additionally, we thank Vanessa Lin and Nina Roth for editing the manuscript for this chapter.

Author information

Corresponding author

Correspondence to Amir Harati.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Harati, A. et al. (2022). Generalization of Deep Acoustic and NLP Models for Large-Scale Depression Screening. In: Obeid, I., Picone, J., Selesnick, I. (eds) Biomedical Sensing and Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-99383-2_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-99383-2_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99382-5

  • Online ISBN: 978-3-030-99383-2

  • eBook Packages: Engineering, Engineering (R0)
