Abstract
Depression is a costly and underdiagnosed global health concern, and there is a great need for improved patient screening. Speech technology offers promise for remote screening, but must perform robustly across patient and environmental variables. This chapter describes two deep learning models that achieve excellent performance in this regard. An acoustic model uses transfer learning from an automatic speech recognition (ASR) task. A natural language processing (NLP) model uses transfer learning from a language modeling task. Both models are studied using data from over 10,000 unique users who interacted with human-machine applications using conversational speech. Results for binary classification on a large test set show AUC performance of 0.79 and 0.83 for the acoustic and NLP models, respectively. RMSE for a regression task is 4.70 for the acoustic model and 4.27 for the NLP model. Further analysis of performance as a function of test subset characteristics indicates that the models are generally robust over speaker and session variables. It is concluded that both acoustic and NLP-based models have potential for use in generalized automated depression screening.
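As a minimal illustration of the two evaluation metrics reported above — AUC for the binary screening task and RMSE for the severity regression task — the sketch below computes both with scikit-learn. The labels and model scores are toy values invented for this example, not the chapter's data; the PHQ-8 framing (a 0–24 severity score, with a cutoff defining the binary label) is an assumption about the setup.

```python
# Toy computation of the chapter's two reported metrics: AUC for binary
# depression classification and RMSE for severity regression.
# All numbers below are hypothetical examples, not the study's data.
import math
from sklearn.metrics import roc_auc_score, mean_squared_error

# Binary task: depressed (1) vs. not (0), with model posteriors per session
y_true_binary = [0, 0, 1, 1, 1, 0]
y_score = [0.2, 0.4, 0.7, 0.8, 0.3, 0.1]
auc = roc_auc_score(y_true_binary, y_score)

# Regression task: e.g., a raw PHQ-8-style severity score (0-24)
y_true_sev = [3, 12, 18, 7, 21]
y_pred_sev = [5, 10, 15, 9, 17]
rmse = math.sqrt(mean_squared_error(y_true_sev, y_pred_sev))

print(f"AUC:  {auc:.3f}")
print(f"RMSE: {rmse:.3f}")
```

`math.sqrt` around `mean_squared_error` is used rather than a root-MSE flag, since it behaves identically across scikit-learn versions.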
Acknowledgments
This paper and the research behind it would not have been possible without the exceptional support of the Ellipsis Health team. We especially thank our clinical team: Mike Aratow, Farshid Haque, Tahmida Nazreen, and Melissa McCool. We thank Mainul Mondal and Susan Solinsky for their continuing support. Additionally, we thank Vanessa Lin and Nina Roth for editing the manuscript for this chapter.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Harati, A., et al. (2022). Generalization of deep acoustic and NLP models for large-scale depression screening. In I. Obeid, J. Picone, & I. Selesnick (Eds.), Biomedical sensing and analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-99383-2_3
Print ISBN: 978-3-030-99382-5
Online ISBN: 978-3-030-99383-2