Abstract
Depression is a costly and underdiagnosed global health concern, and there is a great need for improved patient screening. Speech technology offers promise for remote screening, but must perform robustly across patient and environmental variables. This chapter describes two deep learning models that achieve excellent performance in this regard. An acoustic model uses transfer learning from an automatic speech recognition (ASR) task. A natural language processing (NLP) model uses transfer learning from a language modeling task. Both models are studied using data from over 10,000 unique users who interacted with human-machine applications using conversational speech. Results for binary classification on a large test set show AUC performance of 0.79 and 0.83 for the acoustic and NLP models, respectively. RMSE for a regression task is 4.70 for the acoustic model and 4.27 for the NLP model. Further analysis of performance as a function of test subset characteristics indicates that the models are generally robust over speaker and session variables. It is concluded that both acoustic and NLP-based models have potential for use in generalized automated depression screening.
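As a minimal illustration of the two evaluation metrics reported above — AUC for the binary screening task and RMSE for the severity regression task — the sketch below computes both with scikit-learn. The labels and model scores are toy values invented for this example, not the chapter's data; the PHQ-8 framing (a 0–24 severity score, with a cutoff defining the binary label) is an assumption about the setup.

```python
# Toy computation of the chapter's two reported metrics: AUC for binary
# depression classification and RMSE for severity regression.
# All numbers below are hypothetical examples, not the study's data.
import math
from sklearn.metrics import roc_auc_score, mean_squared_error

# Binary task: depressed (1) vs. not (0), with model posteriors per session
y_true_binary = [0, 0, 1, 1, 1, 0]
y_score = [0.2, 0.4, 0.7, 0.8, 0.3, 0.1]
auc = roc_auc_score(y_true_binary, y_score)

# Regression task: e.g., a raw PHQ-8-style severity score (0-24)
y_true_sev = [3, 12, 18, 7, 21]
y_pred_sev = [5, 10, 15, 9, 17]
rmse = math.sqrt(mean_squared_error(y_true_sev, y_pred_sev))

print(f"AUC:  {auc:.3f}")
print(f"RMSE: {rmse:.3f}")
```

`math.sqrt` around `mean_squared_error` is used rather than a root-MSE flag, since it behaves identically across scikit-learn versions.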
Acknowledgments
This paper and the research behind it would not have been possible without the exceptional support of the Ellipsis Health team. We especially thank our clinical team: Mike Aratow, Farshid Haque, Tahmida Nazreen, and Melissa McCool. We thank Mainul Mondal and Susan Solinsky for their continuing support. Additionally, we thank Vanessa Lin and Nina Roth for editing the manuscript for this chapter.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this chapter
Harati, A., et al. (2022). Generalization of deep acoustic and NLP models for large-scale depression screening. In I. Obeid, J. Picone, & I. Selesnick (Eds.), Biomedical sensing and analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-99383-2_3
Print ISBN: 978-3-030-99382-5
Online ISBN: 978-3-030-99383-2