Abstract
Depression is the most common mental illness in the US, with 6.7% of all adults experiencing a major depressive episode. Depression also affects teens and young adults, and researchers have observed an increasing rate in recent years (from 8.7% in 2005 to 11.3% in 2014 among adolescents, and from 8.8% to 9.6% among young adults), especially among girls and women. People themselves are a barrier to fighting this disease, as they tend to hide their symptoms and do not receive treatment. However, protected by anonymity, they share their feelings on the Web while looking for help. In this paper, we address the problem of detecting depressed users in online forums. We analyze user behavior in the ReachOut.com online forum, a platform that provides a supportive environment for young people to discuss their everyday issues, including depression. We propose an unsupervised technique based on recurrent neural networks and anomaly detection to detect depressed users. We examine the linguistic style of user posts in combination with network-based features that model how users connect in the forum. Our results show that both psycho-linguistic features derived from user posts and network features are good predictors of users facing depression. Moreover, by combining these two sets of features, we achieve an F1-measure of 0.64 and outperform the baselines.
Notes
Unsupervised anomaly detection is used when the data are unlabelled, i.e., the class of an instance (normal or anomaly) is not known. This approach does not require labelled training or testing data, which makes it more flexible and widely applicable. The main idea of unsupervised anomaly detection is to assign a score to each instance by learning intrinsic properties of the data, such as distance or density. This score, called the anomaly score, determines whether the instance is normal or anomalous.
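As a minimal sketch of a distance-based anomaly score, the following Python snippet scores each instance by its mean distance to its k nearest neighbours, using no labels. The function name `knn_anomaly_scores`, the toy points, and the choice of k are assumptions made for illustration; this is not the scoring method used in the paper.

```python
import math

def knn_anomaly_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours.

    A larger score means the point lies farther from the rest of the data.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy data (illustrative only): four clustered points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
scores = knn_anomaly_scores(points, k=2)

# Index of the instance with the highest anomaly score.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

In practice, a threshold on the score (or a fixed fraction of the highest-scoring instances) would separate normal from anomalous instances; how that cut-off is chosen depends on the application.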
The score is given by the weighted log probability obtained with a Gaussian mixture model (GMM) when this anomaly detection algorithm is applied to our autoencoder features plus the PageRank, reciprocity, and local clustering coefficient network features.
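To make the weighted log probability concrete, the sketch below computes log Σ_k w_k N(x | μ_k, σ_k²) for a one-dimensional mixture in Python. The mixture weights, means, and standard deviations are hypothetical values chosen for illustration; in the setting described above the mixture would instead be fitted to the autoencoder and network features, and the function name `gmm_log_score` is likewise an assumption.

```python
import math

def gmm_log_score(x, weights, means, stds):
    """Return log sum_k w_k * N(x | mean_k, std_k^2) for a 1-D mixture."""
    terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)
        for w, m, s in zip(weights, means, stds)
    ]
    # Log-sum-exp trick for numerical stability.
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Hypothetical two-component mixture (not fitted to any real data).
weights, means, stds = [0.7, 0.3], [0.0, 5.0], [1.0, 1.0]

typical = gmm_log_score(0.2, weights, means, stds)   # close to a component mean
unusual = gmm_log_score(12.0, weights, means, stds)  # far from both components
```

Under the usual convention, an instance with a lower log probability is less well explained by the mixture and is therefore more likely to be flagged as anomalous.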
References
Brew C (2016) Classifying reachout posts with a radial basis function SVM. In: Proceedings of the 3rd workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, CLPsych@NAACL-HLT, San Diego, California, pp 138–142
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw 30(1–7):107–117
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv (CSUR) 41(3):15
CLPsych dataset. http://clpsych.org/shared-task-2017/
Cohan A, Young S, Goharian N (2016) Triaging mental health forum posts. In: Proceedings of the third workshop on computational linguistics and clinical psychology, pp 143–147
De Choudhury M, Gamon M, Counts S, Horvitz E (2013) Predicting depression via social media. ICWSM 13:1–10
De Choudhury M, Counts S, Horvitz E (2013) Social media as a measurement tool of depression in populations. In: Proceedings of the 5th annual ACM web science conference, ACM, pp 47–56
Eichstaedt JC, Smith RJ, Merchant RM, Ungar LH, Crutchley P, Preoţiuc-Pietro D, Asch DA, Schwartz HA (2018) Facebook language predicts depression in medical records. Proc Natl Acad Sci 115(44):11203–11208
Ester M, Kriegel HP, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96:226–231
Grover A, Leskovec J (2016) Node2vec: scalable feature learning for networks. In: SIGKDD, pp 855–864
Han J, Kamber M (2012) Data mining: concepts and techniques
Hauskrecht M, Batal I, Valko M, Visweswaran S, Cooper GF, Clermont G (2013) Outlier detection for patient monitoring and alerting. J Biomed Inf 46(1):47–55
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Kim SM, Wang Y, Wan S, Paris C (2016) Data61-CSIRO systems at the CLPsych 2016 shared task. In: Proceedings of the 3rd workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, CLPsych@NAACL-HLT 2016, San Diego, California, pp 128–132
Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: International conference on machine learning, pp 1188–1196
Liu FT, Ting KM, Zhou ZH (2008) Isolation forest. In: 2008 Eighth IEEE international conference on data mining, IEEE, pp 413–422
LSTM description. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Lustberg L, Reynolds CF III (2000) Depression and insomnia: questions of cause and effect. Sleep Med Rev 4(3):253–262
MacAvaney S, Desmet B, Cohan A, Soldaini L, Yates A, Zirikly A, Goharian N (2018) RSDD-time: Temporal annotation of self-reported mental health diagnoses. In: Proceedings of the fifth workshop on computational linguistics and clinical psychology: from keyboard to clinic, CLPsych@NAACL-HLT, New Orleans, pp 168–173
Malmasi S, Zampieri M, Dras M (2016) Predicting post severity in mental health forums. In: Proceedings of the 3rd workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, CLPsych@NAACL-HLT 2016, San Diego, pp 133–137
Metcalf A, Blake V (2013) Reachout.com annual user survey results
Millen D (2015) Reachout annual report 2013/2014
Park M, McDonald DW, Cha M (2013) Perception differences between the depressed and non-depressed users in twitter. ICWSM 9:217–226
Park M, Cha C, Cha M (2012) Depressive moods of users portrayed in twitter. In: Proceedings of the ACM SIGKDD workshop on healthcare informatics (HI-KDD), vol 2012, ACM New York, pp 1–8
Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psychometric properties of LIWC2015. In: Technical report
Resnik P, Armstrong W, Claudino L, Nguyen T, Nguyen VA, Boyd-Graber J (2015) Beyond LDA: exploring supervised topic modeling for depression-related language in twitter. In: Proceedings of the 2nd workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality, pp 99–107
Rude S, Gortner EM, Pennebaker J (2004) Language use of depressed and depression-vulnerable college students. Cognit Emot 18(8):1121–1133
Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471
Shrestha A, Spezzano F (2019) Detecting depressed users in online forums. In: International symposium on network enabled health informatics, biomedicine and bioinformatics (HI-BI-BI 2019), in conjunction with ASONAM’19
Staiano J, Guerini M (2014) Depechemood: a lexicon for emotion analysis from crowd-annotated news. arXiv preprint arXiv:1405.1605
Wilson T, Wiebe J, Hoffmann P (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the conference on human language technology and empirical methods in natural language processing, Association for computational linguistics, pp 347–354
Xu R, Zhang Q (2016) Understanding online health groups for depression: social network and linguistic perspectives. J Med Internet Res 18(3):e63
Yates A, Cohan A, Goharian N (2017) Depression and self-harm risk assessment in online forums. arXiv preprint arXiv:1709.01848
Zimmermann J, Brockmeyer T, Hunn M, Schauenburg H, Wolf M (2017) First-person pronoun use in spoken language as a predictor of future depressive symptoms: preliminary evidence from a clinical sample of depressed patients. Clin Psychol Psychother 24(2):384–391
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This paper is an extended version of the conference paper “Detecting Depressed Users in Online Forums” by Anu Shrestha and Francesca Spezzano, in Proceedings of the International Symposium on Network Enabled Health Informatics, Biomedicine and Bioinformatics (HI-BI-BI 2019), held in conjunction with ASONAM 2019 (Shrestha and Spezzano 2019).
About this article
Cite this article
Shrestha, A., Serra, E. & Spezzano, F. Multi-modal social and psycho-linguistic embedding via recurrent neural networks to identify depressed users in online forums. Netw Model Anal Health Inform Bioinforma 9, 22 (2020). https://doi.org/10.1007/s13721-020-0226-0
DOI: https://doi.org/10.1007/s13721-020-0226-0