Multimedia Tools and Applications, Volume 76, Issue 8, pp 10653–10676

Using linguistic and topic analysis to classify sub-groups of online depression communities

  • Thin Nguyen
  • Bridianne O’Dea
  • Mark Larsen
  • Dinh Phung
  • Svetha Venkatesh
  • Helen Christensen


Depression is a highly prevalent mental health problem and a co-morbidity of other mental, physical, and behavioural disorders. The internet allows individuals who are depressed, or who care for someone who is depressed, to connect with others via online communities; however, the characteristics of these discussions have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A total of 5,000 posts were randomly selected from 24 online communities. Five subgroups of online communities were identified: Depression, Bipolar Disorder, Self-Harm, Grief/Bereavement, and Suicide. Psycholinguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were then used to discriminate the conversations in the Depression communities from those in the other subgroups. Topics and psycholinguistic features were found to be highly valid predictors of community subgroup. The clear discrimination between subgroups achieved with linguistic features and topics, together with their good predictive power, is an important step toward understanding social media and its use in mental health.
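The abstract does not name the feature extractor or the classifier, so the block below is only a minimal sketch of the general pipeline it describes: per-post psycholinguistic features fed to a supervised classifier that separates Depression posts from another subgroup. It assumes tiny hypothetical word categories standing in for LIWC-style dictionaries, proportion-of-tokens features, and an L2-regularised logistic regression trained by batch gradient descent. The category names, toy posts, labels and hyperparameters are all illustrative assumptions and are not taken from the paper; content-topic features are omitted for brevity.

```python
import math
import re
from typing import Dict, List, Sequence, Set, Tuple

# HYPOTHETICAL word categories standing in for LIWC-style dictionaries.
# Illustrative only; a real psycholinguistic lexicon is far larger.
CATEGORIES: Dict[str, Set[str]] = {
    "first_person": {"i", "me", "my", "mine", "myself"},
    "negative_emotion": {"sad", "hurt", "cry", "alone", "hopeless", "pain"},
    "death": {"die", "death", "dead", "funeral", "grave"},
}

# HYPOTHETICAL toy posts: 1 = Depression community, 0 = Grief/Bereavement.
TOY_POSTS: List[str] = [
    "I feel sad and alone, my pain never ends",
    "I cry myself to sleep, hopeless and hurt",
    "The funeral was yesterday and his death still feels unreal",
    "We stood by the grave where she was laid after her death",
]
TOY_LABELS: List[int] = [1, 1, 0, 0]


def features(post: str) -> List[float]:
    """Fraction of a post's tokens that fall in each category."""
    tokens = re.findall(r"[a-z']+", post.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty posts
    return [sum(t in words for t in tokens) / n for words in CATEGORIES.values()]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that x comes from the Depression subgroup."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(
    X: List[List[float]], y: List[int],
    lr: float = 1.0, epochs: int = 1000, l2: float = 0.01,
) -> Tuple[List[float], float]:
    """Batch gradient descent on mean logistic loss plus an L2 penalty on w."""
    d, m = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # d(log-loss)/dz for this post
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (g / m + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / m  # the bias term is left unpenalised
    return w, b


if __name__ == "__main__":
    X = [features(p) for p in TOY_POSTS]
    w, b = train_logistic(X, TOY_LABELS)
    for post, x in zip(TOY_POSTS, X):
        print(f"{predict(w, b, x):.2f}  {post}")
```

Running the sketch prints a Depression-subgroup probability for each toy post. With real data one would evaluate on held-out posts (for example with cross-validation) rather than score the training set, and topic proportions from a topic model could be appended to each feature vector in the same way.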


Keywords

Social media, Mental health, Depression, Web community, Web-logs, Feature extraction, Textual cues, Language styles, Topics



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Thin Nguyen (1)
  • Bridianne O’Dea (2)
  • Mark Larsen (2)
  • Dinh Phung (1)
  • Svetha Venkatesh (1)
  • Helen Christensen (2)
  1. Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong, Australia
  2. Black Dog Institute, University of New South Wales, Randwick, Australia
