Evaluating and improving lexical resources for detecting signs of depression in text

  • Original Paper
  • Language Resources and Evaluation

Abstract

While considerable attention has been given to the analysis of texts written by depressed individuals, few studies have evaluated or improved the lexical resources that support the detection of signs of depression in text. In this paper, we present a search-based methodology for evaluating existing depression lexica. To this end, we exploit existing resources on depression and language use, and we analyze which elements of a lexicon are most effective at revealing depression symptoms. Furthermore, we propose new expansion strategies that further improve the quality of the lexica.
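As a rough illustration of what a search-based evaluation of a lexicon could look like, the sketch below treats the lexicon as a query, ranks individuals by how densely their writings match it, and measures precision among the top-ranked individuals. This is not the authors' procedure: the scoring function (fraction of matching tokens), the helper names, the toy lexicon, and the toy texts are all hypothetical and chosen only to make the idea concrete.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexicon_score(text, lexicon):
    """Fraction of tokens in `text` that appear in `lexicon` (assumed scoring rule)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)

def rank_users(user_texts, lexicon):
    """Return user ids sorted by decreasing lexicon score."""
    return sorted(
        user_texts,
        key=lambda user: lexicon_score(user_texts[user], lexicon),
        reverse=True,
    )

def precision_at_k(ranking, depressed, k):
    """Share of the top-k ranked users that are labelled as depressed."""
    top_k = ranking[:k]
    return sum(1 for user in top_k if user in depressed) / k

# Hypothetical toy data: a tiny lexicon, four users, and gold labels.
toy_lexicon = {"hopeless", "tired", "worthless", "sad"}
toy_texts = {
    "u1": "i feel hopeless and tired all the time",
    "u2": "great match today we won the cup",
    "u3": "so sad and worthless lately",
    "u4": "tired after the gym but happy",
}
toy_depressed = {"u1", "u3"}

toy_ranking = rank_users(toy_texts, toy_lexicon)
print(toy_ranking)
print(precision_at_k(toy_ranking, toy_depressed, 2))
```

Under this setup, two lexica (or an original lexicon and an expanded one) could be compared by running the same ranking with each and comparing the resulting precision values.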

Notes

  1. http://www.who.int/mediacentre/factsheets/fs369/en/.

  2. https://www.wikipedia.org/.

  3. https://tec.citius.usc.es/ir/code/dc.html.

  4. The collection was divided into two halves because the early risk challenge proposed in Losada et al. (2017b) promoted the development of supervised learning solutions. DLU16A was the training split and DLU16B was the test split. We are concerned here with unsupervised (search-based) methods and, therefore, we use these two splits as independent test corpora.

  5. https://code.google.com/archive/p/word2vec/.

  6. https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/.

  7. http://fegalaz.usc.es/~gamallo/resources/count-models.tar.gz.

  8. http://dumps.wikimedia.org/enwiktionary.

  9. Observe that these experiments involve a single search for depressed individuals and, thus, we cannot perform tests of statistical significance over the differences found.

  10. https://github.com/gamallo/depression_classification.

References

  • Abdaoui, A., Azé, J., Bringay, S., & Poncelet, P. (2017). FEEL: A French expanded emotion lexicon. Language Resources and Evaluation, 51(3), 833–855.

  • Almeida, H., Briand, A., & Meurs, M. J. (2017). Detecting early risk of depression from social media user-generated content. In Working notes of CLEF 2017: Conference and labs of the evaluation forum, CEUR workshop proceedings.

  • Baccianella, S., Esuli, A., & Sebastiani, F. (2010). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the seventh conference on international language resources and evaluation (LREC’10), European Language Resources Association (ELRA), Valletta, Malta. http://www.lrec-conf.org/proceedings/lrec2010/pdf/769_Paper.pdf.

  • Baeza-Yates, R., & Ribeiro-Neto, B. (2011). Modern information retrieval: The concepts and technology behind search. Reading: Addison Wesley.

  • Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd annual meeting of the association for computational linguistics (Vol. 1: long papers), Baltimore, Maryland, pp. 238–247.

  • Benamara, F., Cesarano, C., Picariello, A., & Reforgiato, D. (2007). Sentiment analysis: Adjectives and adverbs are better than adjectives alone. In Proceedings of ICWSM conference.

  • Biemann, C. (2016). Vectors or graphs? On differences of representations for distributional semantic models. In Proceedings of the workshop on cognitive aspects of the lexicon, Osaka, Japan, pp. 1–7.

  • Biemann, C., & Riedl, M. (2013). Text: Now in 2d! a framework for lexical expansion with contextual similarity. Journal of Language Modelling, 1(1), 55–95.

  • Blacoe, W., & Lapata, M. (2012). A comparison of vector-based representations for semantic composition. In Empirical methods in natural language processing—EMNLP-2012, Jeju Island, Korea (pp. 546–556).

  • Bordag, S. (2008). A comparison of co-occurrence and similarity measures as simulations of context. In 9th CICLing, pp. 52–63.

  • Brandt, M., & Boucher, J. (1986). Concepts of depression in emotion lexicons of eight cultures. International Journal of Intercultural Relations, 10(3), 321–346. https://doi.org/10.1016/0147-1767(86)90016-7.

  • Cepoiu, M., McCusker, J., Cole, M. G., Sewitch, M., Belzile, E., & Ciampi, A. (2008). Recognition of depression by non-psychiatric physicians: A systematic literature review and meta-analysis. Journal of General Internal Medicine, 23(1), 25–36.

  • Cheng, F. P. G., Ramos, M. R., Bitsch, Á. J., Jonas, M. S., Ix, T., See, Q. P. L., et al. (2016). Psychologist in a pocket: Lexicon development and content validation of a mobile-based app for depression screening. JMIR Mhealth Uhealth, 4(3), e88. https://doi.org/10.2196/mhealth.5284.

  • Chenlo, J. M., & Losada, D. E. (2014). An empirical study of sentence features for subjectivity and polarity classification. Information Sciences, 280, 275–288.

  • Choudhury, M. D., Gamon, M., Counts, S., & Horvitz, E. (2013). Predicting depression via social media. In E. Kiciman, N. B. Ellison, B. Hogan, P. Resnick, & I. Soboroff (Eds.) ICWSM. The AAAI Press. http://dblp.uni-trier.de/db/conf/icwsm/icwsm2013.html#ChoudhuryGCH13.

  • Coppersmith, G., Dredze, M., & Harman, C. (2014). Quantifying mental health signals in Twitter. In ACL workshop on computational linguistics and clinical psychology.

  • Devitt, A., & Ahmad, K. (2013). Is there a language of sentiment? An analysis of lexical resources for sentiment analysis. Language Resources and Evaluation, 47(2), 475–511.

  • Fellbaum, C. (1998). A semantic network of English: The mother of all WordNets. Computers and the Humanities, 32, 209–220.

  • Gamallo, P. (2017). Comparing explicit and predictive distributional semantic models endowed with syntactic contexts. Language Resources and Evaluation, 51(3), 727–743.

  • Gamallo, P., & Bordag, S. (2011). Is singular value decomposition useful for word similarity extraction. Language Resources and Evaluation, 45(2), 95–119.

  • Gamallo, P., & Garcia, M. (2017). Linguakit: uma ferramenta multilingue para a análise linguística e a extração de informação. Linguamática, 9(1), 19–28.

  • Guntuku, S. C., Yaden, D. B., Kern, M. L., Ungar, L. H., & Eichstaedt, J. C. (2017). Detecting depression and mental illness on social media: An integrative review. Current Opinion in Behavioral Sciences, 18(Supplement C), 43–49. sI: 18: Big data in the behavioural sciences (2017).

  • Huang, E., Socher, R., & Manning, C. (2012). Improving word representations via global context and multiple word prototypes. In ACL-2012, Jeju Island, Korea, pp. 873–882.

  • Landauer, T., & Dumais, S. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2), 211–240.

  • Lebret, R., & Collobert, R. (2015). Rehabilitation of count-based models for word vector representations. In A. F. Gelbukh (Ed) CICLing (1). Lecture notes in computer science (vol. 9041, pp. 417–429). Springer.

  • Levy, O., & Goldberg, Y. (2014a). Dependency-based word embeddings. In Proceedings of the 52nd annual meeting of the association for computational linguistics, ACL 2014, June 22–27, 2014, Baltimore, MD, USA, pp. 302–308.

  • Levy, O., & Goldberg, Y. (2014b). Linguistic regularities in sparse and explicit word representations. In Proceedings of the eighteenth conference on computational natural language learning, CoNLL 2014, Baltimore, Maryland, USA, June 26–27, 2014, pp. 171–180.

  • Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3, 211–225.

  • Liu, B. (2012). Sentiment analysis and opinion mining. San Rafael: Morgan & Claypool Publishers.

  • Losada, D. E., & Crestani, F. (2016). A test collection for research on depression and language use. In Proceedings conference and labs of the evaluation forum CLEF 2016, Evora, Portugal.

  • Losada, D. E., Crestani, F., & Parapar, J. (2017a). CLEF 2017 eRisk overview: Early risk prediction on the internet: Experimental foundations. In Working notes of CLEF 2017: Conference and labs of the evaluation forum, CEUR workshop proceedings.

  • Losada, D. E., Crestani, F., & Parapar, J. (2017b). eRISK 2017: CLEF lab on early risk prediction on the internet: Experimental foundations. In 8th international conference of the CLEF association (pp. 346–360). Springer Verlag.

  • Mikolov, T., Yih, Wt., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: Human language technologies, Atlanta, Georgia, pp. 746–751.

  • Mitchell, A. J., Rao, S., & Vaze, A. (2011). International comparison of clinicians’ ability to identify depression in primary care: Meta-analysis and meta-regression of predictors. British Journal of General Practice, 61(583), e72–e80.

  • Mitra, B., & Craswell, N. (2017). An introduction to neural information retrieval. Foundations and Trends in Information Retrieval (to appear).

  • Nease, D., & Maloin, J. (2003). Depression screening: A practical strategy. The Journal of Family Practice, 52(2), 118–124.

  • Neuman, Y., Assaf, D., Cohen, Y., & Knoll, J. L. (2015). Profiling school shooters: Automatic text-based analysis. Frontiers in Psychiatry, 6, 86. https://doi.org/10.3389/fpsyt.2015.00086.

  • Neuman, Y., Cohen, Y., Assaf, D., & Kedma, G. (2012). Proactive screening for depression through metaphorical and automatic text analysis. Artificial Intelligence in Medicine, 56(1), 19–25.

  • Padró, M., Idiart, M., Villavicencio, A., & Ramisch, C. (2014). Nothing like good old frequency: Studying context filters for distributional thesauri. In Proceedings of the 2014 conference on empirical methods in natural language processing, EMNLP 2014, October 25–29, 2014, Doha, Qatar, a meeting of SIGDAT, a special interest group of the ACL, pp. 419–424.

  • Piasecki, M., Szpakowicz, S., Fellbaum, C., & Pedersen, B. S. (2013). Introduction to the special issue: On wordnets and relations. Language Resources and Evaluation, 47(3), 757–767.

  • Ramirez-Esparza, N., Chung, C. K., Kacewicz, E., & Pennebaker, J. W. (2008). The psychology of word use in depression forums in English and in Spanish: Testing two text analytic approaches. In Proceedings of the ICWSM 2008.

  • Schwartz, H. A., Eichstaedt, J., Kern, M. L., Park, G., Sap, M., Stillwell, D., Kosinski, M., & Ungar, L. (2014). Towards assessing changes in degree of depression through Facebook. In ACL workshop on computational linguistics and clinical psychology, pp. 118–125.

  • Turney, P. D. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting on association for computational linguistics (pp. 417–424). Association for Computational Linguistics.

  • Wang, L., & Xia, R. (2017). Sentiment lexicon construction with representation learning based on hierarchical sentiment supervision. In Proceedings of the 2017 conference on empirical methods in natural language processing, EMNLP 2017, Copenhagen, Denmark, September 9–11, 2017, pp. 502–510. https://aclanthology.info/papers/D17-1052/d17-1052.

  • Wang, P., Lane, M., Olfson, M., Pincus, H., Wells, K., & Kessler, R. (2005). Twelve-month use of mental health services in the United States: Results from the national comorbidity survey replication. Archives of General Psychiatry, 62(6), 629–640.

Acknowledgements

This work has received financial support from (i) the “Ministerio de Economía y Competitividad” of the Government of Spain and FEDER Funds under the research Project TIN2015-64282-R, (ii) a 2016 BBVA Foundation Grant for Researchers and Cultural Creators, (iii) a TelePares (MINECO, ref:FFI2014-51978-C2-1-R) project, and (iv) Xunta de Galicia—“Consellería de Cultura, Educación e Ordenación Universitaria” and FEDER Funds through the following 2016–2019 accreditation: ED431G/08.

Author information

Corresponding author

Correspondence to David E. Losada.

About this article

Cite this article

Losada, D.E., Gamallo, P. Evaluating and improving lexical resources for detecting signs of depression in text. Lang Resources & Evaluation 54, 1–24 (2020). https://doi.org/10.1007/s10579-018-9423-1

  • DOI: https://doi.org/10.1007/s10579-018-9423-1
