A Test Collection for Research on Depression and Language Use
Several studies in the literature have shown that the words people use are indicative of their psychological states. In particular, depression has been found to be associated with distinctive linguistic patterns. However, there is a lack of publicly available data for research on the interaction between language and depression. In this paper, we describe our first steps to fill this gap. We outline the methodology we have adopted to build and make publicly available a test collection on depression and language use. The resulting corpus includes a series of textual interactions written by different subjects. The new collection encourages research not only on differences in language between depressed and non-depressed individuals, but also on how the language use of depressed individuals evolves over time. Further, we propose a novel early detection task and define a novel effectiveness measure to systematically compare early detection algorithms. This new measure takes into account both the accuracy of the decisions taken by the algorithm and the delay in detecting positive cases. We also present baseline results with novel detection methods that process users’ interactions in different ways.
Keywords: Positive Case; Minority Class; Depressed Individual; Test Collection; Late Detection
This research was funded by the Swiss National Science Foundation (project “Early risk prediction on the Internet: an evaluation corpus”, 2015). The first author also acknowledges the financial support of the “Ministerio de Economía y Competitividad” of the Government of Spain and FEDER Funds under the research project TIN2015-64282-R.