Can We Assess Mental Health Through Social Media and Smart Devices? Addressing Bias in Methodology and Evaluation

  • Adam Tsakalidis
  • Maria Liakata
  • Theo Damoulas
  • Alexandra I. Cristea
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11053)


Predicting mental health from smartphone and social media data on a longitudinal basis has recently attracted great interest, with very promising results reported across many studies [3, 9, 13, 26]. Such approaches have the potential to revolutionise mental health assessment, provided that their development and evaluation follow a real-world deployment setting. In this work we take a closer look at state-of-the-art approaches, using different mental health datasets and indicators, different feature sources and multiple simulations, in order to assess their ability to generalise. We demonstrate that under a pragmatic evaluation framework, none of the approaches delivers, or even approaches, the reported performance. In fact, we show that current state-of-the-art approaches can barely outperform the most naïve baselines in the real-world setting, raising serious questions not only about their readiness for deployment, but also about the contribution of the derived features to the mental health assessment task and about how to make better use of such data in the future.
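The abstract contrasts reported results with performance under a pragmatic, deployment-like evaluation and against naïve baselines. The Python sketch below illustrates one such check under stated assumptions; it is not the authors' evaluation protocol. It assumes records of the form (user, timestamp, self-reported score), splits each user's history chronologically so that no future observation leaks into training, and scores two feature-free baselines: the user's mean training score and the user's last training score. The function names, the 70/30 split and the toy scores are hypothetical.

```python
from statistics import mean

# Hypothetical record layout: (user_id, timestamp, self-reported score).


def chronological_split(records, train_fraction=0.7):
    """Per user, train on the earliest records and test on the latest ones,
    so no future observation of a user leaks into the training set."""
    by_user = {}
    for user, t, y in records:
        by_user.setdefault(user, []).append((t, y))
    train, test = [], []
    for user, rows in by_user.items():
        rows.sort()  # order each user's records by timestamp
        cut = round(len(rows) * train_fraction)
        train += [(user, t, y) for t, y in rows[:cut]]
        test += [(user, t, y) for t, y in rows[cut:]]
    return train, test


def user_mean_baseline(train, test):
    """Predict every test label as the user's mean training label (no features)."""
    history = {}
    for user, _, y in train:
        history.setdefault(user, []).append(y)
    user_means = {user: mean(ys) for user, ys in history.items()}
    return [user_means[user] for user, _, _ in test]


def last_value_baseline(train, test):
    """Predict every test label as the user's most recent training label."""
    latest = {}
    for user, _, y in sorted(train, key=lambda r: r[1]):
        latest[user] = y  # later timestamps overwrite earlier ones
    return [latest[user] for user, _, _ in test]


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted scores."""
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))


# Toy, made-up scores for two users, for illustration only.
scores_a = [3, 4, 3, 5, 4, 4, 5, 3, 4, 4]
scores_b = [1, 3, 2, 2, 1, 3, 2, 2, 3, 1]
records = [("a", t, y) for t, y in enumerate(scores_a)]
records += [("b", t, y) for t, y in enumerate(scores_b)]

train, test = chronological_split(records)
y_true = [y for _, _, y in test]
print(mae(y_true, user_mean_baseline(train, test)))   # 0.5
print(mae(y_true, last_value_baseline(train, test)))  # 1
```

A model built on sensor or social media features would be scored with the same `mae` on the same `test` records; if it cannot beat these feature-free predictions, its features add little beyond each user's own history.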


Keywords: Mental health · Bias · Evaluation · Wellbeing · Natural language processing · Smartphones · Sensors · Social media · Challenges



The current work was supported by the EPSRC through the University of Warwick’s CDT in Urban Science and Progress (grant EP/L016400/1) and through The Alan Turing Institute (grant EP/N510129/1). We would like to thank the anonymous reviewers for their detailed feedback and the authors of the works analysed in our paper (N. Jaques, R. LiKamWa, M. Musolesi) for fruitful discussions on several aspects of the challenges presented here.


  1. Bogomolov, A., Lepri, B., Ferron, M., Pianesi, F., Pentland, A.S.: Pervasive stress recognition for sustainable living. In: 2014 IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops), pp. 345–350. IEEE (2014)
  2. Bogomolov, A., Lepri, B., Pianesi, F.: Happiness recognition from mobile phone data. In: 2013 International Conference on Social Computing (SocialCom), pp. 790–795. IEEE (2013)
  3. Canzian, L., Musolesi, M.: Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 1293–1304. ACM (2015)
  4. DeMasi, O., Kording, K., Recht, B.: Meaningless comparisons lead to false optimism in medical machine learning. PLoS One 12(9), e0184604 (2017)
  5. Farhan, A.A., et al.: Behavior vs. Introspection: refining prediction of clinical depression via smartphone sensing data. In: Wireless Health, pp. 30–37 (2016)
  6. Gimpel, K., et al.: Part-of-speech tagging for Twitter: annotation, features, and experiments. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers, vol. 2, pp. 42–47. Association for Computational Linguistics (2011)
  7. Herrman, H., Saxena, S., Moodie, R., et al.: Promoting mental health: concepts, emerging evidence, practice: a report of the World Health Organization, Department of Mental Health and Substance Abuse in Collaboration with the Victorian Health Promotion Foundation and the University of Melbourne. World Health Organization (2005)
  8. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168–177. ACM (2004)
  9. Jaques, N., Taylor, S., Azaria, A., Ghandeharioun, A., Sano, A., Picard, R.: Predicting students’ happiness from physiology, phone, mobility, and behavioral data. In: 2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 222–228. IEEE (2015)
  10. Jaques, N., Taylor, S., Sano, A., Picard, R.: Multi-task, multi-kernel learning for estimating individual wellbeing. In: Proceedings NIPS Workshop on Multimodal Machine Learning, Montreal, Quebec (2015)
  11. Kiritchenko, S., Zhu, X., Mohammad, S.M.: Sentiment analysis of short informal texts. J. Artif. Intell. Res. 50, 723–762 (2014)
  12. Kroenke, K., Strine, T.W., Spitzer, R.L., Williams, J.B., Berry, J.T., Mokdad, A.H.: The PHQ-8 as a measure of current depression in the general population. J. Affect. Disord. 114(1), 163–173 (2009)
  13. LiKamWa, R., Liu, Y., Lane, N.D., Zhong, L.: MoodScope: building a mood sensor from smartphone usage patterns. In: Proceeding of the 11th Annual International Conference on Mobile Systems, Applications, and Services, pp. 389–402. ACM (2013)
  14. Ma, Y., Xu, B., Bai, Y., Sun, G., Zhu, R.: Daily mood assessment based on mobile phone sensing. In: 2012 9th International Conference on Wearable and Implantable Body Sensor Networks (BSN), pp. 142–147. IEEE (2012)
  15. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
  16. Mohammad, S.: #Emotional Tweets. In: *SEM 2012: The 1st Joint Conference on Lexical and Computational Semantics - Proceedings of the Main Conference and the Shared Task, and Proceedings of the 6th International Workshop on Semantic Evaluation (SemEval 2012), vols. 1 and 2, pp. 246–255. Association for Computational Linguistics (2012)
  17. Mohammad, S., Dunne, C., Dorr, B.: Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 2, pp. 599–608. Association for Computational Linguistics (2009)
  18. Nielsen, F.Å.: A new ANEW: evaluation of a word list for sentiment analysis in microblogs. In: Workshop on ‘Making Sense of Microposts’: Big Things Come in Small Packages, pp. 93–98 (2011)
  19. OECD: How’s Life? 2013: Measuring Well-being (2013)
  20. Olesen, J., Gustavsson, A., Svensson, M., Wittchen, H.U., Jönsson, B.: The economic cost of brain disorders in Europe. Eur. J. Neurol. 19(1), 155–162 (2012)
  21. Preoţiuc-Pietro, D., Volkova, S., Lampos, V., Bachrach, Y., Aletras, N.: Studying user income through language, behaviour and affect in social media. PLoS One 10(9), e0138717 (2015)
  22. Servia-Rodríguez, S., Rachuri, K.K., Mascolo, C., Rentfrow, P.J., Lathia, N., Sandstrom, G.M.: Mobile sensing at the service of mental well-being: a large-scale longitudinal study. In: Proceedings of the 26th International Conference on World Wide Web, pp. 103–112. International World Wide Web Conferences Steering Committee (2017)
  23. Suhara, Y., Xu, Y., Pentland, A.: DeepMood: forecasting depressed mood based on self-reported histories via recurrent neural networks. In: Proceedings of the 26th International Conference on World Wide Web, pp. 715–724. International World Wide Web Conferences Steering Committee (2017)
  24. Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., Qin, B.: Learning sentiment-specific word embedding for Twitter sentiment classification. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 1555–1565 (2014)
  25. Tennant, R., et al.: The Warwick-Edinburgh mental well-being scale (WEMWBS): development and UK validation. Health Qual. Life Outcomes 5(1), 63 (2007)
  26. Tsakalidis, A., Liakata, M., Damoulas, T., Jellinek, B., Guo, W., Cristea, A.I.: Combining heterogeneous user generated data to sense well-being. In: Proceedings of the 26th International Conference on Computational Linguistics, pp. 3007–3018 (2016)
  27. Wang, R., et al.: CrossCheck: toward passive sensing and detection of mental health changes in people with schizophrenia. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 886–897. ACM (2016)
  28. Wang, R., et al.: StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 3–14. ACM (2014)
  29. Watson, D., Clark, L.A., Tellegen, A.: Development and validation of brief measures of positive and negative affect: the PANAS scales. J. Pers. Soc. Psychol. 54(6), 1063 (1988)
  30. Zhu, X., Kiritchenko, S., Mohammad, S.M.: NRC-Canada-2014: recent improvements in the sentiment analysis of Tweets. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pp. 443–447. Citeseer (2014)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Adam Tsakalidis (1, 2; corresponding author)
  • Maria Liakata (1, 2)
  • Theo Damoulas (1, 2)
  • Alexandra I. Cristea (1, 3)
  1. Department of Computer Science, University of Warwick, Coventry, UK
  2. The Alan Turing Institute, London, UK
  3. Department of Computer Science, Durham University, Durham, UK