Abstract
This paper briefly describes our research groups’ efforts in tackling Task 1 (Early Detection of Signs of Self-Harm) and Task 2 (Measuring the Severity of the Signs of Depression) of the CLEF eRisk Track. Central to our approach was the use of BERT-based classifiers trained specifically for each task. Our results on both tasks indicate that this approach delivers high performance across a range of measures, particularly for Task 1, where our submissions obtained the best precision, F1, latency-weighted F1, and ERDE at 5 and 50. This work suggests that BERT-based classifiers, when trained appropriately, can accurately infer which social media users are at risk of self-harming, with precision of up to 91.3% for Task 1. Given these promising results, it would be worthwhile to further refine the training regime, the classifier, and the early detection scoring mechanism, and to apply the same approach to other related tasks (e.g., anorexia, depression, suicide).
Keywords
- Self-harm
- Depression
- Classification
- Social media
- Early detection
- BERT
- XLM-RoBERTa
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2021, 21–24 September 2021, Bucharest, Romania.
Complementary content: https://github.com/brunneis/ilab-erisk-2020/.
Acknowledgements
The first author would like to thank the following funding bodies for their support: FEDER/Ministerio de Ciencia, Innovación y Universidades, Agencia Estatal de Investigación/Project (RTI2018-093336-B-C21), Consellería de Educación, Universidade e Formación Profesional and the European Regional Development Fund (ERDF) (accreditation 2019–2022 ED431G-2019/04, ED431C 2018/29, ED431C 2018/19).
The second and third authors would like to thank the UKRI’s EPSRC Project Cumulative Revelations in Personal Data (Grant Number: EP/R033897/1) for their support. We would also like to thank David Losada for arranging this collaboration.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Martínez-Castaño, R., Htait, A., Azzopardi, L., Moshfeghi, Y. (2021). BERT-Based Transformers for Early Detection of Mental Health Illnesses. In: Candan, K.S., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_15
DOI: https://doi.org/10.1007/978-3-030-85251-1_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85250-4
Online ISBN: 978-3-030-85251-1
eBook Packages: Computer Science (R0)