A Re-labeling Approach Based on Approximate Nearest Neighbors for Identifying Gambling Disorders in Social Media

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2023)

Abstract

This paper describes a novel approach based on Approximate Nearest Neighbors (ANN) techniques for changing the granularity of the label schema in the training dataset of a classification task, from a user-based annotation to a message-based one. In particular, we tackle Task 1 of the CLEF 2022 eRisk Workshop, which consists of processing messages written by social media users in order to detect early signs of pathological gambling. Our proposal is based on computing the nearest neighbors of the vector representations of the given messages, which are originally annotated at user level. In this way, we obtain a re-labeled training dataset in which messages from the same user can be either positive or negative. We then use this re-labeled dataset to perform the final classification on test instances. Compared to the other systems participating in the task, our approach achieves the best average performance in the proposed evaluation frameworks and is the fastest in terms of the time needed to process the whole test dataset. This indicates that the proposed re-labeling scheme makes it easier to capture the textual information that leads to a correct detection of pathological gambling.
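To make the re-labeling idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the decision rule, so the majority vote over neighbors, the parameters `k` and `threshold`, and the names `Message` and `relabel` are all hypothetical. The sketch also uses exact brute-force cosine search on toy 2-D vectors in place of a real ANN index and real message embeddings.

```python
import math
from collections import namedtuple

# Hypothetical container: each message carries the label of its author (1 = positive user).
Message = namedtuple("Message", ["user", "vector", "user_label"])


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def relabel(messages, k=3, threshold=0.5):
    """Assign message-level labels from user-level ones (assumed voting rule).

    Messages from negative users stay negative. A message from a positive
    user stays positive only if more than `threshold` of its k nearest
    neighbors were also written by positive users.
    Exact search is used here; an ANN index would replace the sort.
    """
    new_labels = []
    for i, msg in enumerate(messages):
        if msg.user_label == 0:
            new_labels.append(0)
            continue
        others = [m for j, m in enumerate(messages) if j != i]
        others.sort(key=lambda m: cosine(msg.vector, m.vector), reverse=True)
        neighbors = others[:k]
        if not neighbors:
            new_labels.append(msg.user_label)
            continue
        positive_share = sum(m.user_label for m in neighbors) / len(neighbors)
        new_labels.append(1 if positive_share > threshold else 0)
    return new_labels


# Toy data: the first axis stands for on-topic content, the second for off-topic content.
messages = [
    Message("u1", [1.0, 0.0], 1),
    Message("u1", [0.9, 0.1], 1),
    Message("u1", [0.0, 1.0], 1),   # off-topic message from a positive user
    Message("u2", [0.95, 0.05], 1),
    Message("u2", [0.8, 0.2], 1),
    Message("u3", [0.1, 0.9], 0),
    Message("u3", [0.05, 1.0], 0),
    Message("u3", [0.0, 0.9], 0),
]

labels = relabel(messages, k=3)
print(labels)  # [1, 1, 0, 1, 1, 0, 0, 0]
```

In this toy run the third message of user `u1` becomes negative because its closest neighbors all come from a negative user, which is the kind of within-user label variation the abstract describes. With a large corpus, the sort over all messages would be the bottleneck, which is where an approximate index becomes useful.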

Notes

  1. https://github.com/spotify/annoy.

  2. https://gamblershelp.com.au; http://getgamblingfacts.ca; https://gamtalk.org; https://gamcare.org.uk.

  3. https://www.gamtalk.org/groups/community/.

Acknowledgments

This work has been partially supported by the Spanish Ministry of Science and Innovation within the DOTT-HEALTH Project (MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32 and OBSER-MENH Project (MCIN/AEI/10.13039/501100011033 and NextGenerationEU/PRTR) under Grant TED2021-130398B-C21 as well as project RAICES (IMIENS 2022) and the research network AEI RED2018-102312-T (IA-Biomed).

Corresponding author

Correspondence to Andres Duque.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Fabregat, H., Duque, A., Araujo, L., Martinez-Romo, J. (2023). A Re-labeling Approach Based on Approximate Nearest Neighbors for Identifying Gambling Disorders in Social Media. In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_15

  • DOI: https://doi.org/10.1007/978-3-031-42448-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42447-2

  • Online ISBN: 978-3-031-42448-9

  • eBook Packages: Computer Science, Computer Science (R0)
