Abstract
This paper describes a novel approach, based on Approximate Nearest Neighbors (ANN) techniques, for changing the granularity of the label schema in the training dataset of a classification task from user-based annotation to message-based annotation. In particular, we tackle Task 1 of the CLEF 2022 eRisk Workshop, which consists of processing messages written by social media users in order to detect early signs of pathological gambling. Our proposal computes the nearest neighbors of the vector representations of the given messages, which are originally annotated at user level. In this way, we obtain a re-labeled training dataset in which messages from the same user can be either positive or negative. We then use this re-labeled dataset to perform the final classification of test instances. Compared to other systems participating in the task, our approach achieves the best average performance across the proposed evaluation frameworks and proves to be the fastest in terms of the time needed to process the whole test dataset. This indicates that the proposed re-labeling scheme makes it easier to capture the textual information that leads to a correct detection of pathological gambling.
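The re-labeling idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the neighbor-voting rule, the value of k, the similarity measure, or the embedding model, so the majority vote, `k`, `threshold`, cosine similarity, and the toy two-dimensional vectors below are all assumptions made for illustration. An exact brute-force search also stands in for the ANN index a real system would use.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relabel_messages(
    vectors: List[Sequence[float]],
    user_labels: List[int],
    k: int = 3,
    threshold: float = 0.5,
) -> List[int]:
    """Give each message a label voted by its k nearest neighbors.

    Hypothetical rule (an assumption, not taken from the paper): a message
    becomes positive when more than `threshold` of its k nearest neighbors
    carry a positive user-level label.
    """
    new_labels = []
    for i, v in enumerate(vectors):
        # Exact brute-force ranking; an ANN index would approximate this step.
        ranked = sorted(
            ((cosine(v, w), j) for j, w in enumerate(vectors) if j != i),
            reverse=True,
        )
        neighbors = [j for _, j in ranked[:k]]
        if not neighbors:
            # A single-message dataset has no neighbors: keep the old label.
            new_labels.append(user_labels[i])
            continue
        positive_share = sum(user_labels[j] for j in neighbors) / len(neighbors)
        new_labels.append(1 if positive_share > threshold else 0)
    return new_labels


# Toy data: messages 0-3 come from a user labeled positive, 4-6 from a user
# labeled negative. Message 3 is off-topic and lies close to the negative
# user's messages in the (made-up) embedding space.
vectors = [
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0],
    [0.1, 0.9], [0.05, 1.0], [0.0, 0.9],
]
user_labels = [1, 1, 1, 1, 0, 0, 0]

message_labels = relabel_messages(vectors, user_labels, k=3)
print(message_labels)  # [1, 1, 1, 0, 0, 0, 0]
```

In this toy run, message 3 inherits a positive label from its user but is re-labeled negative because its nearest neighbors all belong to the negative user, which is the kind of within-user label split the abstract describes.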
Acknowledgments
This work has been partially supported by the Spanish Ministry of Science and Innovation within the DOTT-HEALTH Project (MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32 and OBSER-MENH Project (MCIN/AEI/10.13039/501100011033 and NextGenerationEU/PRTR) under Grant TED2021-130398B-C21 as well as project RAICES (IMIENS 2022) and the research network AEI RED2018-102312-T (IA-Biomed).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Fabregat, H., Duque, A., Araujo, L., Martinez-Romo, J. (2023). A Re-labeling Approach Based on Approximate Nearest Neighbors for Identifying Gambling Disorders in Social Media. In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_15
DOI: https://doi.org/10.1007/978-3-031-42448-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42447-2
Online ISBN: 978-3-031-42448-9
eBook Packages: Computer Science, Computer Science (R0)