A Re-labeling Approach Based on Approximate Nearest Neighbors for Identifying Gambling Disorders in Social Media

Fabregat, Hermenegildo; Duque, Andres; Araujo, Lourdes; Martinez-Romo, Juan

doi:10.1007/978-3-031-42448-9_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14163)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

This paper describes a novel approach, based on Approximate Nearest Neighbors (ANN) techniques, for changing the granularity of the label schema in the training dataset of a classification task from user-based annotation to message-based annotation. In particular, we tackle Task 1 of the CLEF 2022 eRisk Workshop, which consists of processing messages written by social media users in order to detect early signs of pathological gambling. Our proposal computes the nearest neighbors of the vector representations of the given messages, which were originally annotated at user level. In this way, we obtain a re-labeled training dataset in which messages from the same user can be either positive or negative. We then use this re-labeled dataset to perform the final classification of test instances. Compared to the other systems participating in the task, our approach achieves the best average performance in the proposed evaluation frameworks and is the fastest in terms of the time needed to process the whole test dataset. This indicates that the proposed re-labeling scheme makes it easier to capture the textual information that leads to a correct detection of pathological gambling.
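
The re-labeling idea can be illustrated with a minimal sketch. The abstract does not state the actual decision rule, the embedding model, or the ANN index used, so everything below is an assumption made for illustration: the `Message` type, the `relabel` function, the majority-vote rule over `k` neighbors, and the `threshold` parameter are hypothetical, and an exact brute-force cosine search stands in for a real approximate nearest-neighbor index.

```python
import math
from collections import namedtuple

# Hypothetical container: the label inherited from the user (1 = positive,
# 0 = negative) and a precomputed vector representation of the message.
Message = namedtuple("Message", ["user_label", "vector"])


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relabel(messages, k=2, threshold=0.5):
    """Assign message-level labels from user-level labels.

    Assumed rule (not taken from the paper): a message written by a
    positive user stays positive only if more than `threshold` of its k
    nearest neighbors were also written by positive users. Messages from
    negative users stay negative.
    """
    new_labels = []
    for i, msg in enumerate(messages):
        if msg.user_label == 0:
            new_labels.append(0)
            continue
        # Exact search over all other messages; an ANN index would
        # replace this loop on a realistically sized dataset.
        sims = [
            (cosine(msg.vector, other.vector), j)
            for j, other in enumerate(messages)
            if j != i
        ]
        sims.sort(reverse=True)
        neighbors = sims[:k]
        if not neighbors:
            new_labels.append(msg.user_label)
            continue
        positive_share = sum(messages[j].user_label for _, j in neighbors) / len(neighbors)
        new_labels.append(1 if positive_share > threshold else 0)
    return new_labels


# Toy data: the fourth message belongs to a positive user but lies close
# to messages written by negative users.
messages = [
    Message(1, [1.0, 0.0]),
    Message(1, [0.9, 0.1]),
    Message(1, [0.95, 0.05]),
    Message(1, [0.0, 1.0]),
    Message(0, [0.1, 0.9]),
    Message(0, [0.05, 0.95]),
    Message(0, [0.2, 0.8]),
]
labels = relabel(messages, k=2)
```

On this toy data, the fourth message is re-labeled as negative while the other three messages from positive users stay positive, which matches the abstract's point that messages from the same user can end up with different labels.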

Acknowledgments

This work has been partially supported by the Spanish Ministry of Science and Innovation within the DOTT-HEALTH Project (MCI/AEI/FEDER, UE) under Grant PID2019-106942RB-C32 and OBSER-MENH Project (MCIN/AEI/10.13039/501100011033 and NextGenerationEU/PRTR) under Grant TED2021-130398B-C21 as well as project RAICES (IMIENS 2022) and the research network AEI RED2018-102312-T (IA-Biomed).

Author information

Authors and Affiliations

NLP & IR Group, Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Juan del Rosal 16, 28040, Madrid, Spain
Hermenegildo Fabregat, Andres Duque, Lourdes Araujo & Juan Martinez-Romo
IMIENS: Instituto Mixto de Investigación, Escuela Nacional de Sanidad, Monforte de Lemos 5, 28019, Madrid, Spain
Andres Duque, Lourdes Araujo & Juan Martinez-Romo
Avature Machine Learning, Madrid, Spain
Hermenegildo Fabregat

Authors

Hermenegildo Fabregat
Andres Duque
Lourdes Araujo
Juan Martinez-Romo

Corresponding author

Correspondence to Andres Duque.

Editor information

Editors and Affiliations

Democritus University of Thrace, Xanthi, Greece
Avi Arampatzis
University of Amsterdam, Amsterdam, The Netherlands
Evangelos Kanoulas
CERTH-ITI, Thessaloniki, Greece
Theodora Tsikrika
CERTH-ITI, Thessaloniki, Greece
Stefanos Vrochidis
Utrecht University, Utrecht, The Netherlands
Anastasia Giachanou
Elsevier, Amsterdam, The Netherlands
Dan Li
University of Amsterdam, Amsterdam, The Netherlands
Mohammad Aliannejadi
University of Lausanne, Lausanne, Switzerland
Michalis Vlachos
University of Padua, Padova, Italy
Guglielmo Faggioli
University of Padua, Padova, Italy
Nicola Ferro

About this paper

Cite this paper

Fabregat, H., Duque, A., Araujo, L., Martinez-Romo, J. (2023). A Re-labeling Approach Based on Approximate Nearest Neighbors for Identifying Gambling Disorders in Social Media. In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2023. Lecture Notes in Computer Science, vol 14163. Springer, Cham. https://doi.org/10.1007/978-3-031-42448-9_15

DOI: https://doi.org/10.1007/978-3-031-42448-9_15
Published: 11 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42447-2
Online ISBN: 978-3-031-42448-9
eBook Packages: Computer Science, Computer Science (R0)
