Domain-adaptive pre-training on a BERT model for the automatic detection of misogynistic tweets in Spanish

  • Original Article
  • Social Network Analysis and Mining

Abstract

Violence against women is a major social issue. One in every three women worldwide has been subjected to physical or sexual violence. The pervasive violence against women in the physical world, the ever-growing presence of social media in our lives, and its lack of content moderation have led to an influx of misogynistic social media content. We contribute to preventing violence against women by introducing a BERT architecture with domain-adaptive pre-training to automatically detect misogynistic tweets in Spanish. We used the IberEval 2018 Spanish dataset for automatic misogyny identification, obtaining an accuracy of 84.60%, a precision of 79.64%, a recall of 86.70%, and an F1 score of 83.02%, outperforming the state of the art. We also conducted a manual error analysis and discovered 469 mislabeled tweets and a misogynistic bias in the IberEval 2018 Spanish dataset. Our debiased model outperformed the current literature on automatic misogyny detection with an accuracy of 84.35%, a precision of 84.64%, a recall of 83.93%, and an F1 score of 84.28%. Lastly, we addressed the need for misogyny detection on other social media by experimenting with a manually curated and labeled dataset of Facebook comments in Spanish, on which the model reached an accuracy of 87.85%. Misogyny is a complex social issue, so an interdisciplinary approach might benefit future models for automatically detecting misogyny.
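As a quick sanity check on the figures above, the F1 score is the harmonic mean of precision and recall, so the reported F1 values can be recomputed from the reported precision and recall. The following is a minimal sketch, assuming the standard binary F1 definition; the function name `f1_score` and the dictionary layout are illustrative choices, not taken from the article.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Percentages as reported in the abstract; the grouping into a dict is
# only for illustration.
reported = {
    "original dataset": {"precision": 79.64, "recall": 86.70, "f1": 83.02},
    "debiased model": {"precision": 84.64, "recall": 83.93, "f1": 84.28},
}

for name, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {m['f1']:.2f}")
```

Under this definition, both recomputed values agree with the reported ones to two decimal places, which suggests the abstract's precision, recall, and F1 figures are internally consistent.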


Notes

  1. https://github.com/JustAnotherArchivist/snscrape.

  2. https://github.com/thomasthiebaud/spacy-fastlang.

  3. https://huggingface.co/.


Author information

Contributions

D.R. and A.D-R. conceived of the presented idea. D.R. and J.D-E. developed the theory, and D.R. performed the computations under the supervision of J.D-E. and L.T. A.D-R. and J.D-E. verified the analytical methods. A.D-R., J.D-E., and L.T. supervised the findings of this work. D.R. wrote the manuscript with input from all authors. All authors discussed the results and contributed to the final manuscript.

Corresponding author

Correspondence to Arnoldo Díaz-Ramírez.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rodríguez, D.A., Diaz-Escobar, J., Díaz-Ramírez, A. et al. Domain-adaptive pre-training on a BERT model for the automatic detection of misogynistic tweets in Spanish. Soc. Netw. Anal. Min. 13, 126 (2023). https://doi.org/10.1007/s13278-023-01128-2

  • DOI: https://doi.org/10.1007/s13278-023-01128-2
