Automatic Identification and Classification of Misogynistic Language on Twitter

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10859)

Abstract

Hate speech may take different forms in online social media. Most investigations in the literature focus on detecting abusive language in discussions about ethnicity, religion, gender identity and sexual orientation. In this paper, we address the problem of automatically detecting and categorizing misogynous language in online social media. The main contribution of this paper is two-fold: (1) a corpus of misogynous tweets, labelled from different perspectives, and (2) an exploratory investigation of NLP features and ML models for detecting and classifying misogynistic language.
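The abstract frames misogyny detection as a supervised text-classification task. This page does not list the paper's actual features or models, so the following is only a minimal sketch. It assumes a multinomial Naive Bayes classifier over whitespace-tokenized bag-of-words counts. The class name `NaiveBayesTextClassifier`, the label encoding (1 = misogynous, 0 = not misogynous) and the toy training texts are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        label_counts = Counter(labels)
        n_docs = len(labels)
        self.classes = sorted(label_counts)
        # Log prior P(class), estimated from label frequencies.
        self.log_prior = {c: math.log(label_counts[c] / n_docs) for c in self.classes}
        # Token frequencies collected separately for each class.
        self.token_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        # Shared vocabulary across all classes.
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        self.total_tokens = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            denom = self.total_tokens[c] + self.alpha * vocab_size
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.token_counts[c][tok] + self.alpha) / denom)
            scores[c] = score
        # Highest log-posterior wins; ties go to the first class in sorted order.
        return max(self.classes, key=lambda c: scores[c])


# Toy, made-up examples (1 = misogynous, 0 = not misogynous).
train_texts = [
    "women should stay in the kitchen",
    "women cannot drive",
    "great match by the team today",
    "she gave a brilliant talk",
]
train_labels = [1, 1, 0, 0]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("women should not drive"))  # → 1
print(clf.predict("brilliant team talk"))     # → 0
```

The paper reports using scikit-learn, so a library classifier would normally replace this hand-rolled version. The sketch only makes the train/predict split concrete.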

Notes

  1. https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/.

  2. The dataset has been made available for the IberEval-2018 (https://amiibereval2018.wordpress.com/) and the EvalIta-2018 (https://amievalita2018.wordpress.com/) challenges.

  3. https://www.fredericgodin.com/software/.

  4. We employed the machine learning package scikit-learn: http://scikit-learn.org/stable/supervised_learning.html.

  5. When training the considered classifiers, we did not apply any feature filtering or parameter tuning.

  6. Results obtained with All Features are statistically significant (Student's t-test at a significance level of 0.05).
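Note 6 reports significance using a Student t-test. The page does not say whether the test was paired or which samples were compared, so the following is a minimal sketch only. It assumes a paired t-test over per-fold scores of two classifiers. The function names and fold scores are hypothetical and are not results from the paper. The Python standard library has no t-distribution quantile function, so the two-sided critical value must be supplied. For example, the value is about 2.776 for 4 degrees of freedom at α = 0.05.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired Student t statistic for two equal-length lists of per-fold scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def is_significant(t, critical_value):
    """Two-sided decision: reject H0 (equal means) if |t| exceeds the critical value."""
    return abs(t) > critical_value


# Hypothetical per-fold accuracies (NOT results from the paper).
all_features = [0.80, 0.82, 0.81, 0.79, 0.83]
baseline = [0.75, 0.78, 0.76, 0.74, 0.77]

t = paired_t_statistic(all_features, baseline)
print(round(t, 2))               # → 15.81
print(is_significant(t, 2.776))  # → True (df = 4, alpha = 0.05, two-sided)
```

A library routine that also returns the p-value would avoid hard-coding the critical value. The sketch only shows how the statistic itself is formed.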

Acknowledgements

The work of the third author was partially funded by the Spanish MINECO under the research project SomEMBED (TIN2015-71147-C2-1-P).

Author information

Corresponding author

Correspondence to Elisabetta Fersini.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Anzovino, M., Fersini, E., Rosso, P. (2018). Automatic Identification and Classification of Misogynistic Language on Twitter. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_6

  • DOI: https://doi.org/10.1007/978-3-319-91947-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91946-1

  • Online ISBN: 978-3-319-91947-8

  • eBook Packages: Computer Science, Computer Science (R0)