Skip to main content

A Study of Text Representations for Hate Speech Detection

  • Conference paper
  • First Online:
Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Abstract

The pervasiveness of the Internet and social media have enabled the rapid and anonymous spread of Hate Speech content on microblogging platforms such as Twitter. Current EU and US legislation against hateful language, in conjunction with the large amount of data produced in these platforms has led to automatic tools being a necessary component of the Hate Speech detection task and pipeline. In this study, we examine the performance of several, diverse text representation techniques paired with multiple classification algorithms, on the automatic Hate Speech detection and abusive language discrimination task. We perform an experimental evaluation on binary and multiclass datasets, paired with significance testing. Our results show that simple hate-keyword frequency features (BoW) work best, followed by pre-trained word embeddings (GLoVe) as well as N-gram graphs (NGGs): a graph-based representation which proved to produce efficient, very low-dimensional but rich features for this task. A combination of these representations paired with Logistic Regression or 3-layer neural network classifiers achieved the best detection performance, in terms of micro and macro F-measure.

Supported by NCSR Demokritos, and the Department of Informatics and Telecommunications, National and Kapodistrian University of Athens.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://nlp.stanford.edu/projects/glove/.

  2. 2.

    https://github.com/cthem/hate-speech-detection.

  3. 3.

    https://github.com/igorbrigadir/stopwords.

  4. 4.

    https://github.com/t-davidson/hate-speech-and-offensive-language.

  5. 5.

    https://nlp.stanford.edu/software/lex-parser.html.

  6. 6.

    https://github.com/ZeerakW/hatespeech.

  7. 7.

    https://github.com/t-davidson/hate-speech-and-offensive-language.

References

  1. Badjatiya, P., Gupta, S., Gupta, M., Varma, V.: Deep learning for hate speech detection in tweets. In: Proceedings of the 26th International Conference on World Wide Web Companion, pp. 759–760. International World Wide Web Conferences Steering Committee (2017)

    Google Scholar 

  2. Bourgonje, P., Moreno-Schneider, J., Srivastava, A., Rehm, G.: Automatic classification of abusive language and personal attacks in various forms of online communication. In: Rehm, G., Declerck, T. (eds.) GSCL 2017. LNCS (LNAI), vol. 10713, pp. 180–191. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73706-5_15

    Chapter  Google Scholar 

  3. Brown, A.: What is hate speech? part 1: the myth of hate. Law Phil. 36(4), 419–468 (2017)

    Article  Google Scholar 

  4. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

    Article  Google Scholar 

  5. Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. arXiv preprint arXiv:1703.04009 (2017)

  6. Del Vigna12, F., Cimino23, A., Dell’Orletta, F., Petrocchi, M., Tesconi, M.: Hate me, hate me not: hate speech detection on facebook (2017)

    Google Scholar 

  7. Djuric, N., Zhou, J., Morris, R., Grbovic, M., Radosavljevic, V., Bhamidipati, N.: Hate speech detection with comment embeddings. In: Proceedings of the 24th International Conference on World Wide Web, pp. 29–30. ACM (2015)

    Google Scholar 

  8. Fix, E., Hodges, J.L., Jr.: Discriminatory analysis-nonparametric discrimination: Small sample performance. CALIFORNIA UNIV BERKELEY, Technical report (1952)

    Google Scholar 

  9. Giannakopoulos, G.: Automatic Summarization from Multiple Documents. Ph.D. thesis, University of the Aegean (2009). http://www.iit.demokritos.gr/~ggianna/thesis.pdf

  10. Giannakopoulos, G., Karkaletsis, V., Vouros, G.A.: Testing the use of n-gram graphs in summarization sub-tasks. In: TAC (2008)

    Google Scholar 

  11. Giannakopoulos, G., Mavridi, P., Paliouras, G., Papadakis, G., Tserpes, K.: Representation models for text classification: a comparative analysis over three web document types. In: Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics, p. 13. ACM (2012)

    Google Scholar 

  12. Gitari, N.D., Zuping, Z., Damien, H., Long, J.: A lexicon-based approach for hate speech detection. Int. J. Multimedia Ubiq. Eng. 10(4), 215–230 (2015)

    Article  Google Scholar 

  13. Kwok, I., Wang, Y.: Locate the hate: detecting tweets against blacks. In: AAAI (2013)

    Google Scholar 

  14. Lewis, D.D.: Naive (Bayes) at forty: the independence assumption in information retrieval. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 4–15. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0026666

    Chapter  Google Scholar 

  15. Liaw, A., Wiener, M., et al.: Classification and regression by randomforest. R News 2(3), 18–22 (2002)

    Google Scholar 

  16. McCullagh, P., Nelder, J.A.: Generalized Linear Models, vol. 37. CRC Press, Boca Raton (1989)

    Book  Google Scholar 

  17. Menard, S.W.: Applied logistic regression analysis. No. 04; e-book (1995)

    Google Scholar 

  18. Mubarak, H., Darwish, K., Magdy, W.: Abusive language detection on arabic social media. In: Proceedings of the First Workshop on Abusive Language Online, pp. 52–56 (2017)

    Google Scholar 

  19. Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., Chang, Y.: Abusive language detection in online user content. In: Proceedings of the 25th International Conference on World Wide Web, pp. 145–153. International World Wide Web Conferences Steering Committee (2016)

    Google Scholar 

  20. Papadakis, G., Giannakopoulos, G., Paliouras, G.: Graph vs. bag representation models for the topic classification of web documents. World Wide Web 19(5), 887–920 (2016)

    Article  Google Scholar 

  21. Park, J.H., Fung, P.: One-step and two-step classification for abusive language detection on twitter. arXiv preprint arXiv:1706.01206 (2017)

  22. Pennington, J., Socher, R., Manning, C.: Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

    Google Scholar 

  23. Russell, S., Norvig, P., Intelligence, A.: A modern approach. Artif. Intell. 25(27), 79–80 (1995)

    Google Scholar 

  24. Saleem, H.M., Dillon, K.P., Benesch, S., Ruths, D.: A web of hate: tackling hateful speech in online social spaces. arXiv preprint arXiv:1709.10159 (2017)

  25. Schmidt, A., Wiegand, M.: A survey on hate speech detection using natural language processing. In: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, pp. 1–10 (2017)

    Google Scholar 

  26. Silva, L.A., Mondal, M., Correa, D., Benevenuto, F., Weber, I.: Analyzing the targets of hate in online social media. In: ICWSM, pp. 687–690 (2016)

    Google Scholar 

  27. Tsekouras, L., Varlamis, I., Giannakopoulos, G.: A graph-based text similarity measure that employs named entity information. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pp. 765–771 (2017)

    Google Scholar 

  28. Warner, W., Hirschberg, J.: Detecting hate speech on the world wide web. In: Proceedings of the Second Workshop on Language in Social Media, pp. 19–26. Association for Computational Linguistics (2012)

    Google Scholar 

  29. Waseem, Z.: Are you a racist or am i seeing things? annotator influence on hate speech detection on twitter. In: Proceedings of the First Workshop on NLP and Computational Social Science, pp. 138–142 (2016)

    Google Scholar 

  30. Waseem, Z., Hovy, D.: Hateful symbols or hateful people? predictive features for hate speech detection on twitter. In: Proceedings of the NAACL Student Research Workshop, pp. 88–93 (2016)

    Google Scholar 

  31. Xiang, G., Fan, B., Wang, L., Hong, J., Rose, C.: Detecting offensive tweets via topical feature discovery over a large scale twitter corpus. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1980–1984. ACM (2012)

    Google Scholar 

  32. Xu, Z., Zhu, S.: Filtering offensive language in online communities using grammatical relations. In: Proceedings of the Seventh Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, pp. 1–10 (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Chrysoula Themeli , George Giannakopoulos or Nikiforos Pittaras .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Themeli, C., Giannakopoulos, G., Pittaras, N. (2023). A Study of Text Representations for Hate Speech Detection. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13452. Springer, Cham. https://doi.org/10.1007/978-3-031-24340-0_32

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-24340-0_32

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24339-4

  • Online ISBN: 978-3-031-24340-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics