A Study of Text Representations for Hate Speech Detection

Themeli, Chrysoula; Giannakopoulos, George; Pittaras, Nikiforos

doi:10.1007/978-3-031-24340-0_32

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13452)

Included in the following conference series:

International Conference on Computational Linguistics and Intelligent Text Processing

Abstract

The pervasiveness of the Internet and social media has enabled the rapid and anonymous spread of Hate Speech content on microblogging platforms such as Twitter. Current EU and US legislation against hateful language, in conjunction with the large amount of data produced on these platforms, has made automatic tools a necessary component of the Hate Speech detection task and pipeline. In this study, we examine the performance of several diverse text representation techniques, paired with multiple classification algorithms, on the automatic Hate Speech detection and abusive language discrimination task. We perform an experimental evaluation on binary and multiclass datasets, paired with significance testing. Our results show that simple hate-keyword frequency features (BoW) work best, followed by pre-trained word embeddings (GloVe) as well as N-gram graphs (NGGs), a graph-based representation that proved to produce efficient, very low-dimensional but rich features for this task. A combination of these representations paired with Logistic Regression or 3-layer neural network classifiers achieved the best detection performance in terms of micro and macro F-measure.
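
To make two notions from the abstract concrete, the following is a minimal Python sketch of keyword-frequency (BoW) features and of micro- and macro-averaged F-measure. It is an illustration under stated assumptions, not the authors' implementation: the keyword list, the whitespace tokenization, and the function names keyword_bow and f_measures are all hypothetical, and the paper's actual lexicon and preprocessing are not given here.

```python
from collections import Counter

# Hypothetical keyword list for illustration only; NOT the lexicon used in the paper.
KEYWORDS = ["idiot", "stupid", "hate"]


def keyword_bow(text, keywords=KEYWORDS):
    """Return one count per keyword for a lowercased, whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    return [counts[k] for k in keywords]


def f_measures(y_true, y_pred):
    """Return (micro F1, macro F1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts over all classes, so every instance weighs the same.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro
```

The two averages differ when classes are imbalanced: macro F-measure gives rare classes the same weight as frequent ones, whereas micro F-measure is dominated by the frequent classes, which is one reason to report both.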

Supported by NCSR Demokritos, and the Department of Informatics and Telecommunications, National and Kapodistrian University of Athens.

Author information

Authors and Affiliations

Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece
Chrysoula Themeli & Nikiforos Pittaras
NCSR Demokritos, Athens, Greece
Chrysoula Themeli, George Giannakopoulos & Nikiforos Pittaras
SciFY PNPC, Athens, Greece
George Giannakopoulos

Corresponding authors

Correspondence to Chrysoula Themeli, George Giannakopoulos or Nikiforos Pittaras.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Themeli, C., Giannakopoulos, G., Pittaras, N. (2023). A Study of Text Representations for Hate Speech Detection. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13452. Springer, Cham. https://doi.org/10.1007/978-3-031-24340-0_32

DOI: https://doi.org/10.1007/978-3-031-24340-0_32
Published: 26 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24339-4
Online ISBN: 978-3-031-24340-0
eBook Packages: Computer Science, Computer Science (R0)
