1 Introduction

Due to the increasing digitization of society, the impact of online discourse on everyday life is becoming more pronounced. A single hateful message shared on social media can incite violent offline movements and exert a negative emotional impact on millions of readers. For this reason, platforms such as Twitter and Facebook have created community policies to ensure civil conduct on the part of their users. The goal is to filter hate speech, which, unlike merely offensive or vulgar content, is designed specifically to attack or denigrate entire groups of people and has a damaging effect on communities. Given the sheer volume of posts being published, however, it is difficult for humans to moderate them in a complete and timely manner. Different moderators are also not guaranteed to agree on every decision, even in the presence of well-defined classification guidelines. Moreover, due to their repeated and prolonged exposure to negative content, many moderators experience a decline in mental health (Vidgen and Derczynski 2020). For these reasons, automatic hate speech detection has become a field of high interest.

In general, the task of classifying hate speech has been acknowledged as difficult (de Gibert et al. 2018). One reason is data scarcity: few public hate speech datasets are currently available, and the majority of them are for English. Building systems for lower-resource languages is therefore even more challenging (Vidgen and Derczynski 2020). An additional difficulty of the task is the need to precisely define hate speech. While many people have an intuitive understanding of what hate speech is, this understanding does not easily translate into a finite set of characteristics that can be used as annotation guidelines. Additionally, many hate speech datasets deal with specific hate speech subtypes, such as hate speech only against refugees, women, or certain nationalities, which leads to stark differences between the content of their hate speech classes and makes the available resources for a given set of hate speech subtypes in a low-resource language even scarcer.

It is therefore our aim to examine a cross-lingual setup in which available hate speech resources from a higher-resource language are exploited. We address data scarcity in German, a generally high-resource language for which, however, few hate speech datasets are available: compared to English, only a small number of datasets exist, most of which differ in their label sets (Vidgen and Derczynski 2020). Our method is applied in a zero-shot setup that assumes no annotated training data in German. We develop a cross-lingual transfer learning approach based on cross-lingual word embeddings (CLWEs) and neural classifiers to exploit hate speech data in English. In our experiments, we rely on a widely-used English dataset (de Gibert et al. 2018) as our source-language data and on the German dataset of the 2018 GermEval Shared Task on the Identification of Offensive Language (Ruppenhofer et al. 2018) as our target-language data. As is often the case with hate speech datasets, the annotation schemas of these two datasets do not fully correspond. Therefore, as we discuss later, we modify their annotation using a few simple rules to ensure label compatibility.

In addition to training only on English, we leverage further data to improve our systems. Towards this end, we bootstrap on two unlabeled German datasets, one of which we crawled from the web. Using an ensemble of our cross-lingual models, we predict labels for previously-unseen data and assign the final label by majority voting. We then use this bootstrapped data to further fine-tune the English-trained models. We find that for the majority of our architectures, fine-tuning improves cross-lingual performance both within the hate speech class and in terms of macro-average scores.

Since the majority of social media content is non-hateful, the datasets’ label distributions are skewed towards the no-hate label. Such class imbalances often lead to training issues, especially in the case of small training corpora. For this reason, we perform a series of additional experiments to test the impact of the class ratio on model performance. We create several over- and undersampled versions of our training sets and compare the models’ performance. Our results suggest that severe class imbalance is indeed a problem, but that the best method to overcome it depends on the dataset size.

In sum, our work contributes by addressing three issues in zero-shot cross-lingual hate speech detection: (1) hate speech definition incompatibilities across resources, (2) data scarcity and (3) class imbalance. Regarding hate speech definition, we select compatible datasets and employ manual label modification. Regarding data scarcity, we pursue a cross-lingual setup in which we use English labeled data only to detect hate speech in German. Furthermore, we show that performance can be improved by leveraging unlabeled German sentences. Regarding class imbalance, we show that the imbalanced distributions of hate speech datasets can be compensated with sampling techniques, but that the optimal technique to use may depend on dataset size.

Similar methods have been applied to other tasks and to other hate speech detection setups; however, to the best of our knowledge, no prior work on hate speech detection applies these methods in a zero-shot, cross-lingual setting.

2 Previous work

In this section we give an overview of previous work that addresses the three aforementioned issues of hate speech definition, data scarcity, and class imbalance.

2.1 Hate speech definitions

For as long as hate speech detection has been an area of interest, a multitude of terminologies have been associated with it. Schmidt and Wiegand (2017) note that the earliest work on the phenomenon did not use the term “hate speech” at all, but rather “abusive”, “hostile”, and “flames”. However, despite the vast amount of work that has since been done on detecting hate speech, the term still lacks a universally-accepted definition. In particular, Davidson et al. (2017) observe that the concept of “hate” was previously often conflated with the concept of “offensiveness”, and though more recent works tend to treat hate speech as a subtype of generally-offensive language (Wiegand et al. 2018b; Gröndahl et al. 2018; Zampieri et al. 2019), ambiguities and inconsistencies regarding terminology use are still prevalent. The three datasets of HASOC (Majumder et al. 2019) distinguish between the categories “Hate Speech” and “Offensive”, the difference being that the former is directed against a group while the latter is directed against an individual. On the other hand, the GermEval2018 dataset of Wiegand et al. (2018b) employs a hierarchical taxonomy, where the label “Offensive” is used as an umbrella term that includes “Abuse”, which is characterized as a “particularly strong form of offensive language” and bears resemblance to the concept of hate speech. Waseem et al. (2017b) also use the term “abuse” rather than “hate speech” in their analysis of contemporary datasets, and underscore the importance of distinguishing the target of abuse, as well as whether the abuse is implicit or explicit. This inspired the OLID taxonomy of Zampieri et al. (2019), which likewise does not use the term “hate speech” as a category label. Instead, the OLID dataset uses the label “Offensive”, which was likened to the “Offensive” category found in the GermEval2018 dataset (Wiegand et al. 2018b). However, while the authors of OLID use the term “abuse” in their discussion, and the GermEval dataset contains a category named “Abuse”, these two terms are not meant to have the same meaning. Rather, the term “abuse” in the discussion of Zampieri et al. (2019) corresponds to the label “Offensive” in their dataset, which in the GermEval2018 dataset would include the label “Abuse” as a subset. Regarding the term “hate speech”, although Zampieri et al. (2019) do not use it as a category label, they nevertheless note that the concept fits into their three-level taxonomy as speech that is (1) offensive, (2) a targeted insult, and (3) targeted against a group.

From these datasets alone, it is clear that there are significant nuances and inconsistencies in the use of hate-related terminology. In addition, many other terms are employed in connection with hate speech detection, oftentimes in the context of related but separate tasks. Fortuna and Nunes (2018) offer a comparison of nine such terms, including “cyberbullying”, “discrimination”, “flaming”, “toxic language”, and “abusive language”, with explanations of how these concepts differ from the concept of hate speech itself.

Hate speech datasets also differ in annotation schema, as shown in recent surveys (Vidgen and Derczynski 2020; Poletto et al. 2021; Pamungkas et al. 2021a). This variety is due to the multifaceted nature of hate speech, as it can be directed against individuals or groups, be implicit or explicit, and have varying themes such as race, gender, or disability. Quite often, it is seen as advantageous to classify finer-grained categories rather than to attempt a binary classification task, where there might be too much variation (Poletto et al. 2021). There are datasets whose annotation schemas distinguish between racism and sexism, as well as datasets specific to certain target groups. The HatEval dataset (Basile et al. 2019) gathers 13,000 English and 6600 Spanish tweets in which the targets of hate speech are either immigrants or women. All tweets with the label “Hateful” must have one of these two targets. The dataset of Bretschneider and Peters (2017) views hate speech as “offensive statements” that express “fear and aggression”, and collects statements of this nature that are directed against foreigners. Meanwhile, hate speech exclusively against refugees and Muslims is the focus of Ross et al. (2016). The dataset of Davidson et al. (2017) defines hate speech as a statement that “expresses hatred towards a targeted group or is intended to be derogatory, to humiliate or to insult members of the group”. The three datasets of HASOC (Majumder et al. 2019) do not focus on one particular target and contain a diverse set of sentences labeled as “Hate Speech”. The previously-mentioned OLID dataset of Zampieri et al. (2019) employs a multi-tiered annotation schema that distinguishes on one level whether or not a tweet is “Offensive”, then the type of offensiveness it contains, and finally the target of the offensiveness.

Table 1 illustrates the differences in the taxonomies of various datasets and the contradictory annotations that can arise as a result. First are Sentences 1 and 2, which both direct vulgar language at female politicians. However, Sentence 1 was given the label “Hateful” in accordance with the annotation principles of the HatEval dataset, while Sentence 2 from the dataset of Ross et al. (2016) was given a binary “No” label that signifies the absence of hate speech. Sentences 3 and 4 both direct insults against individuals; however, Sentence 3 was annotated as “Hate Speech”, while Sentence 4 was not considered to be hate speech. Sentences 5 and 6 both make statements against the media, which is also a group of people. However, while the GermEval dataset’s label for such a sentence is “Abuse”, the Stormfront dataset labels such a sentence as “Hate”.

Table 1 Sentences of similar type carrying different class labels in different datasets. Label names are given as they occur in the datasets. German examples are translated to the best of the authors’ ability

These distinctions in category assignment are not just observable to the human reader; they also have an impact on model learning. Gröndahl et al. (2018) provide evidence that such blurred distinctions among hate-related categories hinder a model’s ability to generalize to other datasets, regardless of architecture. They observe that nearly all models in their experiments classify non-offensive speech containing vulgar language as hate speech. This underscores the importance of the role of the dataset in the success of a hate speech detection system.

While the aforementioned works have either argued for a unified hate speech taxonomy or proposed their own definitions, little work has focused on mitigating the effects of incompatible taxonomies in the zero-shot cross-lingual setup. Our work aims to close this gap.

2.2 Hate speech data scarcity and cross-lingual transfer

Not only does the content of hate speech datasets pose a challenge, but also the quantity of available datasets, particularly for non-English languages. A comprehensive online catalogue published by Vidgen and Derczynski (2020) shows that, although a large number of languages are represented in hate speech datasets, most datasets are still in English. Considering the above-discussed variance in hate speech definitions and label sets, multilingual hate speech detection remains an important and relevant task, since social media platforms are multilingual spaces where people may easily communicate in their native tongue (Pamungkas et al. 2021a). Due to the costliness of collecting and annotating new data, it is relevant to consider ways of exploiting resources that are already available. As with many low-resource NLP tasks, a common method for achieving good performance is to leverage data from higher-resource languages. This technique is known as cross-lingual transfer learning and relies on shared representations of languages in order for knowledge in a source language to be transferable to the target language. One form of transfer is machine translation, in which the target-language data is automatically translated into the source language before classification. Pamungkas et al. (2021b) use mBERT (Devlin et al. 2019) in a training pipeline that utilizes an abusive language lexicon and machine translation. However, translation models require parallel data for training and may be prone to producing incorrect translations. Therefore, we employ cross-lingual word embeddings, which are a more efficient means of achieving cross-lingual transfer.

Word embeddings provide a means of representing words numerically, thus making important linguistic properties such as semantic similarity accessible to machines. Popular methods are founded upon the idea that semantically-similar words such as “joyful” and “happy” occur in similar contexts (Mikolov et al. 2013b; Bojanowski et al. 2017; Devlin et al. 2019). In a cross-lingual NLP task, aligned word embeddings for both the source and target language are needed, i.e., the vector of a word in the source language should be similar to that of its target-language translation. As a result, a source-language sentence is represented with a similar set of vectors as its translations, and a model trained on the source language can be applied to the target language without any intermediate steps. Various approaches have been proposed to build CLWEs, such as methods that map independently-trained monolingual embeddings into a shared vector space (Mikolov et al. 2013a; Conneau et al. 2018; Artetxe et al. 2018) or approaches that learn such a space jointly (Devlin et al. 2019). In our work we rely on both types of approaches. More precisely, we use MUSE (Conneau et al. 2018) and multilingual BERT (Devlin et al. 2019) models.
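To make the alignment property concrete, the following minimal sketch (not part of our pipeline; the file names and the gensim-based loading are assumptions) loads two embedding files that are presumed to already share a vector space, e.g. after MUSE mapping, and queries the German vocabulary with an English word vector.

```python
# Minimal sketch of querying aligned cross-lingual embeddings (file names are placeholders).
from gensim.models import KeyedVectors

# Both files are assumed to be in word2vec text format and already mapped
# into the same vector space (e.g. with MUSE).
en_vecs = KeyedVectors.load_word2vec_format("aligned.en.vec")
de_vecs = KeyedVectors.load_word2vec_format("aligned.de.vec")

# Because the spaces are aligned, an English query vector can be compared
# directly against the German vocabulary.
query = en_vecs["hatred"]
print(de_vecs.similar_by_vector(query, topn=5))  # expected: German translations/near-synonyms
```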

CNNs, RNNs and transformers are the most commonly-used models for hate speech detection and offensive language detection in general (Waseem et al. 2017a; Fišer et al. 2018; Roberts et al. 2019; Ruppenhofer et al. 2018; Struß et al. 2019; Benítez-Andrades et al. 2022; MacAvaney et al. 2019; Pamungkas et al. 2021b; Pamungkas et al. 2021a). With regard to the first two architectures, we examine two setups that achieved good performance on the 2018 GermEval shared task. Xi et al. (2018) used a CNN following Kim (2014), while a combination of CNN and BiLSTM architectures achieved second-best and best performance in the two subtasks, respectively (Wiedemann et al. 2018).

Transformer-based architectures such as BERT (Devlin et al. 2019) have also been successfully applied to the task. A notable example is the 2019 iteration of the GermEval shared task, where the teams using fine-tuned BERT consistently placed among the top performers (Struß et al. 2019). Additionally, several works use BERT in zero-shot cross-lingual setups. Pelicon et al. (2021) and Nozza (2021) use mBERT, the multilingual version of BERT, without any intermediate steps between source-language training and target-language testing. In this work we use mBERT.

Other works propose novel architectures for zero-shot setups. Unlike few-shot setups, where some gold labels in the target language are available, a zero-shot setup does not utilize any labeled target-language data during training. Stappen et al. (2020) propose a novel attention-based method for a zero-shot setup, training on the source language and testing on the target language without any intermediate steps. Jiang and Zubiaga (2021) propose a novel architecture using machine translation as part of their pipeline. Different from these works, we do not use machine translation and we additionally employ data sampling and a bootstrapping step before target-language testing.

Cross-lingual transfer techniques were applied for hate speech detection in Ranasinghe and Zampieri (2020) by training transformer-based architectures on English data and using the learned weights to initialize models which are trained on target language data for improved performance. Similarly, a small number of target language samples were concatenated with the source-language training data in Stappen et al. (2020). In Wiegand et al. (2018a) bilingual word embeddings were used to leverage additional source language data by augmenting the available German training data with English labeled samples. Pamungkas et al. (2021b) use a pipeline that involves an abusive language lexicon and machine translation. Mathur et al. (2018) utilize a cross-lingual transfer procedure for hate speech detection in Hinglish, a code-switched language that uses both Hindi and English words. By first training a CNN and an LSTM on an English dataset, then fine-tuning the models on Hinglish, better performance was achieved compared to a Hinglish-only model. However, this work relied on having labeled data for the target language. In contrast, our approach requires no target language annotations.

Kozareva (2006) presents a bootstrapping-based approach that annotates new data for named entity recognition to improve performance in low-resource scenarios. First, a set of classifiers is trained and then applied to an unlabeled set, with labels assigned by majority voting. The extended corpus is then used to retrain the models from scratch, improving performance. For hate speech detection, Bigoulaeva et al. (2021) combined the bootstrapping procedure of Kozareva (2006) with the fine-tuning procedure of Mathur et al. (2018) by first bootstrapping German-language hate speech data and then using it to fine-tune CNN and BiLSTM classifiers. This resulted in improved performance for both architectures. In this work we follow Bigoulaeva et al. (2021), additionally using mBERT alongside the CNN and BiLSTM.

Zia et al. (2022) utilize a bootstrapping setup similar to ours, using the XLM-R model to generate target-language labels and then fine-tuning a monolingual target-language transformer model (either RoBERTa or BERT) on the generated data. Different from them, we use CNNs and LSTMs along with mBERT, and use the artificially-labeled data to fine-tune the same models that produced it.

Equally important to the consideration of model architecture for cross-lingual transfer learning is the choice of datasets. When working with a single hate speech dataset, i.e., the scenario where one annotates a dataset for one’s own application needs and therefore both training and testing data come from the same source, the problem of compatible hate speech definitions does not arise. In our cross-lingual setup, however, where both a source- and a target-language dataset are required, label inconsistencies surface and pose the risk of either poor model performance or too few usable resources. Depending on the hate speech definition required for the target language, many or all available source-language datasets could be incompatible with it.

In our experiments, we apply simple rules to make the selected source- and target-language datasets compatible for the cross-lingual evaluation. Our procedure is motivated by the observation that the contents of certain classes can be highly similar across different datasets within the same domain. This observation appears in many previous works. Fortuna et al. (2020) compare the content of six different hate speech datasets to investigate the degree of compatibility between their categories. Using FastText word embeddings to encode semantic similarity, they represent a dataset’s categories as centroid vectors and perform PCA to compare them with the categories of other datasets. They find that many categories across the six datasets are similar in content, despite carrying different names.

In light of this, a viable solution is to manually merge similar categories into one label. Recent work has shown that this is indeed a reliable and simple method of making certain datasets compatible. Glavaš et al. (2020) assemble a hybrid dataset from three English source datasets that are distinct in domain, with the end goal of creating a multidomain and multilingual (through translation) abusive language resource. In order to ensure dataset compatibility, they manually remap the three-tiered annotation schema of the TRAC dataset (Kumar et al. 2018) into the binary annotation schema used by two other datasets (Wulczyn et al. 2017; Gao and Huang 2017). The TRAC dataset features the labels “non-aggressive”, “covertly-aggressive”, and “openly-aggressive”; the latter two were relabeled as “abusive” and the former as “non-abusive”. Pamungkas et al. (2021a) also mention dataset relabeling as a common method for cross-lingual hate speech detection and note that certain classes may not be combined due to differing class definitions. We note that multilingual datasets with compatible annotation across languages have been proposed (Majumder et al. 2019; Basile et al. 2019; Zampieri et al. 2020); however, they do not reflect the real-life scenario where one is required to build a system for a given language that is not present in other datasets. We address this gap by designing our experiments around this setting.

2.3 Class imbalance of hate speech datasets

Making source- and target-language datasets compatible, however, does not address the important issue of class imbalance. Namely, as Vidgen and Derczynski (2020) observe, hate speech is the minority class in most datasets. The dataset of Waseem and Hovy (2016), for example, has been observed to consist of 68% non-hate examples (Fortuna et al. 2020). On the one hand, this simulates a real-life scenario, and Pamungkas et al. (2021a) remark that it is important that the class ratio of the test dataset correspond to that of the training dataset. On the other hand, an imbalanced class ratio leaves an even smaller number of positive examples available for the detection of hate speech, so the models may not learn enough about hate speech.

We explore simple under- and oversampling techniques with various label ratios to show the importance of handling the skewed label distribution of hate speech datasets.

Johnson and Khoshgoftaar (2019) differentiate between data-level and algorithm-level methods for dealing with class imbalance. The former is concerned with influencing the data distribution directly through over- or undersampling the data items. The latter is concerned with adjusting model behavior during training by means of cost-sensitive training, selecting certain loss functions, and altering output thresholds. Hybrid methods also exist which combine both data-level and algorithm-level techniques.

Due to their simplicity, we explore over- and undersampling techniques in our work. They respectively involve duplicating random samples from the minority class and removing random samples from the majority class. Previous research with feature-based machine learning models suggests that oversampling delivers slightly better performance than undersampling, likely because undersampling removes data (Mohammed et al. 2020; De Smedt and Jaki 2018). We test the efficacy of over- and undersampling hate speech datasets on our neural networks. To our knowledge, there are no other works on zero-shot cross-lingual hate speech detection that investigate the effects of various over- and undersampling ratios.

An additional consideration for oversampling is whether to merely duplicate existing data samples from the minority class or to generate entirely new samples by automatic means such as SMOTE (Chawla et al. 2002). The former method is simpler, but may cause overfitting by saturating the minority class with similar samples. On the other hand, the latter method generates artificial samples that may not share many common features with the real data.

We performed initial experiments with automatic sample generation using the SMOTE library but found that this resulted in poor performance. Therefore, for oversampling we manually duplicate samples from the minority class.
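For illustration, the sketch below contrasts duplication-based oversampling with SMOTE using the imbalanced-learn package (our use of this specific package is an assumption; note also that SMOTE requires numeric feature vectors, e.g. averaged embeddings, rather than raw text).

```python
# Sketch contrasting duplication-based oversampling with SMOTE (imbalanced-learn assumed).
import numpy as np
from imblearn.over_sampling import RandomOverSampler, SMOTE

# Toy feature matrix: 10 "noHate" (0) vs. 3 "Hate" (1) samples with 5 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(13, 5))
y = np.array([0] * 10 + [1] * 3)

# Duplication-based oversampling: repeat existing minority samples until the classes are balanced.
X_dup, y_dup = RandomOverSampler(random_state=0).fit_resample(X, y)

# SMOTE: synthesize new minority samples by interpolating between nearest neighbours.
X_syn, y_syn = SMOTE(random_state=0, k_neighbors=2).fit_resample(X, y)

print(np.bincount(y_dup), np.bincount(y_syn))  # both -> [10 10]
```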

3 Experimental setup

This section introduces the setup of our experiments. First we discuss our chosen datasets, showing their class distributions and the degree of overlap in their hate speech definitions. We then present our three models, based on CNN, BiLSTM, and mBERT architectures respectively.

3.1 Datasets

To ensure the validity of our cross-lingual setup, it was necessary to choose a source- and target-language dataset pair such that the hate speech classes of the two overlapped. Despite English being a high-resource language, choosing a dataset with a narrowly-focused hate speech definition would potentially limit the number of German datasets that could be used for testing. For this reason we sought out an English dataset with a broad hate speech definition, since it would be more likely to be compatible with the available German datasets.

In general, one option for a cross-lingual setup is to use a multilingual dataset that contains both English and German data. One such dataset for hate speech is OLID. However, this is not compatible with our setup, since our aim is to demonstrate what can be done when one needs to find a separate source-language dataset in order to perform hate speech detection in the target language.

Table 2 Sample hate and non-hate comments from the Stormfront dataset de Gibert et al. (2018)

One such English dataset is found in de Gibert et al. (2018), who define hate speech as “a deliberate attack directed towards a specific group of people motivated by aspects of the group’s identity”. This dataset features text scraped from the white-nationalist forum Stormfront and will be referred to as the Stormfront dataset. Due to its broad hate speech definition and its decent size (ca. 10,000 examples), it was chosen as the training set for this paper. Table 2 illustrates some ‘Hate’ and ‘noHate’ sentences from the Stormfront dataset. Sentence 1 is not an example of hate speech, since it has a neutral sentiment and does not ascribe the qualities ‘poor’ and ‘victimized’ to an entire group of people. Sentences 2 and 3 are examples of hate speech directed at religious and racial groups, respectively. Sentence 4 is an attack on an individual that uses the derogatory term “retard” to ascribe low intelligence, but it was assigned the ‘noHate’ label since it does not address a group. Finally, Sentence 5 uses the profane and derogatory word “bitch” in a non-attacking context.

Our choice for the target dataset was the dataset of German-language tweets presented with the 2018 GermEval Shared Task on the Identification of Offensive Language (Ruppenhofer et al. 2018). The shared task focused on the detection of offensive language in general (the coarse-grained task), along with the detection of three of its subtypes (the fine-grained task): ‘Insult’, ‘Profanity’, and ‘Abuse’. Although the dataset does not contain a category that is explicitly designated ‘hate speech’, the category ‘Abuse’ is nevertheless defined in terms that are similar to the hate speech definition of the Stormfront dataset. Namely, a tweet is assigned the ‘Abuse’ label if “... the target of judgment is seen as a representative of a group and it is ascribed negative qualities that are taken to be universal, omnipresent and unchangeable characteristics of the group” (Ruppenhofer et al. 2018). Importantly, this definition keeps the nature of the target group general and is therefore compatible with the hate speech definition in de Gibert et al. (2018). We therefore take this category to be the counterpart of the Stormfront dataset’s ‘Hate’ class, despite it carrying the name ‘Abuse’, and use the GermEval dataset as our test set for the cross-lingual experiments. However, the label scheme of the GermEval dataset nevertheless had to be aligned with that of the Stormfront dataset, which we discuss in Sect. 3.1.1.

Table 3 Sample comments from the GermEval dataset Wiegand et al. (2018b)

Table 3 shows samples of various classes from the GermEval dataset. Sentence 1 expresses negative emotions about a specific person being forgotten but does not seek to attack or denigrate anyone. Sentence 2 insults a single politician, using the nickname “Nulltipper” (en. “idiot”) and the lack of a school diploma to ascribe low intelligence. Sentence 3 is an example of “Abuse”, since it ascribes acts of murder to an entire religious group. Sentence 4 is an example of the “Profanity” category, as it contains the profane phrase “in den Arsch gekrochen” while not being overtly critical or attacking. Finally, Sentence 5 is another example of the “Abuse” class, since it uses the term “Mohrenkopf”, which typically denotes a kind of candy, as a derogatory designation for dark-skinned individuals.

Despite the alignment of annotation categories, domain differences between the source- and target-language datasets may pose challenges to cross-lingual transfer. In our case, the domain of the Stormfront dataset is a message forum, while the domain of the GermEval dataset is Twitter. In the former case, messages are often lengthy and can be written in a structured, formal style. In the latter case of tweets, messages have a length limit and are often informal, featuring slang and abbreviations. A prevalence of lengthy and formal messages in the Stormfront dataset might therefore inhibit a model’s performance on the tweet-based target dataset. From manual examination of the Stormfront dataset, however, we found that shorter, informal messages similar in style to tweets were the majority, while essay-like posts were the minority. Additionally, we filter out lengthy posts as explained in Sect. 4.1.2. Although there are some domain differences, we consider it more important to use datasets with compatible hate speech definitions.

3.1.1 Annotation discrepancies

Examining the two datasets’ hate speech definitions and labeled hate speech examples in Tables 2 and 3, it is clear that GermEval’s “Abuse” category corresponds to the ‘Hate’ label of the Stormfront dataset. However, the differing annotation taxonomies, as well as the different names attached to the compatible categories, pose problems for machine learning models, which expect consistent annotations between training and testing. Therefore it was necessary to make a few simple adjustments to the datasets before beginning our experiments.

The Stormfront dataset’s distinction between ‘Hate’ and ‘noHate’ is an example of a binary annotation schema. Additionally the dataset contains a ‘Relation’ label for sentences that had to be considered in context with others to acquire a hateful meaning, and a ‘Skip’ label for when the sentence was either non-English or not meaningful enough to be given either of the binary labels. In contrast, the GermEval dataset features a two-tiered annotation schema: each tweet carries a label for the coarse-grained task of ‘Offense’ vs ‘Other’ as well as a fine-grained label that specifies the subtype of offensiveness: either ‘Insult’, ‘Profanity’, or ‘Abuse’.

To ensure compatibility between these two datasets, we made modifications to their labeling schemas that were motivated by the datasets’ specific class definitions. First, we simplified the annotation schema of the fine-grained GermEval data into a binary schema. As per the discussion in Sect. 3.1, we took GermEval’s ‘Abuse’ label to be the counterpart of the Stormfront dataset’s ‘Hate’, since the definition of the ‘Abuse’ category was the most compatible with the hate speech definition of de Gibert et al. (2018). Analogously, we relabeled the GermEval comments belonging to the ‘Other’, ‘Insult’, and ‘Profanity’ classes as ‘noHate’, since the respective definitions of these categories fail to fulfill one or more aspects of the hate speech definition in de Gibert et al. (2018). An ‘Insult’ in GermEval, for example, is an attack on an individual rather than a group; instances of ‘Profanity’ are never attacks; and instances of ‘Other’ are always non-hateful. Next, we relabeled all ‘Skip’ and ‘Relation’ samples from the Stormfront dataset to conform with the binary schema. The 92 comments that carried the label ‘Skip’, indicating that they were either non-English or not informative, were relabeled as ‘noHate’. The 168 instances of the ‘Relation’ class were relabeled as ‘Hate’, since these sentences were always hateful when placed in context.
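The relabeling described above amounts to a small mapping over the fine-grained labels of both datasets. A minimal sketch of this step is shown below; the exact label strings in the released files may differ, so they should be treated as placeholders.

```python
# Sketch of the label-harmonization step (the exact label strings are assumptions).
GERMEVAL_TO_BINARY = {
    "ABUSE": "Hate",        # compatible with the Stormfront 'Hate' definition
    "INSULT": "noHate",     # targets individuals rather than groups
    "PROFANITY": "noHate",  # profane but not an attack
    "OTHER": "noHate",      # non-offensive
}

STORMFRONT_TO_BINARY = {
    "Hate": "Hate",
    "noHate": "noHate",
    "Relation": "Hate",     # hateful when read in context
    "Skip": "noHate",       # non-English or uninformative
}

def relabel(label, mapping):
    """Map a dataset-specific label onto the shared binary schema."""
    return mapping[label]

print(relabel("ABUSE", GERMEVAL_TO_BINARY))       # -> Hate
print(relabel("Relation", STORMFRONT_TO_BINARY))  # -> Hate
```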

After relabeling was completed, we split both datasets into training, development, and test sets. From the Stormfront dataset we formed our EN-TEST set by selecting random ‘Hate’ and ‘noHate’ samples, with a class ratio that roughly reflects the data distribution. We kept the size of this dataset small in the interest of preserving resources for training. Next, we drew an equal number of ‘Hate’ and ‘noHate’ samples that did not overlap with EN-TEST to form our EN-DEV dataset. The remaining samples formed EN-TRAIN.

For splitting the GermEval dataset, we follow the work of Wiedemann et al. (2018). The GermEval shared task data comes with an official training and test dataset, the latter of which we keep and name DE-TEST. For our train/dev split, we transferred the last 809 samples from the provided training set to a new development set named DE-DEV for hyperparameter tuning. The remaining samples formed our DE-TRAIN dataset, which is used only in the bootstrapping experiments. Table 4 shows the class distribution of the resulting datasets, which form the basis of our experiments. See Tables 5 and 6 for the original, unmodified versions of the datasets.

Table 4 Class distributions of the English and German datasets after relabeling and train/dev splitting
Table 5 Original Stormfront dataset before relabeling and train/dev splitting
Table 6 Original GermEval datasets before relabeling and dev splitting from the training set. These were the datasets provided to the shared task participants

3.1.2 Addressing class imbalance

After the relabeling and train/dev splitting process was complete, we addressed the imbalanced class distributions of the training datasets. Examining Table 4, it is clear that ‘noHate’ examples are far more abundant than ‘Hate’ examples. This reflects the real-life pattern of hate speech occurring less commonly than regular text, but it poses difficulties for machine learning models, which need plenty of data from both classes in order to generalize (Madukwe et al. 2020; Vidgen and Derczynski 2020).

Previous research suggests that over- and undersampling the data yields good model performance (De Smedt and Jaki 2018), thus we also experiment with these techniques by testing various class-balance ratios. Since we found oversampling to a balanced class ratio to be the most effective, we manually duplicate the ‘Hate’ examples from EN-TRAIN to produce EN-OS[1:1]. The balanced 1:1 class ratio represents the best-case scenario where neither class is in the minority. The resulting dataset is shown in Table 7. For more details about our data sampling experiments we refer to Sect. 4.2.
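A minimal sketch of how such a balanced, oversampled set can be produced by duplication is given below; the DataFrame layout and column names are illustrative assumptions rather than the actual format of our data.

```python
# Sketch of duplication-based oversampling to a 1:1 class ratio (DataFrame layout assumed).
import pandas as pd

def oversample_to_balance(df, label_col="label", minority="Hate",
                          majority="noHate", seed=42):
    """Duplicate random minority-class rows until both classes are equally frequent."""
    n_needed = (df[label_col] == majority).sum() - (df[label_col] == minority).sum()
    extra = df[df[label_col] == minority].sample(n=n_needed, replace=True, random_state=seed)
    return pd.concat([df, extra]).sample(frac=1.0, random_state=seed)  # shuffle the result

# Example with a toy corpus skewed towards 'noHate'.
toy = pd.DataFrame({"text": ["a", "b", "c", "d", "e", "f"],
                    "label": ["noHate"] * 4 + ["Hate"] * 2})
print(oversample_to_balance(toy)["label"].value_counts())  # -> noHate 4, Hate 4
```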

Table 7 The unmodified EN-TRAIN dataset and its balanced oversampled version: EN-OS[1:1]

3.2 Models

Fig. 1 CNN model architecture with multiple convolutional filters with size k

Fig. 2 BiLSTM model with convolutional layers on top

In our experiments we focus on evaluating neural network architectures, using monolingual models that have been popularly applied to the task in the past. Our first model is a CNN classifier following Kim (2014), depicted in Fig. 1. This model accepts an embedding layer as input and feeds it into a convolution layer with a variable number of filters. Global max-pooling is performed on the convolution output, and the result is passed through a dense layer. The input word embeddings can either be randomly initialized, pre-loaded from an outside source, or fine-tuned during training. We used our pre-trained CLWEs as described below and did not update them during training. For the remaining model hyperparameters, we used the default values.
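A minimal Keras sketch of such a CNN classifier is shown below; the vocabulary size, sequence length, and filter settings are illustrative placeholders rather than our exact configuration.

```python
# Sketch of a Kim (2014)-style CNN classifier (hyperparameter values are illustrative).
import numpy as np
from tensorflow.keras import initializers, layers, models

vocab_size, emb_dim, max_len = 20000, 300, 100
embedding_matrix = np.zeros((vocab_size, emb_dim))    # placeholder for the pre-trained CLWEs

inputs = layers.Input(shape=(max_len,))
x = layers.Embedding(vocab_size, emb_dim,
                     embeddings_initializer=initializers.Constant(embedding_matrix),
                     trainable=False)(inputs)          # frozen CLWEs as input representation
x = layers.Conv1D(filters=100, kernel_size=3, activation="relu")(x)  # convolution with filter size k
x = layers.GlobalMaxPooling1D()(x)                     # global max-pooling over the sequence
outputs = layers.Dense(1, activation="sigmoid")(x)     # dense layer producing the Hate/noHate decision

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```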

To produce our CLWEs, monolingual embeddings were first trained using FastText SkipGram (Bojanowski et al. 2017) over the English and German NewsCrawl corpora (Bojar et al. 2015), which contain text dating from 2007 to 2013 and were preprocessed with the Moses tools (Koehn et al. 2007). The resulting embeddings were mapped with MUSE (Conneau et al. 2018). We used the default parameters of the above-mentioned tools.

Our second model is based on the neural model of one of the participants of the 2018 GermEval Shared Task (Wiedemann et al. 2018), with some modifications for compatibility with our cross-lingual setup. In our version, shown in Fig. 2, an input layer of our CLWEs is fed into a BiLSTM layer of 100 units. The output of this BiLSTM layer is then fed into a convolution layer with three feature maps of 200 units each, with respective kernel sizes of 3, 4, and 5. Global max-pooling is applied after each convolution, and the output of this step is fed to a dense layer of 100 units.
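A Keras sketch mirroring the layer sizes described above follows; settings that are not specified in the text (sequence length, optimizer, output layer) are placeholders.

```python
# Sketch of the BiLSTM model with convolutional layers on top (unspecified settings are placeholders).
import numpy as np
from tensorflow.keras import initializers, layers, models

vocab_size, emb_dim, max_len = 20000, 300, 100
embedding_matrix = np.zeros((vocab_size, emb_dim))    # placeholder for the pre-trained CLWEs

inputs = layers.Input(shape=(max_len,))
x = layers.Embedding(vocab_size, emb_dim,
                     embeddings_initializer=initializers.Constant(embedding_matrix),
                     trainable=False)(inputs)
x = layers.Bidirectional(layers.LSTM(100, return_sequences=True))(x)  # BiLSTM layer of 100 units

# Three convolutional feature maps of 200 units with kernel sizes 3, 4 and 5,
# each followed by global max-pooling.
pooled = [layers.GlobalMaxPooling1D()(layers.Conv1D(200, k, activation="relu")(x))
          for k in (3, 4, 5)]
x = layers.Concatenate()(pooled)
x = layers.Dense(100, activation="relu")(x)            # dense layer of 100 units
outputs = layers.Dense(1, activation="sigmoid")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```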

Our third architecture is multilingual BERT (mBERT), which was pre-trained on Wikipedia data from 104 languages (Devlin et al. 2019). This architecture has the advantage of not needing CLWEs as a resource and can be fine-tuned on the source language and tested on the target language directly. For consistency with the discussion of the other two architectures, we will henceforth refer to the process of fine-tuning mBERT as “training”.
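The sketch below shows how mBERT can be loaded and fine-tuned for binary classification with the Hugging Face transformers library; the checkpoint name refers to the publicly released multilingual BERT, while the toy batch and training settings are placeholders rather than our tuned values.

```python
# Sketch of fine-tuning multilingual BERT for binary classification (settings are placeholders).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)     # 0 = noHate, 1 = Hate

# Tokenize a toy English training batch; German test sentences are encoded the same way.
texts = ["This is a neutral sentence.", "An example of a hateful sentence."]
labels = torch.tensor([0, 1])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)               # cross-entropy loss is computed internally
outputs.loss.backward()
optimizer.step()
```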

4 Results

We conduct our cross-lingual experiments by training the three architectures from Sect. 3.2 on English and testing on German. We use our EN-OS[1:1] dataset for training. Since the testing language was German, hyperparameters such as epoch count, learning rate, and class weights were optimized on DE-DEV. In addition to the per-class scores, we calculate the macro-average F1 score, as this metric was used by the GermEval shared task.
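For reference, the per-class and macro-average scores reported in this section can be computed with scikit-learn as in the following sketch (the toy labels are illustrative).

```python
# Sketch of the reported metrics (toy predictions; label convention: 0 = noHate, 1 = Hate).
from sklearn.metrics import classification_report, f1_score

y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

# Per-class precision, recall and F1, plus the macro-average F1 used by the GermEval shared task.
print(classification_report(y_true, y_pred, target_names=["noHate", "Hate"]))
print("macro-average F1:", f1_score(y_true, y_pred, average="macro"))
```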

Table 8 shows the performance of these models when tested on DE-TEST. All three models manage to transfer their knowledge of hate speech from English to German, with the CNN and mBERT in particular achieving classwise ‘Hate’ scores greater than 50 points: 67.44 recall for the CNN and 52.29 precision for mBERT, respectively. Scores were significantly higher in the ‘noHate’ class: the CNN achieved 78.82 precision and mBERT achieved scores above 60. The BiLSTM had the highest performance in ‘noHate’, with precision, recall, and F1 scores all above 75. This is notable since we did not use German-language data at any point. The macro-average scores of the CNN and BiLSTM were roughly tied; however, the BiLSTM achieved a macro-average F1 score that was nearly as high as that of mBERT. mBERT’s macro-average scores were the highest among the three models. These results show that cross-lingual training with neural networks is a viable option even when no target-language data is available. These three models form the ensemble used in Sect. 4.1.

Table 9 shows the hyperparameters that gave optimal performance on EN-OS[1:1]. We observed that mBERT preferred small batch sizes, its scores dropping slightly as the batch size was increased. The CNN and BiLSTM, in contrast, preferred much larger batch sizes and learning rates and exhibited poorer performance when the batch size was lowered. Class-weight ratios implemented in the loss function were a relevant parameter for the CNN, which required a slightly greater weight for the ‘noHate’ class. Despite this measure, the CNN exhibited severe overfitting behavior, becoming skewed towards predicting only one of the two class labels, which is why it achieved a higher ‘Hate’ F1 score (with high recall and low precision) but a lower ‘noHate’ score compared to the other two models. Notably, this pattern persisted despite class-weight and learning-rate tuning. Training for a single epoch with a large batch size yielded optimal performance. We refer to the bootstrapping experiments in the following section for further discussion of the CNN performance.
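As an illustration of how such class weights enter training, a Keras model can be fitted with a per-class weighting of the loss as sketched below; the toy model, data, and weights are placeholders, not the tuned values from Table 9.

```python
# Sketch of class-weighted training (model, data, and weights are toy placeholders).
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Embedding(20000, 50),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

X_train = np.random.randint(0, 20000, size=(32, 100))   # toy batch of token-id sequences
y_train = np.random.randint(0, 2, size=(32,))           # toy binary labels (0 = noHate, 1 = Hate)

# A weight > 1.0 for class 0 makes 'noHate' errors count more heavily in the loss.
model.fit(X_train, y_train, epochs=1, batch_size=8, class_weight={0: 1.2, 1: 1.0})
```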

Table 8 Model performance on DE-TEST after training on EN-OS[1:1]
Table 9 Optimal hyperparameters for training on EN-OS[1:1]. The first two columns represent class weights, which were not implemented for mBERT

4.1 Bootstrapping

Although cross-lingual transfer learning techniques are applicable to zero-shot hate speech detection, the discussed data scarcity issues, such as the low number of positively-labeled hate speech examples, hinder performance. To mitigate these issues, this phase of cross-lingual experiments is centered around data augmentation and fine-tuning. For this we relied on two unlabeled target-language datasets, which we labeled automatically using an ensemble-based approach following Bigoulaeva et al. (2021). Our relabeling ensemble consisted of the three neural models in Table 8. We apply these models to two sources of German data: the DE-TRAIN dataset (see Table 4) and the DE-NEW dataset detailed in Sect. 4.1.2. For each of the two datasets, we applied all three of our models and assigned a final label to each sentence based on majority voting.
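A minimal sketch of this majority-voting step is given below (function and variable names are illustrative; each model's predictions are assumed to already be binarized).

```python
# Sketch of the ensemble relabeling step: three binary predictions per sentence,
# combined by majority vote (names are illustrative).
import numpy as np

def majority_vote(pred_a, pred_b, pred_c):
    """Each argument is an array of 0/1 labels (0 = noHate, 1 = Hate)."""
    votes = np.stack([pred_a, pred_b, pred_c])       # shape: (3, n_sentences)
    return (votes.sum(axis=0) >= 2).astype(int)      # a label wins with at least 2 of 3 votes

# Example with hypothetical predictions from the CNN, BiLSTM and mBERT models.
cnn_preds    = np.array([1, 0, 0, 1])
bilstm_preds = np.array([1, 1, 0, 0])
mbert_preds  = np.array([0, 1, 0, 1])
print(majority_vote(cnn_preds, bilstm_preds, mbert_preds))  # -> [1 1 0 1]
```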

For each bootstrapping dataset, we take the three models from Table 8, which had originally been trained on EN-OS[1:1], and resume their training on the bootstrapped dataset, using altered hyperparameter settings as needed to optimize performance. We then test the performance of the fine-tuned models on DE-TEST.

4.1.1 Bootstrapping on DE-TRAIN

In this first phase of the bootstrapping experiments, we apply our ensemble to the DE-TRAIN dataset and collect the majority-vote classification results into a new dataset called DE-REL*. We treat DE-TRAIN as an unlabeled dataset, since it was not used for training our models.

Table 10 shows the confusion matrix for the labels of DE-REL*. It is clear that this dataset consists predominantly of ‘noHate’ examples, with a severely imbalanced ratio of 43:1. A total of 573 true ‘Hate’ examples were mistakenly labeled by the ensemble as ‘noHate’, while 42 true ‘noHate’ examples were mistakenly labeled as ‘Hate’. Proportionally more classification errors were made in the ‘Hate’ class, reflecting the models’ higher precision, recall, and F1 scores in ‘noHate’, as can be seen in Table 8.

The Labels of DE-REL*

Table 10 Confusion matrix of the ensemble-relabeled DE-REL* compared to the original annotations in DE-TRAIN. Gold and predicted labels are shown in the rows and columns respectively

Table 14 provides a closer look at some correctly- and incorrectly-classified examples from DE-REL*, as compared to the original gold labels of DE-TRAIN. Sentence 1 was correctly labeled by the ensemble as ‘Hate’, as it attributes negative qualities such as violence to a religious group. Sentence 2 was also correctly classified as hate speech, as it expresses approval of prejudiced actions towards people with brown skin. Sentence 3 was correctly recognized as ‘noHate’, although it contains the potentially contentious word ‘Hetze’ (en. ‘hate, agitation’), which often occurs in contexts of hate speech. This indicates that the ensemble has some knowledge of hate speech features that goes beyond lexical cues. Finally, Sentence 4 was falsely labeled by the ensemble as ‘noHate’. This was likely a challenging example for the ensemble because it is a form of gender-related hate speech that is not commonly encountered on a white-supremacy forum.

Performance

Table 11 shows the English-trained models’ performance on DE-TEST after fine-tuning on DE-REL*. Both mBERT and the BiLSTM improve their performance in several areas. The BiLSTM’s classwise recall and F1 for ‘Hate’ increased by 4.14 points and 3.04 points, respectively. Its macro-average F1 increased by 0.49. mBERT’s classwise ‘Hate’ improvements were more modest, its precision increasing by 1.56 points and its F1 by 1.27. Additionally its macro-average F1 increased by 0.71. The BiLSTM’s greater improvements could be due to the model having had too little training data before, while mBERT had already become mostly saturated by the English training data.

The only model to perform worse after fine-tuning was the CNN, which during training produced either only ‘Hate’ or only ‘noHate’ predictions. The latter is associated with higher macro-average performance, since ‘noHate’ is the majority class of DE-TEST. This result is likely due to poor initial training of the CNN. As noted in Sect. 4, the CNN was trained on EN-OS[1:1] for only one epoch, as it exhibited overfitting behavior otherwise. We included this model in our bootstrapping ensemble, as its sufficiently-varied predictions on DE-TEST after training on EN-OS[1:1] initially suggested that the model was not broken. It is likely, however, that this initial training was suboptimal and that the single epoch of training was not enough for the CNN to sufficiently learn from its training data.

Table 11 Model performance on DE-TEST after training on EN-OS[1:1] and fine-tuning on DE-REL*

Table 12 shows the hyperparameter settings that were used for fine-tuning on DE-REL*. We observed that tuning the class weights for the CNN as well as the dropout had no effect on the overfitting performance. The BiLSTM however achieved balanced performance with similar hyperparameter settings to those of the CNN. mBERT preferred a smaller batch size and improved performance when fine-tuned for more epochs than the other architectures.

Regarding our CNN, one possible reason for its poor fine-tuning performance is poor initial training on EN-OS[1:1]. However, since EN-OS[1:1] is balanced and the CNN’s scores on DE-TEST in Table 8 were comparable to those of the other models, it is also likely that its poor fine-tuning performance was caused by the bootstrapping datasets. Recent works have suggested that fine-tuning models on the bootstrapped labels they themselves produced can amplify these models’ preexisting biases towards certain labels (Wei et al. 2021; Wang et al. 2022). A direction for further investigation would be to explore this angle, including why the BiLSTM and mBERT architectures were less susceptible to becoming biased during fine-tuning.

Table 12 Optimal hyperparameters for fine-tuning on DE-REL*. The first two columns represent class weights, which were not implemented for mBERT

4.1.2 Bootstrapping on German Stormfront data

In the second bootstrapping experiment, we use the DE-NEW dataset collected by Bigoulaeva et al. (2021), which was crawled from a German-language thread within the Stormfront forum. This dataset was originally collected for a zero-shot transfer learning experiment; therefore, no annotation process was conducted to assign gold labels to the data samples. Since we likewise deal with a zero-shot setup in this work, we do not annotate DE-NEW with gold labels.

At the time of crawling, the source thread had around 5500 posts. These consisted predominantly of comments written in German, although many were written in English. To account for the typical prevalence of lengthy posts in a forum setting, Bigoulaeva et al. (2021) considered each paragraph separated by a newline to be a separate text sample. Before the data could be used for training, some manual preprocessing was performed to ensure compatibility with the format of a tweet. Table 13 shows which texts were kept and which were removed. Additionally, the following errors in the texts were manually corrected, and the corrected samples were kept:

  • ‘tut mir’ and ‘leid’ → ‘tut mir leid’

  • ‘d aß’ → ‘daß’

Table 13 Preprocessing steps for the DE-NEW dataset
Table 14 Correct and incorrect ensemble labels for either DE-REL* or DE-NEW. Gold labels for DE-NEW are given by the authors in brackets

As a result of this preprocessing, DE-NEW contains 6,586 text samples, all or nearly all written in German. This dataset was used as the training set during fine-tuning.

Table 15 shows the class distribution of DE-NEW compared to DE-REL*. DE-NEW is the larger of the two, but interestingly, the ensemble’s relabeling resulted in both datasets having similar class ratios. This could indicate that the stylistic differences between the Twitter-based text of DE-REL* and the forum-based text of DE-NEW were not a hindering factor for the ensemble.

Table 15 Class distributions of the two bootstrapped datasets

The Labels of DE-NEW

Since we had no gold labels of DE-NEW to evaluate our ensemble’s classifications, we manually examined several examples and judged them strictly according to the points of the hate speech definition in de Gibert et al. (2018). Table 14 shows five classifications made by the ensemble.

Sentence 5 was correctly identified as ‘Hate’, as it is derogatory towards families of mixed races, employing dehumanizing comparisons and attributing low intelligence. Sentence 6 was also correctly identified as ‘Hate’. Although the target group is unclear, the group is likewise described with dehumanizing language and is portrayed as being dirty and unintelligent. Sentence 7 is a neutral descriptive statement that does not attack the group of Turkish people, and was correctly recognized as ‘noHate’. Similarly, Sentence 8 is a neutral descriptive account, despite discussing a figure of controversy and using terminology (shown in bold) that would likely be associated with hateful discourse: “nationalsozialistischen Völkermord” (en. National-Socialist/Nazi genocide) and “Holocaust-Leugner” (en. Holocaust-denier). Together with Sentence 7, this again shows that the ensemble learned more complex features of hate speech than lexical cues (see Sect. 4.1.1).

Sentence 9 was another challenge for the ensemble. It was labeled as ‘Hate’ despite not having any telling signs of hate speech, likely because discourse about privilege, power, and riches occurred elsewhere in the Stormfront data in more hateful contexts. This would lead the models of the ensemble to learn that such groups are typically targets of attack. Nevertheless, we judged this sentence to be an example of ‘noHate’, since, considered in isolation, it does not attack or dehumanize the groups in question.

Performance

Table 16 shows the models’ performance on DE-TEST after training on EN-OS[1:1] and fine-tuning on DE-NEW. As in the first fine-tuning experiment, the BiLSTM and mBERT improved their scores over the original versions trained on EN-OS[1:1]. This time the BiLSTM’s classwise ‘Hate’ scores improved to a lesser degree, with its precision increasing by 0.55 points and its classwise recall and F1 score dropping slightly. Nevertheless, this precision value was higher than after fine-tuning on DE-REL*. All three of its macro-average measures improved as well and were also higher than in the first fine-tuning round (see Table 11). mBERT experienced a slight decrease in macro-average and classwise ‘Hate’ scores. The only ‘Hate’ score to improve was precision, which increased by 0.30 points. Classwise recall and F1 in ‘noHate’ increased while the precision decreased.

This lesser degree of improvement in ‘Hate’ compared to the first fine-tuning experiment could have been caused by DE-NEW’s slightly larger ratio of ‘noHate’ to ‘Hate’ as compared to DE-REL* in Table 15.

Additionally, the slight domain difference between DE-NEW and the test data (see Sect. 3.1) could further explain these results. As in the previous bootstrapping experiment, the CNN model worsened after fine-tuning, likely for the reasons discussed in Sect. 4.1.1.

Table 17 shows the hyperparameter settings that were used for fine-tuning on DE-NEW. As before, tuning these hyperparameters did not mitigate the CNN’s overfitting behavior. The BiLSTM improved with a smaller batch size than in the previous fine-tuning experiment, as well as with a lower learning rate and higher dropout. mBERT’s improvements in this fine-tuning experiment were likewise associated with different hyperparameters, in this case a small batch size, a lower learning rate, and a reduced epoch count. The reason for this behavior could be the differing class ratios of DE-REL* and DE-NEW.

Table 16 Model performance on DE-TEST after training on EN-OS[1:1] and fine-tuning on DE-NEW
Table 17 Optimal hyperparameters for fine-tuning on DE-NEW. The first two columns represent class weights, which were not implemented for mBERT

4.2 Data sampling experiments

In this section we conduct a deeper analysis of ways to deal with imbalanced hate speech datasets. Our goal is to investigate whether over- or undersampling is the better choice and at which class ratio. To keep the focus on the individual datasets, we perform our experiments monolingually, testing on the same language as for training. We tune hyperparameters on the corresponding development sets.

We observe from Table 4 that DE-TRAIN not only has a different class ratio than EN-TRAIN but is also much smaller. Therefore, to perform as little duplication as possible, we select a set of class ratios for sampling that are based on the ratios of these unmodified datasets. The ratios we sample are 7:1 (as in EN-TRAIN), 2:1 (a less imbalanced scenario), and 1:1 (the balanced scenario). The sampled datasets are named with their language-code initials appended with either ‘US’ if produced by undersampling or ‘OS’ if produced by oversampling. To match the 7:1 ratio of ‘noHate’ to ‘Hate’ in EN-TRAIN, we produce an oversampled version of DE-TRAIN called DE-OS[7:1] with a 7:1 class ratio. Next, we produce EN-US[2:1] and DE-US[2:1] by removing appropriate amounts of ‘noHate’ examples from EN-TRAIN and DE-TRAIN, respectively. We create EN-US[1:1] and DE-US[1:1] by removing ‘noHate’ examples until their number matches the number of ‘Hate’ examples in the respective datasets. Finally, EN-OS[1:1] and DE-OS[1:1] were produced by duplicating the ‘Hate’ examples until they match the number of ‘noHate’ examples. Table 18 shows the label statistics of the resulting English and German datasets.

Table 18 English and German training datasets used in our monolingual experiments. Sampled datasets were produced from EN-TRAIN and DE-TRAIN respectively. (See Table 4 for data on DE-TRAIN)

The results of our experiments for the CNN, BiLSTM, and mBERT architectures are presented in Tables 19, 20 and 21, respectively. The CNN achieved its highest classwise ‘Hate’ scores with EN-OS[1:1] and EN-US[1:1]. Among the German datasets, the CNN achieved its best ‘Hate’ F1 on the two balanced datasets and on DE-US[2:1]. Classwise ‘Hate’ performance on DE-OS[7:1] was significantly lower. In particular, the CNN achieved noticeably lower ‘Hate’ recall on this dataset than on DE-US[2:1] and DE-US[1:1], despite the ‘Hate’ precision scores being similar. Since the total number of ‘Hate’ samples in these three datasets was the same (see Table 18), the class imbalance of DE-OS[7:1] is the likeliest explanation.

The BiLSTM achieved its highest ‘Hate’ F1 on EN-OS[1:1], and its highest German ‘Hate’ F1 scores on DE-US[1:1] and DE-OS[1:1]. The two German datasets with imbalanced distributions yielded a slightly poorer performance in the ‘Hate’ class, similar to what was observed with the CNN. It is additionally worth noting that although the BiLSTM achieved similar ‘Hate’ F1 scores on DE-US[2:1] and DE-OS[7:1], its ‘noHate’ precision and recall on the latter dataset were lower than those from the former. This indicates that for DE-OS[7:1] the BiLSTM could only achieve good performance in the minority class by overfitting to it. Taken together with our observations from the CNN, this illustrates the detrimental effect of an imbalanced class ratio within small corpora.

mBERT had the best overall performance among the three architectures. Similar to the trend shown by the previous models, it achieved its highest macro-average F1 score on EN-OS[1:1], as well as its highest ‘Hate’ and ‘noHate’ scores. This benefit could have been due to the larger size of EN-OS[1:1] compared to the other corpora. The fact that scores for each class ratio also tended to be higher with the English datasets points to the model’s strength with English training data, despite its multilinguality. Classwise performance on the [7:1] and [2:1] datasets is slightly stronger in the ‘noHate’ class than in ‘Hate’, reflecting the datasets’ skew towards ‘noHate’.

Among the [1:1] datasets, mBERT’s classwise ‘Hate’ scores and macro-average F1 scores tended to be higher for the oversampled versions of a particular language than for the undersampled versions. For example, mBERT achieved a ‘Hate’ F1 of 63.7 on DE-OS[1:1] compared to 62.9 on DE-US[1:1]. The oversampled dataset also yielded better ‘noHate’ recall and F1, as well as better macro-average scores. The same pattern is observed with EN-US[1:1] and EN-OS[1:1], with the latter dataset giving significantly better scores in every category.

Table 19 Monolingual CNN performance after training on the various sampled datasets
Table 20 Monolingual BiLSTM performance after training on the various sampled datasets
Table 21 Monolingual mBERT performance after training on the various sampled datasets

In addition, despite DE-OS[1:1] and EN-OS[1:1] having identical class ratios, mBERT’s much higher scores with the latter training set point to this architecture’s need for a large amount of data. However, the transformer’s significantly higher ‘Hate’ scores show that it is generally better able to cope with smaller dataset sizes than the BiLSTM and CNN. Among the three architectures examined, mBERT was the most successful at maintaining good minority-class performance on our relatively small corpora, making this architecture the better choice for low-resource setups.

Although all three architectures achieved their best English ‘Hate’ F1 scores on the oversampled, balanced EN-OS[1:1], only mBERT had the same success in German with DE-OS[1:1]. The CNN’s German ‘Hate’ F1 was highest with DE-US[2:1], while the BiLSTM’s was highest with DE-US[1:1]. This indicates that having a balanced class distribution is not the sole deciding factor for good minority-class performance, at least for small corpora. Among the [1:1] German datasets, the choice of oversampling or undersampling did not play a deciding role for ‘Hate’ F1 performance. In contrast, the difference between the ‘Hate’ F1 scores of EN-OS[1:1] and EN-US[1:1] was much larger, suggesting that oversampling the minority class might be a better option than undersampling the majority class if the majority class is significantly larger. Additionally, our experiments indicate that the duplicated examples present in the oversampled datasets did not pose a significant problem for our models. More research will have to be done to confirm these conclusions, as well as to shed light on the exact effect of the interplay between class distribution and dataset size on minority-class performance.

5 Conclusion

Building automatic hate speech detection systems for low-resource languages is difficult due to the small number of available datasets. Our goal in this paper was to investigate whether cross-lingual transfer learning could be used to mitigate the problem of data scarcity and additionally to highlight two problems related to data annotation: incompatible label definitions and class imbalance. We chose an English dataset with a broad hate speech definition for training and a similar German corpus for testing. Although the datasets were similar, we had to simplify the complex annotation schema of the target-language dataset into the binary schema of the source dataset to make them compatible for the cross-lingual experiments. Our results showed that cross-lingual transfer learning is indeed an effective tool for hate speech detection in low-resource languages. Additionally, we assembled two corpora of previously-unseen, unlabeled target-language data and applied an ensemble of trained classifiers to them. We showed that fine-tuning on these automatically-labeled examples improved classification performance, particularly within the hate speech class. However, our results also showed that models can be sensitive to hyperparameters, so care has to be taken when selecting them. Additionally, we investigated the issue of class imbalance in hate speech datasets. We produced several over- and undersampled datasets based on our English and German corpora, using class ratios that reflect the original datasets’ ratios. We tested the efficacy of oversampling compared to undersampling and conclude that both may possess advantages in specific dataset scenarios. Our goal for the future is to apply cross-lingual transfer learning to other language pairs with greater syntactic differences than German and English. In addition, since differences in labeling schemas across hate speech datasets can prevent the application of transfer learning methods, we aim to develop a method that can effectively combine datasets with different labeling schemas without the need for label modification. Finally, since cultural differences become relevant in cross-lingual setups, we aim to examine their effect on model performance more thoroughly.