1 Introduction

Abusive language is becoming a relevant issue on social media platforms such as Facebook and Twitter. The rise of the phenomenon is also due to the anonymity given to users and to the lack of effective regulation provided by these platforms. On the one hand, social media provide a facility for improving connectedness between people and their relations. On the other hand, this facility is often exploited to propagate toxic content such as hate speech or other forms of abusive language. Given the rate at which user-generated content is produced every minute, manually monitoring abusive behavior in social media is impractical. Facebook and Twitter have also made efforts to eliminate abusive content from their platforms (Footnote 1) by providing clear policies on hateful conduct (Footnote 2), implementing user report mechanisms, and employing content moderators to filter abusive posts. However, these efforts are not a scalable, long-term solution to this problem.

Several studies from the Natural Language Processing (NLP) field have tackled the problem of abusive language in social media. Most studies proposed a supervised approach to detect abusive content automatically, using models that range from traditional machine learning approaches to recent neural-based approaches. Moreover, the majority of current studies focused only on a single language, i.e., English, and a single abusive language phenomenon, e.g., hate speech, sexism, racism, and so on, rather than on multiple phenomena and how they are interconnected. However, abusive language in social media is not limited to specific languages, and it features multiple abusive phenomena. As a matter of fact, the most popular social media, such as Twitter and Facebook, are multilingual, as users are encouraged to express themselves spontaneously in their mother tongue, and online social conversations are characterized by multiple different topics. Therefore, in a variety of languages and contexts there is a considerable urgency to prevent online hate speech from spreading virally and becoming a significant factor in grave crimes against minorities or vulnerable categories. Specifically, robust approaches are needed for abusive language detection in a multidomain and multilingual environment. Such approaches will also enable the implementation of effective tools to support both monitoring and content moderation activities, such as automatic moderation and flagging of potentially hateful users and posts, and will help guarantee better compliance with government demands to counteract the phenomenon [37]. A few works initiated cross-domain and cross-lingual studies on abusive language detection to tackle these aspects of the problem [48, 64, 94, 134]. However, some difficulties and issues remain in obtaining a robust model to detect abusive language across different domains and languages.

In this paper, we summarize the recent development of studies on the detection of abusive language in social media across domains and languages. Through this survey, we present a systematic overview of the research conducted in this area, providing a comprehensive view of the state of the art and of the datasets centered on it. Our main objective is to draw a conclusion on the state of the art and to point out several opportunities for future work based on the existing open problems.

After the introduction, in Section 2, we describe several previous surveys on abusive language detection and related topics. In Section 3, we discuss the existing studies in multidomain abusive language detection, including a review of available datasets that could be exploited for this task. Section 4 presents a comprehensive review of multilingual abusive language detection studies, covering state-of-the-art approaches and available datasets. Section 5 analyzes the challenges and opportunities for this particular task in future work. Finally, Section 6 presents the concluding remarks of this survey.

2 Related work

A few recent works analyzed the current challenges of the abusive language detection task based on existing studies. Jurgens et al. [63] presented a position paper that outlines current challenges in fighting online abuse and proposes several strategies to address them. First, they argued that most existing studies only focus on a narrow definition of abuse, and that expanding the problem scope is needed in order to deal with more subtle but serious forms of abusive behavior, such as microaggressions. Second, they argued that we need to develop proactive technologies to counter abusive behavior in the future, rather than focusing only on the automatic detection perspective. Finally, they postulated that the community should take a role in contextualizing its effort inside the broader framework of justice, including explicit capabilities, restorative justice, and procedural justice, to support and promote a healthier community. Another work by Vidgen et al. [137] presented challenges and frontiers in abusive content detection. They outlined several challenges of the abusive content detection task from three different perspectives. From a research point of view, there are three challenges: the difficulties in categorizing abusive content, recognizing abusive content, and accounting for context. Dataset creation and distribution, as well as ethical issues, are the main challenges from the community perspective. They also outlined challenges based on research frontiers, which cover several issues in multimedia content that are not yet much explored, the implementation of fairness and explainability, and cross-domain applications. MacAvaney et al. [71] outlined and explored the current challenges of hate speech detection in text. To understand the problem, they proposed a straightforward multi-view SVM approach that provides better interpretability than more complex neural models. Based on their experiments, they found two remaining issues in hate speech detection in text, namely (i) perspectives towards a topic or issue change over time; (ii) hate speech detection is a closed-loop system that only focuses on the current characteristics of the phenomenon, while spreaders of hate speech always look for ways to outsmart the system.

The scientific study of abusive language, especially in the NLP field, has been growing very fast in the last five years. The work of Schmidt and Wiegand [125] was the first study to provide a short, comprehensive, and systematic overview of hate speech detection tasks. This work presented what had been done so far in the hate speech detection task, focusing on the feature extraction approach. It also has several dedicated sections that describe bullying, classification approaches, available datasets with their annotation procedure, and the overall challenges of hate speech detection tasks. The work of Fortuna and Nunes [43] complements the aforementioned work by providing a more in-depth critical review of this area. First, they presented a more detailed discussion on the definition of hate speech, based on several previous proposals from other studies. They also reviewed the feature extraction approach by classifying it into generic text mining features and specific hate speech detection features. A complete description of available datasets, including their collection and annotation approaches, was also provided. Finally, they outlined challenges and opportunities as an outcome of their study, to provide better insight into future research development. Mishra et al. [76] also aimed to provide a comprehensive view of online abuse detection tasks. This study outlined the existing datasets and reviewed the approaches to deal with this issue, including an analysis of their strengths and weaknesses. In their conclusions, they highlighted the remaining challenges in the field and provided insights for future development: (i) the study of abusive language detection still focuses only on specific languages and specific abusive phenomena; (ii) most current approaches are vulnerable to the obfuscation of words; (iii) implicit abuse is difficult to deal with; (iv) the ever-changing nature of abusive phenomena makes the detection of newly emerging phenomena difficult.

If we focus on resources for the detection of abusive phenomena in particular, there are two very recent survey studies that provide a critical review of the available resources, datasets and benchmark corpora for abusive language detection. Vidgen and Derczynski [136] presented a critical analysis of available abusive language datasets by discussing the goals underlying their development, the introduced taxonomies, and the annotation procedure. They also elaborated on the different ways to share datasets, including the introduction of the website https://hatespeechdata.com/, which is meant as a constantly updated catalogue of datasets annotated for hate speech, online abuse, and offensive language. Finally, they presented best practices for creating abusive language datasets based on the findings of the study. Similarly, Poletto et al. [103] provided a systematic review of resources and benchmarks for hate speech detection tasks. They described different strategies to develop datasets for hate speech detection based on five comparison perspectives: type, topical focus, data source, annotation procedure, and language. They also provided an overview of all available resources for hate speech detection tasks based on their type, which covers corpora, resources released for shared tasks, and lexica. Finally, they introduced a reflection on the impact of the keywords used to collect the data when creating hate speech corpora. Overall, these recent surveys on language resources capture and underline a great availability of benchmark datasets for the evaluation of abusive language and hate speech detection systems in several languages and with several topical focuses. The take-away message is that such availability lays the foundation to address the urgent challenge of investigating architectures that are stable and well-performing across different languages and abusive domains. However, none of these works covers the multilingual and multidomain perspective and the related challenges specifically and extensively. This is the main issue we address in the current work, with the main aim of developing a roadmap for scholars active in the field and a compass for future work.

3 Multidomain abusive language detection

Abusive language behavior is multifaceted and available datasets are characterized by different topical focuses. Abusive language is generally used as an umbrella term [143], covering several sub-categories, such as cyberbullying [55, 131], hate speech [33, 144], toxic comments [150], offensive language [152] and online aggression [67]. Several datasets have been proposed with different topical focuses, e.g., misogyny, racism, sexism, and so on, and sourced from different platforms, e.g., Facebook and Twitter. Most studies in this area also tend to concentrate on one topical focus, which makes it difficult to quantify whether a model or feature set that performs well on one dataset is transferable to other datasets [125, 145].

However, abusive language phenomena are not constrained to one particular topical focus or platform. Therefore, having a robust model to detect abusive language across different topical focuses and platforms is important. Some existing studies proposed cross-domain abusive language detection [48, 64, 94, 134], in which a model is trained on one dataset with a specific domain and tested on another dataset with a different domain. In this study, the term domain is used to describe both topical focuses and platforms. It has been stated that ensuring that a model can detect abusive language across different domains is one of the main challenges and an important frontier [137]. The cross-domain setting is also explored by Wiegand et al. [146] to prevent bias contained in the training data, as they experimentally found several biases in currently popular abusive language datasets, including topic bias and author bias. In this section, we discuss recent studies on cross-domain abusive language detection. We review available datasets that could be exploited for this task, focusing on English. Furthermore, we also describe several approaches that have been proposed in this research direction.
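
The train-on-one-domain, test-on-another protocol described above can be sketched as a simple evaluation loop. This is only a minimal illustration under stated assumptions: the dataset names, the toy samples, and the helper names (`majority_label_trainer`, `accuracy`, `evaluate_cross_domain`) are hypothetical, and the majority-label "model" is a placeholder for whatever classifier a given study actually uses.

```python
from itertools import permutations


def majority_label_trainer(train_set):
    """Placeholder 'model': always predicts the most frequent training label."""
    labels = [label for _, label in train_set]
    majority = max(set(labels), key=labels.count)
    return lambda text: majority


def accuracy(model, test_set):
    """Fraction of test samples whose predicted label matches the gold label."""
    return sum(model(text) == label for text, label in test_set) / len(test_set)


def evaluate_cross_domain(datasets, train_fn, score_fn):
    """Train on each source domain and score on every *other* target domain."""
    results = {}
    for source, target in permutations(datasets, 2):
        model = train_fn(datasets[source])
        results[(source, target)] = score_fn(model, datasets[target])
    return results


# Hypothetical toy data: (text, label) pairs, label 1 = abusive, 0 = not abusive.
toy_datasets = {
    "twitter_hate": [("a", 1), ("b", 1), ("c", 0)],
    "forum_bullying": [("d", 0), ("e", 0), ("f", 1), ("g", 0)],
}
scores = evaluate_cross_domain(toy_datasets, majority_label_trainer, accuracy)
```

Each key of `scores` is a (source, target) pair, so in-domain results are deliberately excluded; a real study would typically report them alongside as a reference point.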

3.1 What datasets are available for multidomain abusive language detection?

In this section, we collect information about the available datasets from existing studies on abusive language detection across different domains. Several previous works in abusive language detection defined a domain as a topical focus [94, 134], such as hate speech, cyberbullying, and offensiveness. In contrast, some others describe it as a platform [48, 64], such as Twitter, Facebook, and YouTube. We select English datasets by focusing on topical focus and platform variety. A collection of abusive language datasets in languages other than English is presented in Section 4. We mainly extract this information from the two most recent survey studies on abusive language resources. First, Vidgen and Derczynski [136] provided an analysis of available training data for abusive language detection tasks and proposed best practices for creating training data of abusive language based on existing studies. Meanwhile, Poletto et al. [103] presented a more comprehensive study on resources and benchmarks available for hate speech detection tasks based on several aspects. We also add datasets from several shared tasks that were not covered by these works and a few datasets from very recent studies that were not available when these articles were published. Table 1 summarizes our findings on the available datasets for this research purpose. In the following, we provide a more in-depth comparison between datasets and discuss other aspects that need to be considered when using these datasets for multidomain abusive language studies, based on existing works.

Table 1 Summary of available abusive language datasets across different topical focuses and sources (English only)

Topical focus

The motivation for several multidomain abusive language detection studies is to have a robust model that generalizes across different topical focuses. Topical focus usually includes the addressed abusive phenomena, as well as the specific targets of the abusive behavior. However, some topics overlap with each other, e.g., misogyny and sexism or xenophobia and racism, due to a certain degree of subjectivity in defining these phenomena. The topical focus information presented in Table 1 is based on the information provided in the publications that accompany the proposed resources. However, some of these papers did not include a clear definition of the addressed phenomena. We observe that hate speech is the topic most covered by previous studies. However, in some hate speech datasets, we also find other abusive phenomena such as offensiveness [33], racism [144], and sexism [144]. In this sense, a cross-domain abusive language detection setting means training a model on one or more topical focuses and testing it on completely different topical focuses.

Platform

Another objective of abusive language detection in the multidomain setting is to have a robust model to detect abusive content across different platforms. This task is also challenging since the available datasets are retrieved from various platforms, and every platform has its own characteristics. Based on the information presented in Table 1, Twitter is the most studied platform for capturing abusive phenomena. This is possibly due to the convenience of scraping tweet samples using the available Twitter API and to the less strict policy on making the data publicly available. Facebook is another popular social media platform that several studies use as a data source. Other studies exploited news sites, online forums, and YouTube comments for gathering their data. Most studies used a set of predefined keywords to query the data from the platforms mentioned above. Some of them used offensive words [32, 39, 41, 89], which are usually a strong signal of abusive content, while other studies decided to use more neutral keywords to maintain a real-world approach to the problem [11], or even both offensive and neutral keywords [152]. Some other works also exploited specific keywords related to events that trigger abusive phenomena [144].

Availability

In Table 1, we provide information about the availability of the datasets. We manually check the published papers and mark a dataset as available when the authors explicitly mention the link to the dataset repository or state that the dataset is available for research purposes upon request. We can see that 26 out of 39 datasets were made available by their authors (Footnote 3). Most available datasets were obtained from Twitter, likely because policies or other regulations restrict data sharing from other sources such as Reddit, YouTube, and news sites. However, we also notice that some Twitter datasets are shared only as tweet identifiers [45, 144], leaving users to download the tweets through the publicly available Twitter API. In this case, the number of entries could decrease due to data decay (tweets that were deleted or are simply no longer available).

Annotation scheme

This information is not provided in Table 1, but we manually inspected the annotation scheme of every dataset. Most datasets have binary labels, distinguishing an abusive and a not abusive class. Some other datasets have a multiclass annotation, capturing different abusive phenomena. For example, Davidson et al. [33] labeled not only the hateful tweets but also their offensiveness. Similarly, Waseem and Hovy [144] proposed to label racism and sexism separately. Some studies also proposed a finer-grained annotation scheme to capture more in-depth abusive phenomena. For example, Fersini et al. [39, 41] provided three layers of annotation to capture the misogyny phenomenon (misogyny or not), the misogyny category and behavior (stereotype, dominance, derailing, sexual_harassment, and discredit), and the target of misogyny (active or passive). In a multidomain or cross-domain classification task, one of the most important steps is to unify the label annotation of every dataset. Most existing works modeled this task as a binary classification task [64, 94]. Therefore, they cast the multiclass annotation to a binary annotation by combining different abusive phenomena into one class. In the case of finer-grained annotation, they only took the first layer of annotation, where the data is mainly annotated as either abusive or not abusive.
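
The label unification step described above can be illustrated with a minimal sketch. The dataset keys and label strings below are hypothetical stand-ins (the released datasets use their own label names and file formats), and `to_binary` and `first_layer` are illustrative helper names, not functions from any of the cited works.

```python
# Hypothetical label inventories: for each dataset, the labels that count as abusive.
# Real datasets use their own label names; adjust this mapping accordingly.
ABUSIVE_LABELS = {
    "davidson": {"hate", "offensive"},   # multiclass: hate / offensive / neither
    "waseem_hovy": {"racism", "sexism"},  # multiclass: racism / sexism / none
    "misogyny": {"misogynous"},          # first layer of a three-layer scheme
}


def to_binary(dataset_name, label):
    """Collapse a dataset-specific label into 1 (abusive) or 0 (not abusive)."""
    return 1 if label in ABUSIVE_LABELS[dataset_name] else 0


def first_layer(annotation):
    """For layered annotation schemes, keep only the top-level decision."""
    return annotation[0]


# A layered annotation: (misogyny or not, category, target).
layered = ("misogynous", "stereotype", "active")
unified = [
    to_binary("davidson", "offensive"),
    to_binary("waseem_hovy", "none"),
    to_binary("misogyny", first_layer(layered)),
]
```

Note that merging classes in this way is a modelling decision: whether, for instance, merely offensive content should count as abusive depends on the goal of the study.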

Data distribution

Data distribution also needs to be considered in the multidomain and cross-domain classification task, especially the percentage of abusive samples in the dataset. A different label distribution between training and test sets makes performance evaluation and comparison between systems difficult [92, 94]. Specifically, when systems are trained on skewed label distributions, with few examples in the abusive class, they struggle to detect the abusive class on the test set, resulting in a higher rate of false negatives. Pamungkas et al. [93] observed that balancing the distribution in the training set significantly improves the F1-score of the positive class. Based on our investigation, the class distribution of abusive language datasets varies considerably, mostly depending on how the data is sampled and on the source of the data. However, we observe that most abusive language datasets have a lower percentage of abusive content than neutral content, with some datasets containing less than 20% abusive instances [45, 47, 150]. Some studies experimentally found that systems often struggle to detect the under-represented class, resulting in low F1-scores on the positive class (abusive label), which is an issue for real-world abusive language detection systems [58, 93]. Maintaining a uniform label distribution between training and test sets is an approach often followed to provide a comparable evaluation in cross-domain classification [58, 134]. This approach, however, does not necessarily provide an accurate estimate of the robustness of the model in a realistic scenario, where the amount of abusive language could change drastically.
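
As a rough illustration of the distribution issue, the sketch below measures the share of abusive samples and balances a training set by randomly downsampling the majority class. Downsampling is only one possible balancing strategy and is an assumption here; the cited studies do not necessarily balance their data this way. The function names and the toy data are hypothetical.

```python
import random


def abusive_ratio(labels):
    """Fraction of samples labelled 1 (abusive)."""
    return sum(labels) / len(labels)


def downsample_majority(samples, labels, seed=0):
    """Randomly drop samples of the larger class until both classes are equal in size."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    n = min(len(by_class[0]), len(by_class[1]))
    balanced = []
    for label, items in by_class.items():
        balanced.extend((sample, label) for sample in rng.sample(items, n))
    rng.shuffle(balanced)
    return balanced


# Toy training set with 20% abusive samples.
texts = [f"t{i}" for i in range(10)]
gold = [1, 1] + [0] * 8
balanced_train = downsample_majority(texts, gold)
```

Balancing should be applied to the training set only; changing the test distribution would make the evaluation less representative of a realistic scenario, as noted above.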

3.2 What has been done so far in multidomain abusive language detection study?

This section presents studies in abusive language detection that focus on building robust models across different domains. We collect any publication found on Google Scholar by using four main keywords, namely “cross-domain abusive language detection”, “cross-domain hate speech detection”, “cross-platform abusive speech detection”, and “cross-platform hate speech detection”. These keywords were chosen after several trials with different keyword combinations. We limit our query to the first five pages for each keyword and sort results based on relevance, without a time filter. Furthermore, we also check each document’s cited documents and references on the first five pages to get more relevant publications. To avoid missing very recent works, we also use the same keywords to search the proceedings of the main NLP conferences of the last three years on the ACL Anthology platform (Footnote 4). Finally, we exclude works that only experiment with different datasets, without any objectives or insights about domain-agnostic models. Figure 1 summarizes the methodology for the document collection in this survey study.

Fig. 1 Document collection methodology

We carefully read each work to obtain several key pieces of information to be discussed in this study. Table 2 summarizes the full list of works in this direction. Most studies only focused on English, and we only observe two studies that work on Italian [27] and Arabic [24]. Most of the chosen studies conducted a cross-domain experiment, where the domain can refer to topical focuses or platforms. We also noticed that this research focus is still relatively new, with the earliest works appearing in 2018 [64, 145, 147]. All studies adopted a supervised approach, training a model on a training set and predicting instances on the test set. In the following, we provide a deeper discussion to compare each work based on the models (traditional machine learning based, neural based, or transformer based), the features (a wide variety of features), and the approaches adopted to deal specifically with domain shift.

Table 2 Summary of approaches adopted by existing studies for cross-domain abusive language detection tasks

3.2.1 Models

A wide variety of models was adopted to deal with this task. Some studies exploited traditional machine learning approaches such as linear support vector machine classifiers (LSVC) [64, 92, 94], logistic regression (LR) [121], and support vector machines (SVM) [24, 147]. Their argument for adopting the traditional approach was to provide better explainability of the knowledge transfer between domains. Some other studies adopted neural-based models, including convolutional neural networks (CNN) [75, 141], long short-term memory (LSTM) [8, 75, 92, 94, 145], bidirectional LSTM (Bi-LSTM) [115], and gated recurrent units (GRU) [27]. The most recent works focus more on investigating the transferability or generalizability of state-of-the-art transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) [19, 48, 66, 79, 83, 90, 92, 134] and variants such as RoBERTa [48] in the cross-domain abusive language detection task.

In the early phases of cross-domain abusive language detection, specific models that adopt joint-learning [115] and multi-task [145] architectures achieved the best performance. These architectures proved effective for transferring knowledge between domains. However, in the latest studies, transformer-based models achieved state-of-the-art results. The most recent study by Glavas et al. [48] shows that RoBERTa outperformed other models such as BERT in the cross-domain setting of the hate speech detection task. This result confirms a recent finding on other natural language processing tasks [18], i.e., that a pre-trained language model trained on huge corpora provides a more general representation for knowledge transfer.

3.2.2 Feature representation

A wide range of features was also exploited in this particular task, ranging from straightforward n-gram representations to the most recent contextual language representations. Several text representations were used for the traditional machine learning models, including n-grams [24, 64, 75, 92, 94], TF-IDF [121], and word2vec [121]. Some studies also proposed to use linguistic features such as emoji information [27] and lexical features [27, 92, 94, 147] derived from a specific lexicon. Most of the neural models in this task used word embeddings as the text representation. Several pre-trained models were exploited, such as FastText [27, 92, 94], GloVe [75, 134] and ELMo [115]. The transformer-based models use models pre-trained on very large corpora, such as BERT [19, 48, 66, 79, 83, 90, 92, 134] and RoBERTa [48]. However, we also observe a study that proposes to retrain the BERT representation on a specific corpus related to abusive language [19]. Finally, the work by Nejadgholi and Kiritchenko [83] proposed to use an unsupervised topic modeling approach to generate features that yield better topic generalization in cross-dataset abusive language detection experiments.

Our study finds that several state-of-the-art pre-trained models provide the best feature representation and better generalization to deal with domain shift in the cross-domain abusive language detection task. Interestingly, some studies proposed using external resources to facilitate the knowledge transfer between domains by delivering domain-independent features. These additional features were infused into either traditional models [147] or neural-based models [94] and succeeded in improving the prediction performance. Wiegand et al. [147] showed the effectiveness of additional features from their novel abusive words lexicon in a cross-domain abusive language detection setting. The additional features were represented as a score based on the confidence learned by an SVM classifier. Similarly, Pamungkas et al. [92, 94] exploited the HurtLex lexicon, which contains a list of abusive words in 17 categories. The features were represented as a 17-column binary vector that indicates the presence of each word category in the document. The vector was then concatenated to the representation of the message computed by the LSTM network.
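
The 17-column binary lexicon vector and its concatenation to the message representation can be sketched as follows. This is a simplified illustration, not the cited implementation: the toy lexicon entries and their category indices are invented for the example (the actual HurtLex entries and category codes are not reproduced), tokenization is naive, and the message representation is a plain list standing in for the LSTM output.

```python
NUM_CATEGORIES = 17

# Hypothetical toy lexicon: word -> category index in [0, 16].
# These entries and indices are illustrative only, not taken from HurtLex.
TOY_LEXICON = {"idiot": 0, "pig": 3, "scum": 11}


def lexicon_vector(tokens, lexicon=TOY_LEXICON, num_categories=NUM_CATEGORIES):
    """Binary vector with a 1 for every lexicon category present in the document."""
    vector = [0] * num_categories
    for token in tokens:
        category = lexicon.get(token.lower())
        if category is not None:
            vector[category] = 1
    return vector


def concat_features(message_representation, tokens):
    """Append the lexicon vector to a (stand-in) learned message representation."""
    return list(message_representation) + lexicon_vector(tokens)


features = concat_features([0.12, -0.40], ["You", "IDIOT", "pig"])
```

Because the vector only records category presence, it does not depend on dataset-specific vocabulary beyond the lexicon itself, which is what makes it a candidate domain-independent signal.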

3.2.3 Domain transfer

The main challenge of cross-domain classification is the domain shift between training and testing data. Several methods have been proposed by studies in more mature areas, such as sentiment analysis [35, 95, 151]. These techniques are usually called domain adaptation or domain transfer: specific approaches that allow the model to learn domain-independent features shared by two or more different domains. In the abusive language detection task, several features could represent an important signal for knowledge transfer between domains, such as the use of abusive words [147], emotional information [109, 119], and some other linguistic features [27, 66, 92, 94].

Table 2 shows that studies have different approaches to cope with the domain-shift problem. Some works proposed to combine the training sets of datasets from several different domains [27, 48, 90, 92]. This straightforward approach allows the trained model to obtain wider domain coverage for detecting abusive language. Most of these studies found this simple approach to be effective in this task. However, there is still a possibility that the trained model would struggle when applied to data from a totally unseen domain. Several other studies experimented with the use of a lexicon as a domain-independent feature to bridge the domain transfer. Wiegand et al. [147] used their novel lexicon automatically induced from HateBase, a platform that provides several keywords related to hate speech. Meanwhile, Pamungkas et al. [92, 94] and Corazza et al. [27] exploited HurtLex, a lexicon manually built by De Mauro [34], which contains offensive words structured in 17 different categories. Additional features from these lexica also proved helpful in facilitating the transfer of knowledge between domains.

We also found some works that tried to modify the input samples used for training the model in order to minimize the domain-shift issue between source and target domains. For example, Nejadgholi and Kiritchenko [83] used the topic modeling approach and proposed to remove the domain-specific instances from the training set, resulting in an improvement of the model’s performance. Another effort by Karan and Snajder [64] adopted a domain adaptation approach called FEDA (Frustratingly Easy Domain Adaptation), which works by duplicating features across domains to allow the model to learn domain-dependent weights for each feature. Finally, Mozafari et al. [79] proposed to deal with racial bias in abusive language datasets by re-weighting the input samples using an existing regularization approach. Their approach was shown to be effective in decreasing the dataset bias issue, which was found to be one of the main problems in cross-domain classification.
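
The feature duplication behind FEDA can be illustrated with a short sketch. It follows the general feature-augmentation scheme of the method (one shared copy of the features plus one copy per domain, zeroed for all domains other than the sample's own); how the cited study integrates it into its pipeline is not detailed here, and the function name `feda_augment` is hypothetical.

```python
def feda_augment(features, domain_index, num_domains):
    """Return [shared copy | domain 0 copy | ... | domain K-1 copy].

    Only the shared copy and the copy belonging to the sample's own domain hold
    the original feature values; all other domain-specific copies are zero.
    """
    if not 0 <= domain_index < num_domains:
        raise ValueError("domain_index out of range")
    zeros = [0.0] * len(features)
    augmented = list(features)  # shared (domain-independent) copy
    for domain in range(num_domains):
        augmented.extend(features if domain == domain_index else zeros)
    return augmented


# Two domains, e.g. a source and a target dataset.
source_sample = feda_augment([1.0, 2.0], domain_index=0, num_domains=2)
target_sample = feda_augment([1.0, 2.0], domain_index=1, num_domains=2)
```

A linear classifier trained on the augmented vectors can place weight on the shared copy for features that behave the same everywhere, and on a domain-specific copy for features whose meaning differs between domains.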

We also notice that some studies focus more on providing better representation to improve the model’s domain generalization. Wang et al. [141] proposed a multi-aspect embedding, which combines several representations, including target, content, and linguistic behavior, to provide domain-transfer knowledge. Then, Caselli et al. [19] proposed to retrain state-of-the-art BERT with a huge abusive language corpus to obtain a more specific representation for abusive language detection tasks.

Furthermore, we found two studies that proposed new architectures specifically to tackle the cross-domain abusive language detection task. Rizoiu et al. [115] proposed a joint-learning model based on Bi-LSTM, which allows the model to learn from two datasets sequentially, obtaining better generalization. In addition, Waseem et al. [145] proposed a multitask learning architecture based on LSTM to learn the problem from two or more tasks sequentially, providing a medium for knowledge transfer between domains. The remaining works focused more on investigating the transferability of some models, including BERT, in cross-domain abusive language detection [121, 134]. They found that using BERT alone, without a specific approach for bridging domain shift, already achieves a solid result.

4 Multilingual abusive language detection

Another prominent challenge in abusive language detection is multilinguality. Even though abusive language datasets have been developed in recent years for other languages, including Italian [15, 41], Spanish [41], and German [148], English remains by far the most represented language. Recently, deep learning approaches have been applied, achieving state-of-the-art results for some languages [9, 78]. However, most of the proposed models are tested in monolingual settings, mostly in English. Since the most popular social media such as Twitter and Facebook are highly multilingual, encouraging their users to interact in their primary language, there is a considerable urgency to develop a robust approach for abusive language detection in a multilingual environment, also to guarantee better compliance with government demands to counteract the phenomenon (see, e.g., the recently issued EU Commission Code of Conduct on countering illegal hate speech online [37]).

Similarly to other natural language processing tasks [62], detecting abusive language in less-resourced languages is a prominent and timely challenge. For example, the escalation of hate speech against Rohingya Muslims in Myanmar was also affected by the failure to stop the spread of hate comments on Facebook, due to the difficulty of processing Burmese text automaticallyFootnote 5. The current availability of datasets in many languages [103] makes the time ripe for addressing the multilingual challenge. Cross-lingual transfer learning is the common approach to transfer knowledge from one language (usually with more available resources) to another language (usually with fewer resources) [69, 126]. In this approach, models are trained and optimized on a dataset in one language (the source language) and then tested on another language (the target language). Zero-shot learning is an extreme case of transfer learning, where a model trained on one language (such as in this work) or one domain is employed to predict samples from a totally unseen language or domain [51]. A less extreme form of transfer learning is few-shot learning, where a percentage of samples from the unseen data (target language) is added to the training set, allowing the model to learn a better generalization between two languages or domains [126].
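The difference between the zero-shot and few-shot settings described above can be illustrated with a minimal sketch of how the training set is assembled. The function name `build_training_set`, the `few_shot_fraction` parameter, and the `(text, label)` sample format are assumptions made for illustration and do not come from any of the surveyed works.

```python
import random
from typing import List, Tuple

# Assumed sample format: (text, label), with 1 = abusive and 0 = not abusive.
Sample = Tuple[str, int]


def build_training_set(
    source: List[Sample],
    target_pool: List[Sample],
    few_shot_fraction: float = 0.0,
    seed: int = 13,
) -> List[Sample]:
    """Combine source-language data with an optional slice of target-language data.

    few_shot_fraction = 0.0 corresponds to the zero-shot setting (source only);
    a value above 0 adds that share of the target pool (few-shot setting).
    The target pool must be disjoint from the target-language test set.
    """
    if not 0.0 <= few_shot_fraction <= 1.0:
        raise ValueError("few_shot_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_extra = int(len(target_pool) * few_shot_fraction)
    extra = rng.sample(target_pool, n_extra)
    combined = list(source) + extra
    rng.shuffle(combined)
    return combined
```

With `few_shot_fraction=0.0` the model never sees the target language during training, so any target-language performance has to come from the shared representation alone.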

In this section, we discuss the development of studies on building robust models to detect abusive language across multiple languages. Specifically, we focus on the abusive language detection task in a cross-lingual setting. We review the available abusive language datasets in languages other than English, which could be exploited for this task. We also discuss in depth the existing approaches that have been proposed for this task, mainly focusing on the methods used to transfer knowledge between languages.

4.1 What datasets are available for multilingual abusive language detection study?

In this section, we present information regarding the available datasets for abusive language tasks across different languages. Since we already presented the English datasets in the cross-domain part, here we only review the available datasets in languages other than English, which we will call lower-resourced languages for the rest of this article. We obtained this information from the two most recent reviews [103, 136], which focused on the available resources for abusive language tasks. In addition, we include resources not covered by these reviews from the most recent shared tasks in the abusive language field, such as Misogyny@EVALITA2020 [40], HaSpeeDe@EVALITA2020 [122], and OffensEval@SemEval2020 [154]. We also searched for recently released resources in the latest editions of the Language Resources and Evaluation Conference (LREC) 2020Footnote 6 and the Workshop on Online Abuse and Harms (WOAH) 2020Footnote 7, where we discovered some datasets that are still not covered in these surveys. Table 3 summarizes the information on these datasets in lower-resourced languages for the abusive language detection task. We provide an in-depth discussion comparing these resources in the following.

Table 3 Summary of available abusive language datasets across different languages


In Table 3, we use the ISO 639-1 language code to represent the language names. We provide the list of languages with their corresponding code in Appendix Table 6. Based on Table 3, abusive language datasets are already available in 18 different languages. Although they are not as numerous as those in English, we notice that some languages have more resources than others, such as Arabic (AR), Hindi (HI), and Italian (IT). However, some other languages have only one resource available, such as Czech (CS), Croatian (HR), Polish (PL), Swedish (SW), Turkish (TR), and Vietnamese (VI). The availability of these lower-resourced datasets indicates that this research direction is still growing. However, we notice that these resources are more centered on Indo-European languages. We still could not find datasets in the Niger-Congo language family, whose languages are mostly used in some African regions. The datasets in Afro-Asiatic, Austronesian, and other language families are also far fewer than those in Indo-European languages. Moreover, we observe Hindi-English (HI-EN) code-mixed datasets, all focusing on detecting hate speech. The first Hindi-English code-mixed hate speech dataset was proposed by Bohra et al. [14]. Mandl et al. presented a new collection created for a shared task, Hate Speech and Offensive Content Identification (HASOC), at FIRE 2019. Recently, Rani et al. [111] proposed the first Hindi-English hate speech dataset containing tweets written in both Roman and the native Devanagari script. Additionally, a Swahili-English code-mixed hate speech dataset was recently published [87]. Its authors gathered the data from Twitter, mainly in relation to the 2017 general election in Kenya. It is also worth mentioning the work by Oriola and Kotze [88], who proposed a code-mixed Twitter dataset containing 14,896 tweets written in a mix of four different languages, namely English, Afrikaans, IsiZulu, and Sesotho.

Topical focus

Similarly to the English datasets, the datasets in lower-resourced languages also feature different topical focuses, where hate speech is the phenomenon most often used to describe the resource. Other datasets cover several abusive phenomena such as offensiveness, abusiveness, misogyny, aggressiveness, and cyberbullying. The topical focus is also an important aspect to be considered in the cross-lingual abusive language detection task. A study found that topic bias was one of the main issues in cross-lingual abusive language detection [8]. To avoid dealing with topic-shift between languages, one can rely on datasets that focus on a single topic and cover more than one language, such as those on hate speech and misogyny. We are also aware that many datasets focus on hate speech. However, different approaches to collecting the data could potentially introduce another bias issue when the datasets are exploited in cross-lingual settings. As observed by Arango et al. [8], several biases such as user bias, racial bias, and sampling bias could be an issue in the cross-lingual abusive language detection task. Otherwise, we can freely choose among the available datasets if we want to tackle both domain-shift and language-shift.

Data source

Most resources were retrieved from social media platforms such as Twitter, Facebook, and Instagram. Twitter is the most convenient platform, as it provides an API and a friendlier policy for retrieving and distributing the samples gathered from its platform. We can see from Table 3 that almost 60% of abusive language datasets were obtained from Twitter. Some other datasets were obtained from comments on news sites, from online forums such as Reddit, and from YouTube comments. In a multilingual or cross-lingual setting, we also need to pay attention to the source of the data. Every source has its own specific characteristics, such as stylistic aspects and formality levels. Twitter data have some specific features, such as hashtags and user mentions. Language on social media platforms is usually more informal than in other sources such as news site comments.


Based on a manual check, most of the abusive language datasets in lower-resourced languages were made publicly available. We found that only 4 out of 60 resources were not shared publicly by their authors. However, some authors decided to provide only the tweet identifiers due to Twitter policies, so that the tweets must be retrieved through the Twitter public API. The restricted datasets are mostly obtained from sources other than Twitter, which have stricter policies for sharing the data.

Annotation scheme

Similar to the cross-domain setting, in the cross-lingual experiment we also need to harmonize the labels of the datasets. Most previous studies decided to binarize the labels into two classes, namely abusive and not abusive. Based on our investigation, some datasets have more than two labels to capture finer-grained phenomena instead of limiting them to binary labels. Previous studies proposed to combine labels when they can be safely merged into one class [64, 94]. For example, Karan and Snajder [64] combined the overtly aggressive and covertly aggressive labels into the abusive class and mapped not aggressive to the not abusive class for the TRAC-1 dataset by Kumar et al. [67]. Otherwise, we can remove the data with a specific label when it is too problematic to merge some classes into one class. For example, the dataset proposed by Ousidhoum et al. [89] introduces several classes, including hate speech, abusive, offensive, disrespectful, fearful, and normal. In this case, we can combine hate speech, abusive, and offensive into one abusive class, but it is quite problematic to include the disrespectful and fearful labels in that class, as proposed by Aluru et al. [6].
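The label harmonization described above can be sketched as a simple mapping step. The label strings below follow the wording used in this section and are assumptions: the actual label values stored in the TRAC-1 and Ousidhoum et al. dataset files may differ. Assigning `None` to a label is a convention introduced here to mark instances that are dropped rather than merged.

```python
from typing import Dict, List, Optional, Tuple

# 1 = abusive, 0 = not abusive, None = drop the instance (too problematic to merge).
LabelMap = Dict[str, Optional[int]]

# Mapping in the spirit of Karan and Snajder [64] for TRAC-1 (label strings assumed).
TRAC1_MAP: LabelMap = {
    "overtly aggressive": 1,
    "covertly aggressive": 1,
    "not aggressive": 0,
}

# Mapping for the multi-label scheme of Ousidhoum et al. [89] (label strings assumed).
OUSIDHOUM_MAP: LabelMap = {
    "hate speech": 1,
    "abusive": 1,
    "offensive": 1,
    "normal": 0,
    "disrespectful": None,
    "fearful": None,
}


def binarize(samples: List[Tuple[str, str]], label_map: LabelMap) -> List[Tuple[str, int]]:
    """Map fine-grained labels to a binary scheme, dropping labels mapped to None."""
    result = []
    for text, label in samples:
        if label not in label_map:
            raise KeyError(f"No mapping defined for label: {label!r}")
        mapped = label_map[label]
        if mapped is None:
            continue  # instance removed from the harmonized dataset
        result.append((text, mapped))
    return result
```

Raising an error on unknown labels, rather than silently skipping them, makes it harder to lose data by accident when a dataset uses a label spelling that the mapping does not anticipate.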

Data distribution

In the cross-lingual setting of abusive language detection task, we also need to consider the data distribution of training (in source languages) and testing (in target languages) data. Based on our manual inspection, most of the resources have more positive (abusive) samples than negative (not abusive) ones. As mentioned in the cross-domain part, maintaining the same class distribution of training and testing data is important to have a more reliable evaluation and avoid bias in the models [92, 94]. Therefore, if the test set only contains 20% of abusive instances, a similar distribution can be imposed on the training set in the source language by adding or removing instances.
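As a minimal sketch of imposing the target distribution on the source-language training set, the function below downsamples whichever class is over-represented until the share of abusive instances matches a given ratio. It only removes instances (the text also allows adding them), and the name `match_abusive_ratio` and the `(text, label)` format are assumptions made for illustration.

```python
import random
from typing import List, Tuple

# Assumed sample format: (text, label), with 1 = abusive and 0 = not abusive.
Sample = Tuple[str, int]


def match_abusive_ratio(train: List[Sample], target_ratio: float, seed: int = 13) -> List[Sample]:
    """Downsample one class so that the abusive share is close to target_ratio."""
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must be strictly between 0 and 1")
    rng = random.Random(seed)
    pos = [s for s in train if s[1] == 1]
    neg = [s for s in train if s[1] == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in the training set")

    current_ratio = len(pos) / (len(pos) + len(neg))
    if current_ratio > target_ratio:
        # Too many abusive samples: keep all negatives, reduce positives.
        keep = round(target_ratio * len(neg) / (1.0 - target_ratio))
        pos = rng.sample(pos, min(len(pos), max(1, keep)))
    else:
        # Too few abusive samples: keep all positives, reduce negatives.
        keep = round(len(pos) * (1.0 - target_ratio) / target_ratio)
        neg = rng.sample(neg, min(len(neg), max(1, keep)))

    resampled = pos + neg
    rng.shuffle(resampled)
    return resampled
```

For instance, a training set with 100 abusive and 40 non-abusive samples and a target ratio of 0.2 would be reduced to 10 abusive and 40 non-abusive samples. Downsampling discards data, so oversampling the minority class is a reasonable alternative when the source dataset is small.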

4.2 What has been done so far in multilingual abusive language detection study?

This section presents the existing studies focusing on building robust models to detect abusive language across different languages automatically. Overall, we use the same approach as shown in Fig. 1 to collect related studies from several publication repositories. The only difference is the keywords used to query the relevant publications. For this purpose, we employ four keywords, namely “cross-lingual abusive language detection”, “cross-lingual hate speech detection”, “multilingual abusive language detection”, and “multilingual hate speech detection”. We use these keywords in two scientific publication repositories, namely Google Scholar and ACL Anthology. In the case of Google Scholar, we limit the query to the first five result pages for each keyword, without any limitation on publication time. We also check the cited documents and references for each document shown in the query result. Finally, we remove studies which did not provide any objective or insight aimed at building a robust model to detect abusive instances across languages. For example, we excluded studies that merely experimented with different models to cope with datasets in different languages.

Table 4 summarizes the existing works found on abusive language detection across different languages. We notice that the study in this direction is still relatively new, with the first study found in 2019. The works are more centered on languages from the Indo-European family, such as English, French, Spanish, Italian, German, and Hindi, in line with the available resources. Most of them tried to transfer knowledge from a resource-rich language (English) to other languages with fewer resources available. All studies proposed a supervised approach, and most of them utilized a multilingual language representation as a basis for knowledge transfer between languages. In the following, we discuss the gathered studies in this direction, focusing on several aspects, including the model adopted, the features used, and the approaches proposed to deal with language-shift.

Table 4 Summary of approaches adopted in existing studies on cross-lingual abusive language detection tasks

4.2.1 Models

Based on Table 4, most studies implemented a transformer-based architecture to deal with abusive language detection in a cross-lingual setting. However, we also observe some works that exploited traditional machine learning approaches, such as logistic regression [6, 10, 135], linear support vector machines [92, 94], and support vector machines [59]. They used multilingual language representations or simple translation tools (to translate the training data into the target languages) for knowledge sharing between languages. Some studies also exploited several neural-based models such as LSTM [29, 92, 94, 135], Bi-LSTMs [29], and GRU [6, 28]. The more recent works adopted several transformer-based architectures due to the availability of multilingual transformer models such as Multilingual BERT [1, 6, 48, 92, 100, 132, 135], RoBERTa [30, 31], XLM [28, 132], and XLM-RoBERTa [30, 31, 48, 110]. Interestingly, we also notice some works that proposed a multichannel architecture based on the multilingual BERT model [20, 130], which allows the model to learn the task in several languages sequentially. Finally, we also found a study that proposed adapting a multitask approach to deal with this task [89].

Based on our investigation, transformer-based models with multilingual language representations deal effectively with language-shift in the zero-shot cross-lingual abusive language detection task. Recent studies show that XLM-RoBERTa provided a more robust performance than other multilingual language models, including multilingual BERT and RoBERTa [30, 48, 110]. However, the most recent study shows that the use of a straightforward English BERT pre-trained model with the help of translation tools already achieved a competitive result. The more complex approaches that adopt joint-learning [94], multi-channel [130], or multi-task [89] architectures obtained more competitive results compared to the previously mentioned models.

4.2.2 Feature representation

For the traditional models, some works used the LASER Embedding model, which provides a language-agnostic representation across 93 languages. A study by Basile et al. [10] proposed to use a TF-IDF representation of bleached character n-grams. Other studies simply translated the training data to the target language and used the word n-grams feature representation [6, 59, 92, 94]. Meanwhile, most neural-based models were coupled with multilingual word embedding models, including Facebook MUSE (Multilingual FastText) [6, 89, 92, 94] and Babylon Embeddings [89]. Finally, the transformer-based architectures exploited multilingual pre-trained models trained on very large corpora, such as Multilingual BERT [1, 6, 48, 92, 100, 132, 135], RoBERTa [30, 31], ULMFit [31], and the recent XLM-RoBERTa [30, 31, 48, 110]. It is worth noting that some features were also introduced to complement the language representation, providing language-agnostic information for knowledge transfer, such as a hate-specific lexicon (HurtLex) [29, 92, 94] and emotion features based on emoji presence [28].

Overall, almost all cross-lingual abusive language detection studies exploited multilingual language models as the main feature representation. In particular, the most recent studies found that a multilingual representation based on XLM-RoBERTa obtained the most robust result and outperformed other multilingual language models [30, 48, 110]. Several studies also presented the interesting finding that infusing language-agnostic features extracted from hate-specific lexicons (HurtLex, in particular) [29, 94] and emoji-based features [28] could improve abusive language detection systems in a multilingual setting. In the case of HurtLex, the feature was represented as a one-hot vector which indicates the word presence in 17 HurtLex categories [94]. Meanwhile, Corazza et al. [28] exploited common information conveyed by emoji for building a pre-trained Masked Language Model (MLM).
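The lexicon-based feature described above can be sketched as a binary vector with one position per lexicon category, set to 1 when any token of the message appears in that category. The toy lexicon and its category names (`cat_a`, `cat_b`, `cat_c`) are hypothetical placeholders rather than real HurtLex content, and the tokenization is deliberately naive; with the real resource the vector would have 17 positions.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: category name -> set of lowercase entries.
# This is a stand-in for illustration, not actual HurtLex data.
TOY_LEXICON: Dict[str, Set[str]] = {
    "cat_a": {"idiot", "moron"},
    "cat_b": {"pig", "rat"},
    "cat_c": {"scum"},
}


def category_presence_vector(
    tokens: List[str],
    lexicon: Dict[str, Set[str]],
    categories: List[str],
) -> List[int]:
    """Return one binary value per category: 1 if any token belongs to it."""
    token_set = {token.lower() for token in tokens}
    return [1 if token_set & lexicon.get(category, set()) else 0 for category in categories]


# Fix the category order once so that vector positions are comparable across messages.
CATEGORIES = sorted(TOY_LEXICON)
example = category_presence_vector("You are a pig and an IDIOT".split(), TOY_LEXICON, CATEGORIES)
```

Because a multilingual lexicon shares its category inventory across languages, a vector built this way has the same meaning whatever the language of the message, which is what makes it usable as a language-agnostic signal alongside the learned representation.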

4.2.3 Language transfer approaches

Cross-lingual transfer learning is the common approach to transfer knowledge from one language to another language [69, 126]. In this approach, models are trained and optimized on a dataset in one language (the source language) and then tested on another language (the target language). In this task, a specific model or approach is needed to facilitate the knowledge transfer between languages. In this subsection, we discuss several approaches proposed by existing works to bridge the language-shift in the cross-lingual abusive language detection task.

Several works adopted the most straightforward approach, utilizing machine translation tools to align the language of the training and testing data. Most of them used Google Translate, which provides reliable translation results. Pamungkas et al. [92, 94] exploited a Linear Support Vector Classifier with a TF-IDF feature representation of data translated by Google Translate. Some other works also tried to align the language of the test data to the source language before feeding them to state-of-the-art English BERT pre-trained models [6, 92]. The translation tools were also used to obtain parallel corpora in some studies which proposed a joint-learning or multichannel architecture. These architectures require such corpora to allow the model to learn the task in two or more languages sequentially [20, 92, 94, 130].
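The two alignment strategies mentioned above (translating the training data into the target language, or translating the test data back into the source language) can be sketched as follows. The `Translator` callable is a hypothetical interface standing in for any machine translation tool; it is not the API of Google Translate or of any real library, and `toy_translate` is a word-level stub used only to make the sketch runnable.

```python
from typing import Callable, List, Tuple

# Hypothetical translator interface: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def translate_train(
    train_texts: List[str], test_texts: List[str],
    translate: Translator, src_lang: str, tgt_lang: str,
) -> Tuple[List[str], List[str]]:
    """Move the training data into the target language; test data stay untouched."""
    return [translate(t, src_lang, tgt_lang) for t in train_texts], list(test_texts)


def translate_test(
    train_texts: List[str], test_texts: List[str],
    translate: Translator, src_lang: str, tgt_lang: str,
) -> Tuple[List[str], List[str]]:
    """Move the test data into the source language, e.g. to reuse an English-only model."""
    return list(train_texts), [translate(t, tgt_lang, src_lang) for t in test_texts]


def toy_translate(text: str, src_lang: str, tgt_lang: str) -> str:
    """Word-by-word stub for demonstration only; unknown words are left unchanged."""
    tables = {
        ("en", "it"): {"hello": "ciao", "friend": "amico"},
        ("it", "en"): {"ciao": "hello", "amico": "friend"},
    }
    mapping = tables.get((src_lang, tgt_lang), {})
    return " ".join(mapping.get(word, word) for word in text.split())
```

After either step, training and test texts are in the same language, so a monolingual pipeline such as TF-IDF features with a linear classifier, or an English BERT model in the `translate_test` case, can be applied unchanged.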

Some existing studies proposed to experiment by infusing language-agnostic features as language-independent information for transferring knowledge between languages. Pamungkas et al. [92, 94] and Corazza et al. [29] used features extracted from HurtLex [12], a multilingual lexicon that specifically contains abusive words. Another work by Corazza et al. [28] exploited a language-agnostic feature provided by emoji in the Twitter data. They argued that emoji could give some signals related to emotion information.

Several works also proposed novel architectures to obtain a better learning representation across different languages. Glavas et al. [48] proposed to continue the training process of the Multilingual BERT and XLM-RoBERTa models via masked language modeling. Pamungkas et al. [92, 94] presented a joint-learning architecture to learn the task in source and target languages sequentially. Then, Casula et al. [20] and Sohn et al. [130] introduced a similar architecture, namely a multichannel model based on multilingual pre-trained models. Stappen et al. [132] introduced a novel architecture consisting of a frozen Transformer Language Model (TLM) and Attention-Maximum-Average Pooling (AXEL) to deal with zero-shot cross-lingual classification. Finally, Ousidhoum et al. [89] proposed a multitask architecture based on Sluice Network [118] coupled with Babylon cross-lingual word embedding [129], which allows the model to share the same parameters from other related tasks.

The cross-lingual task relied heavily on machine translation tools for a long time before the emergence of multilingual language representations in recent years. Some prior studies conducted exploratory experiments to test the robustness of these multilingual language representation models in abusive language detection tasks, without any specific knowledge transfer approach between languages. Pamungkas et al. [92, 94] and Aluru et al. [6] used a straightforward logistic regression model coupled with Multilingual LASER Embedding. In addition, they also experimented with the Multilingual FastText embedding. Then, several other works [6, 92, 94, 100] also tested the robustness of the Multilingual BERT model to tackle cross-lingual abusive language detection. Meanwhile, Ranasinghe et al. [110] and Dadu and Pant [30, 31] experimented with the recent state-of-the-art multilingual language representation XLM-RoBERTa to deal with this task. Finally, we observe a work that proposed two data augmentation techniques for cross-lingual transfer, namely augmenting the training set with filtered samples from other data and using an ensemble model based on the Multilingual BERT pre-trained model [1].

5 Challenges and opportunities

The analysis of the relevant literature done so far gives us a picture of cross-domain and cross-lingual abusive language detection as challenging tasks. Several challenges emerged, summarized as follows:

  • Bias issues in the existing datasets. Several studies mentioned that dataset bias is one of the main issues contributing to the difficulties of abusive language detection in both cross-domain and cross-lingual settings. Several kinds of bias were found, including topic bias [146], author bias [146], and racial bias [32]. Among these biases, topic bias is the most influential issue, as also noticed by some works in the cross-domain [94] and cross-lingual [8] abusive language detection tasks.

  • The insufficient ability of current multilingual language models to transfer knowledge between languages in the specific hate speech detection task. This is especially evident in the use of some swear words, which are very culture-dependent and vary from one language to another. Similar issues were also observed by Pamungkas et al. [91, 94], who found that swear words have an important role in the cross-lingual setting of abusive language detection and that some of them are not directly translatable with machine translation tools.

  • Language- and topic-shift. Language-shift is not the only issue to deal with in a cross-language setting; there is also the topic-shift between one dataset and another [48]. This is due to the differences in task formulation and in the nature of the abusive language datasets. This issue is related to the first challenge mentioned in this list, and is also in line with the findings of a recent study in cross-lingual hate speech detection [8].

  • Unstable performance of models in different target languages [48]. Existing works show that the performance in more resource-rich languages is higher than in lower-resource languages. This may be related to multilingual language representation models being trained on different amounts of data in different languages [149].

  • Difficulties in producing a dataset that encompasses multiple facets and targets of abusive language online [48, 134]. The effort to merge several datasets with different topical focuses still does not obtain a significant result [92]. This issue is not specific to this research area, but rather affects every task in which manually annotating a new dataset is very labor intensive and highly subjective.

  • Intrinsic complexity in defining abusive phenomena and variety of definitions. Different concepts and terms were introduced across studies for similar abusive phenomena [137]. This issue contributes to the difficulties in providing a better experimental setting for the cross-domain abusive language detection task.

Based on these challenges, we also point out several opportunities for future studies in this research direction, which are summarized below.

  • Applying debiasing approaches on the available dataset. Several studies have explored this direction by adapting debiasing techniques from other research topics to reduce the bias issue in abusive language datasets [96, 112]. They proved that reducing or removing bias on either the language model or the datasets could improve the model performance in detecting abusive language automatically.

  • Having an abusive language dataset covering multiple facets of abusive behavior could be an important first step toward developing robust systems which are stable and well-performing across different abusive domains [134]. However, this is not a trivial task, since obtaining broad samples of abusive instances from real online discourse is very difficult.

  • Developing a pre-trained word embedding model specifically for abusive language detection tasks, also in a multilingual setting [134]. Text in abusive utterances often has specific characteristics compared to traditional text, involving explicit mentions of abusive words, obfuscated words, and implicit abuse, i.e., indicating negative stereotypes. Several studies have been proposed in this direction, which we think need to be followed up [13]. This solution could help to cope with the difficulties of cross-domain and cross-lingual tasks.

  • Several previous studies showed the effectiveness of infusing features from a domain- or language-independent resource in both cross-domain and cross-lingual settings. Further work on exploiting other resources could also help the model to transfer knowledge between domains and languages. For example, some studies highlighted the importance of emotion information in the abusive language detection task [109, 119]. Therefore, exploring the use of emotion information as domain- or language-independent features for knowledge transfer would be valuable. Another study by Pamungkas et al. [93] also proved the usefulness of external features extracted from HurtLex, a multilingual lexicon that contains offensive words structured in 17 different categories. HurtLex contains a wide range of hateful words, organized in general categories sometimes related to cultural stereotypes, ranging from ethnic slurs to insulting words that target physical disabilities and derogatory senses in different languages. Specifically, they found that HurtLex can support knowledge transfer in abusive language detection across languages, where abusive expressions often make use of rhetorical figures (e.g., metaphors, synecdoche, metonymy) and idiomatic expressions and are highly sensitive to geographical, temporal, and cultural variations, especially when the derogatory meaning is linked to stereotype and prejudice.

  • Tackling the weaker performance of multilingual language representations in low-resource languages also needs to be considered. A study by Wu and Dredze [149] even found that 30% of the languages covered by multilingual BERT, those with fewer pretraining resources, obtain worse performance than models that do not use a pre-trained language model. Several possible solutions could be considered. One of the main answers is to develop monolingual embeddings with sufficient training data, since Wu and Dredze [149] also found that monolingual models always obtain better performance than multilingual BERT when sufficient data are available to develop the pre-trained model. Another possible solution is to extend the current multilingual model to improve its language coverage, as proposed by Wang et al. [142].

  • Focusing on model and architecture engineering to facilitate language and domain transfer is another possible solution. Existing studies show that techniques such as joint-learning, multitask learning, and continued masked language modeling effectively improve the models’ performance. For future work, it could be interesting to implement other transfer learning approaches by exploiting deep learning techniques, both traditional deep learning and adversarial deep learning [156].

  • On the theoretical counterpart, a careful study of the notion of every abusive behavior online which is modeled with the purpose of automatic detection is important, to obtain a clearer terminology and understanding of the abusive phenomena we want to capture in language. The study by Vidgen et al. [137] proposed several possible solutions to address this issue, which can be considered for future works.

6 Conclusions

This survey provides a comprehensive overview of existing studies in developing robust models to detect abusive language across different domains and languages. First, we present the available datasets that could be exploited in this research direction, covering multiple platforms, abusive phenomena, and languages. We also review the approaches that have been proposed in this field, focusing on analyzing the specific methods to transfer knowledge between domains and languages. Finally, we also present the current challenges and opportunities related to this focus based on the existing works, providing further research development insights.

This study observes that most of the available abusive language datasets are gathered from social media platforms such as Twitter, Facebook, and Instagram. Twitter is the most exploited source of data, which may be due to the convenience of retrieving the samples using the Twitter public API and to its policies for sharing the data. We also notice that hate speech is the most studied phenomenon, compared to other abusive phenomena such as toxicity, offensiveness, and cyberbullying. A wide variety of methods have been proposed to deal with the cross-domain study of the abusive language detection task. However, the most recent transformer-based architectures succeed in obtaining the most promising results. Several studies also proposed specific approaches to cope with domain-shift in the cross-domain setting, such as merging datasets from different domains, modifying the input samples to minimize domain-shift, proposing novel architectures to facilitate domain transfer, and using external resources as domain-independent features.

In the cross-lingual setting, we focus on non-English resources and observe that abusive language datasets are already available in 18 languages, but they are more centered on the Indo-European language family. Resources are underrepresented or not yet available in some language families, including Afro-Asiatic, Austronesian, and Niger-Congo. Most datasets in languages other than English were also retrieved from the Twitter platform. Most studies in this direction focus on transferring knowledge from a resource-rich language to lower-resource languages. As in cross-domain studies, most works in cross-lingual settings also exploit transformer-based architectures and use the available multilingual language representation models. Other studies also proposed several specific approaches to share information between languages, including machine translation to align training and testing data, infusing language-agnostic features as language-independent information, and offering novel architectures to facilitate knowledge transfer between languages.

Finally, we identify some recent challenges and opportunities in this research direction. Dataset bias is one of the main issues contributing to the difficulties of the cross-domain and cross-lingual settings of abusive language detection tasks. This issue is an open problem, and the challenge is to develop novel resources that are less biased and cover different facets of abusive phenomena online. On the theoretical side, different concepts and terms were used across studies to describe similar abusive phenomena, which is problematic in the cross-domain setting of the abusive language detection task. Further exploration of the notion of each abusive phenomenon is also vital to obtain more precise terminology in the abusive language field. Overall, the analysis of the relevant literature done in this study gives us a picture of cross-domain and cross-lingual abusive language detection as challenging tasks. Although this research field is still in an early phase of development, the existing studies confirm the urgency of tackling this task and its opportunities for further development.