Definitions of hate speech
Definitions of hate speech can largely be found in four different contexts: (1) legal, (2) lexical, (3) scientific, and (4) practical; and these differ in scope and content. Below is a brief overview of each.
-
(1)
The purpose of legal definitions is straightforward: to identify messages that violate existing legal norms and require government regulation, i.e., messages that are publicly shared, incite, promote, or justify hatred, discrimination, or hostility toward a specific group and/or individual, based on certain attributes, such as race or ethnic origin, religion, disability, gender, age, sexual orientation/gender identity [4, 19,20,21]. There is no universally accepted legal definition of hate speech and different states, the primary duty bearers under international human rights, fall under different jurisdictions [22], Walker 1994 in Calvert [23]. What is considered “hateful” can be controversial and disputed. State actors, including Governments, legislatures, State authorities, and courts, therefore need a common understanding of the phenomenon before they can take relevant action to address it [24].
-
(2)
Five of the most well-known online dictionaries define hate speech as follows. Hate speech is: (1) “public speech that expresses hate or encourages violence towards a person or group based on something such as race, religion, sex, or sexual orientation” (The Cambridge Dictionary); (2) “speech expressing hatred of a particular group of people” (The Merriam Webster); (3) “speech that attacks a person or group on the basis of race, religion, gender, or sexual orientation” (Collins Dictionary); (4) “a statement expressing hatred for a particular group of people” (The MacMillan Dictionary); (5) “speech or writing that attacks or threatens a particular group of people, especially on the basis of race, religion or sexual orientation” (Oxford Learner's Dictionary). All these definitions share the same characteristics. They define hate speech as a message (written or oral) that expresses hatred and encourages violence toward a specific group of people. These simple definitions concentrate on the meaning of the words that make up the collocation—hate and speech.
-
(3)
Scientific definitions go much further, beyond the apparent meaning of the constituent words. The term hate speech is used in different disciplines such as economics, philosophy, sociology, psychology, or computer science, and, despite the lack of a definitional consensus, we can trace some of the common constituent characteristics. As Parekh [25] states, when defining hate speech, it is necessary to distinguish hate speech from related terms that do not fit the definition, such as expressing dislike, lack of respect, a demeaning view of others, disapproval, the use of abusive or insulting speech, and speech that does “not call for action” (p. 40). These subtle distinctions could be resolved by having a proper definition of hate speech.
According to Parekh [, pp. 40–41] hate speech “expresses, encourages, stirs up, or incites hatred against a group of individuals distinguished by a particular feature or set of features such as race, ethnicity, gender, religion, nationality, and sexual orientation. […] and is often (but not necessary) expressed in offensive, angry, abusive, and insulting language”. Moreover Cohen-Almagor [25, 26] states that it is intended to dehumanize, harass, intimidate, debase, degrade, victimize, or incite brutality against the targeted groups. For an extensive philosophical analysis of the term hate speech, see Brown [9, 22].
Parekh [25] further recognizes three main aspects of the construct of hate speech: (1) it is directed at specific individuals or groups of individuals, (2) undesirable attributes are ascribed to the target group, thereby creating stigmatization; (3) it leads to discrimination because the ascription of undesirable attributes encourages the target to be seen as undesirable.
The third aspect of Parekh's definition is highlighted by Gelber [27] as well—the inequality is a core element of hate speech. In the subordinate–superior relationship, hate speech puts the creator in a position of authority and the target in the position of the subordinate, which encourages the target to be seen as inferior and legitimizes discriminatory behavior [27].
According to Gelber [27], it is important also to note that the emotion of hatred does not have to be evoked for hate speech to cause harm. Hate is not and does not have to be a key element of the definition (see also [22], “myth of hate”). On the other hand, as Brown [9, 22] points out, if manifestations of hate speech did not include hate, then this concept would be grossly misleading, but it is not. In summary, a lot of speech evokes the strong emotion of hate, but not necessarily all of it.
Last but not least, it should be noted that hate speech need not relate only to a minority or disadvantaged social group as was previously thought by classic scholars such as Jacobs and Potter (1998) or Walker (1994) as cited in [28]. In past centuries hate speech was specifically directed at minorities (e.g., Afro-Americans, the Roma community).
In the computer science research literature (see our previous work [11] for an overview of the computer-science perspective on hate speech), the most common focus related to hate speech is its (semi-)automatic detection. With the help of natural language processing (NLP) and machine learning (ML), the models can classify a particular piece of content (usually textual) according to the binary presence of hate speech or a particular aspect thereof (e.g., whether it attacks an individual or a group), see the survey by Fortuna and Nunes [10] for more information. In computer science (and especially data science), there are two specific factors influencing the employed definitions of hate speech [11]: (1) adoption of how the data are already annotated (either by other researchers or by social media platforms, (2) the observable factors that can be identified from the content available in the dataset (i.e., without accessing more intangible clues such as the authors' thinking, intent, or attitude that can be studied in other disciplines, e.g. psychology.
Like many other disciplines, computer science lacks a generally accepted definition of hate speech. In addition, it is not clear how hate speech relates to other similar/related concepts (e.g., toxic language) or superior concepts (e.g., abusive/offensive language) [11]. The boundaries between these concepts are often blurred and there are few works that specifically tackle the distinctions. One example is Davidson et al. [29], who attempt to distinguish hate speech from other instances of offensive language in their work on the automated detection of hate speech.
-
(4)
Practical definitions can be found in the way online platforms (e.g., social media) and existing detection tools define hate speech. Facebook, Twitter, and YouTube are social media platforms that mediate online communication and have developed their own definitions of hate speech to ensure users stick to the rules and as part of their internal regulatory policies (e.g., terms of service or community standards). They have signed a Code of Conduct on the regulation of illegal hate speech with the European Commission [20]. Facebook’s Community Standards [30] define hate speech as “a direct attack against people on the basis of what they call protected characteristics: race, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity and serious disease”.
The practical tools for identifying and moderating hate speech, such as profanity filters, content moderation filters, or human driven approaches, have been widely studied [5, 10, 11, 31, 32]. In the context of social media platforms, the findings on utilization of practical tools are somewhat puzzling as they either show that social platforms perform too much or too little content moderation and lack transparent decision-making processes [33].
Capturing the complexity of hate speech in a unified, precise, and operable definition (either theoretical or more empirically oriented) could potentially lead to a better understanding of hate speech and help reduce its occurrence online. Fortuna and Nunes [10] reviewed all four types of (legal, lexical, scientific, and practical) hate speech definitions. Following a content analysis, they proposed their own theoretical definition (probably one of the most fleshed out ones) [10, p. 5]:
Hate speech is language that attacks or diminishes, that incites violence or hate against groups, based on specific characteristics such as physical appearance, religion, descent, national or ethnic origin, sexual orientation, gender identity or other, and it can occur with different linguistic styles, even in subtle forms or when humour is used.
There are other useful and inspiring attempts at a universal definition reached through a process of characterizing the toxicity of comments [34, 35], or utilizing context information [36, 37].
Nevertheless, when it comes to their practical application (e.g., during data annotation for machine learning), these theoretical definitions are still too vague and difficult to capture. In addition, many local factors (e.g., cultural, platform, task at hand) make it difficult or even impossible to produce a universal theoretical definition. Indeed, we argue that the potential for finding one seems unlikely at the moment.
Consequently, our aim in this paper is to define hate speech empirically by identifying specific, measurable, observable hate speech indicators.
Indicators of hate speech
An indicator is “a sign that shows you what something is like or how a situation is changing” (Oxford Learner's Dictionary). Previous studies have looked at indicators of cyberbullying [38], operational indicators of fake news [39], and behavioral indicators related to sexting [40]. Indicators are also frequently used in psychopathology as a means of conceptualizing and diagnosing mental health problems. We propose that operationalizing hate speech in the form of indicators could fulfil the same purpose—provide objective, measurable cues to help “diagnose” hate speech and find effective interventions.
Some existing research papers on hate speech deal with the content characteristics that describe hate speech and can be considered precursors to hate speech indicators. However, these are not primarily aimed at identifying hate speech indicators using an adequate methodology and so do not assess them satisfactory.
First, Fortuna and Nunes [10] identified four dimensions (that can be considered analogous to indicators) that allowed them to compare definitions of hate speech: (1) hate speech has specific targets, (2) hate speech is to incite violence or hate; (3) hate speech is to attack or diminish; and (4) hate speech can be expressed through (and hidden in) humor.
Secondly, some computer science studies have gone further than producing a binary classification and have distinguished different aspects of abusive/offensive language (as a superior concept to hate speech). Waseem et al. [12] proposed a typology of abusive language based on defining two dimensions: (1) directed versus generalized, and (2) explicit versus implicit abusive language. Zampieri et al. [13] similarly annotated and automatically classified offensive language according to directness (targeted insult, untargeted) and target identification (individual, group, other). By emphasizing these critical aspects of offensive/abusive language, these studies reduce the ambiguity in definitions of hate speech and related concepts. Ousidhoum et al. [41] focused specifically on hate speech and manually annotated a multilingual dataset of Tweets in three languages, distinguishing five aspects: directness, hostility type (e.g., abusive, hateful, offensive), target attribute (i.e., what is being targeted, e.g., origin, gender, religion), target group (i.e., who is being targeted, e.g., individual, women), and annotator sentiment (i.e., the emotion felt by the annotator after reading the Tweet, e.g., disgust, fear, anger).
Other things resembling indicators can be identified in data annotation methodologies as well. The manual annotation of hate speech is a subjective process and annotators need extensive cultural and societal knowledge [42]. Human annotators must therefore be provided with detailed instructions on how to recognize and label hate speech consistently. For this purpose, Waseem and Hovy [43] proposed a list of 11 criteria to guide annotators, examples items are: hate speech (1) uses sexist or racial slurs, (2) criticizes a minority and uses straw man arguments; and (3) defends xenophobia or sexism.
Finally, in NLP and ML models, training of detection models commonly involves a feature engineering step. Features utilized to train models (especially those with a strong prediction capability) can be considered a kind of indicator as well. Fortuna and Nunes [10] analyzed a number of detection methods and identified two categories of features: general features commonly used in text classification approaches and specific hate speech detection features. The latter category, which is more closely related to the concept of indicators, was further divided into several subcategories of features that are intrinsically related to the characteristics of hate speech: (1) othering language, (2) perpetrator characteristics, (3) objectivity-subjectivity of the language, (4) declarations of superiority of the ingroup, (5) a focus on particular stereotypes, and (6) intersectionism of oppression.
To conclude, despite the increasing understanding in the social sciences of the factors that may underlie online hateful behavior [44, 45], we found no studies attempting to systematically develop a set of indicators covering the main aspects of hate speech. To fill this gap and conduct a systematic study of hate speech indicators, we aim to provide an empirical definition of hate speech, while taking into account both social science and computer science approaches (with the help of existing research and interdisciplinary experts).