Quarantining online hate speech: technical and ethical perspectives
In this paper we explore quarantining as a more ethical method for delimiting the spread of Hate Speech via online social media platforms. Currently, companies like Facebook, Twitter, and Google generally respond reactively to such material: offensive messages that have already been posted are reviewed by human moderators if complaints from users are received. The offensive posts are only subsequently removed if the complaints are upheld; therefore, they still cause the recipients psychological harm. In addition, this approach has frequently been criticised for delimiting freedom of expression, since it requires the service providers to elaborate and implement censorship regimes. In the last few years, an emerging generation of automatic Hate Speech detection systems has started to offer new strategies for dealing with this particular kind of offensive online material. Anticipating the future efficacy of such systems, the present article advocates an approach to online Hate Speech detection that is analogous to the quarantining of malicious computer software. If a given post is automatically classified as being harmful in a reliable manner, then it can be temporarily quarantined, and the direct recipients can receive an alert, which protects them from the harmful content in the first instance. The quarantining framework is an example of more ethical online safety technology that can be extended to the handling of Hate Speech. Crucially, it provides flexible options for obtaining a more justifiable balance between freedom of expression and appropriate censorship.
Keywords: Hate speech · Social media · Ethical AI · Quarantining · Freedom of expression
[…] the category cannot exist without the state’s ratification, and this power of the state’s judicial language to establish and maintain the domain of what will be publicly speakable suggests that the state plays much more than a limiting function in such decisions; in fact, the state actively produces the domain of publicly acceptable speech, demarcating the line between the domains of the speakable and the unspeakable, and retaining the power to make and sustain that consequential line of demarcation. (Butler 1997, pp. 76–77; emphasis in original)
In the Western world, most nations now impose penalties for some forms of expression deemed hateful because of their content, and such approaches thereby institutionalise value pluralism: the relevant legislative bodies restrict the freedoms of certain citizens so that the interests and well-being of others can be safeguarded (Galston 1999).1 Self-evidently, these are areas where political philosophy and ethics become inextricably intertwined.
Hate speech is language that attacks or diminishes, that incites violence or hate against groups, based on specific characteristics such as physical appearance, religion, descent, national or ethnic origin, sexual orientation, gender identity or other, and it can occur with different linguistic styles, even in subtle forms or when humour is used. (Fortuna and Nunes 2018, p. 5)
The remaining sections of this article briefly summarise the current conventions for the regulation of online HS, before considering some of the ways in which the automatic detection of HS could be used to safeguard citizens within liberal democracies in a more ethical manner. While previous research in this area has focused primarily on the core task of developing automated methods for detecting HS, this article probes instead the way in which such technologies could be used as part of a larger infrastructure that moderates the content of social media posts in a way that does not excessively compromise freedom of expression. In particular, the method of quarantining is recommended as a particularly effective way of avoiding the problematical extremes of entirely unregulated free speech or coercively authoritarian censorship.
The regulation of online hate speech
As mentioned above, in response to growing public concerns about HS, most social media platforms have adopted self-imposed definitions, guidelines, and policies for dealing with this particular kind of offensive language. Continued criticism of these procedures, however, suggests that such an approach is far from ideal; therefore, the current procedures adopted by Facebook, Twitter, and YouTube will be briefly summarised here as illustrative case studies.
We define hate speech as a direct attack on people based on what we call protected characteristics—race, ethnicity, national origin, religious affiliation, sexual orientation, caste, sex, gender, gender identity and serious disease or disability. We also provide some protections for immigration status. We define “attack” as violent or dehumanising speech, statements of inferiority, or calls for exclusion or segregation. (Facebook 2019b)
Additional factors are the context of the comment, cultural norms (e.g., language, country), the genre/style of the comment (e.g., humour, satire), or if a post was reproduced as a means of criticism and opposition (Allan 2017). Yet the moderation process is subject to constant revision and modification. On 28th March 2019, Facebook announced that it would block and remove white supremacist content. The decision followed heavy criticism after a live-stream of a terrorist attack in Christchurch, New Zealand, had been made available on the social media platform. New Zealand’s Prime Minister Jacinda Ardern reacted saying that social media sites were ‘the publisher, not just the postman’ of extremist content online (BBC News 2019). Facebook’s decision to take a proactive stance against the spreading of nationalist and extremist material has important consequences for HS regulation on social media more generally. Nevertheless, the company’s methods for handling HS continue to be primarily reactive rather than proactive.
Language that treats others as less than human. Dehumanization can occur when others are denied of human qualities (animalistic dehumanization) or when others are denied of their human nature (mechanistic dehumanization). Examples can include comparing groups to animals and viruses (animalistic), or reducing groups to a tool for some other purpose (mechanistic). (Gadde and Harvey 2018)
The policy further states that it applies to ‘[a]ny group of people that can be distinguished by their shared characteristics such as their race, ethnicity, national origin, sexual orientation, gender, gender identity, religious affiliation, age, disability, serious disease, occupation, political beliefs, location, or social practices’ (Gadde and Harvey 2018).
Finally, since its emergence in 2005, YouTube has developed from being merely a video-sharing site to being an influential source of news, information, and entertainment for users worldwide. In recent years, it has provided a powerful platform for those who have sought to spread conspiracy theories (e.g. anti-vaccine), extremist views, and misinformation (see, e.g., Uscinski et al. 2018; Ottoni et al. 2018). Like Facebook and Twitter, YouTube relies on the reporting of dangerous or abusive content by already-offended users, and all such reports are subjected to qualitative review. Its ‘Hate Speech Policy’ (Google 2019) currently lists specific protected characteristics (11 in total, including ‘veteran status’), and a reporting tool is provided that can be used to raise concerns about videos, comments, or even whole channels that promote HS.3
The automatic detection and classification of hate speech
Given the brisk summaries in the previous section, it should be obvious why, in the last few years, the automatic detection of HS has become an active research priority. The various systems developed so far frequently adopt a binary classification framework: given a social media post P (e.g., a tweet), the system should classify P either as constituting HS or as not constituting HS. Consequently, Precision, Recall, and Accuracy are regularly used as metrics for determining system performance. For instance, Warner and Hirschberg (2012) developed a system that classified statements as being anti-Semitic or not, while Nobata et al. (2016) and Gao et al. (2017) used the categories ‘abusive’ or ‘clean’ instead. Since the precise nature of the task and the datasets used often varies from publication to publication, numerous classification strategies have been deployed (see Fortuna and Nunes 2018 for an overview), but comparing them in a meaningful manner is not always easy. While early systems tended to rely on basic word filters and simple syntactic structures to identify offensive language, more recent systems have sought to incorporate extra-linguistic knowledge-based features, as well as contextual information, to achieve more accurate detection and better classification rates. Even sociolinguistic features concerning the user’s background, posting history, and online characteristics have been included in the development and training of classifiers (e.g., Dadvar et al. 2013; Schmidt and Wiegand 2017a). To consider just a few examples in greater detail, Davidson et al. (2017) used logistic regression with L1 regularization to reduce the dimensionality of the data, and they favoured a ternary classification framework in which each tweet was identified as constituting offensive language or not, with all offensive tweets subsequently being classified as constituting HS or not.
Their dataset of 25k tweets had been manually labelled (via CrowdFlower), and, using Accuracy as a scoring metric, they found that 91% of the offensive tweets were being correctly identified, but only 60% of the HS (i.e., only slightly better than random chance). By contrast, Qian et al. (2018) used a Conditional Variational Autoencoder (CVAE) to distinguish among 40 hate groups with 13 different hate group ideologies, using a dataset of 3.5 million tweets. Consequently, each tweet was associated with a specific hate category label (e.g., ‘ACT for America’) and a specific hate speech label (e.g., white nationalist, anti-immigration). This fine-grained approach enables more specific sub-classifications of HS posts, but it depends on there being enough data associated with each sub-type. In recent years, the application of Convolutional Neural Networks (CNNs) has produced higher Precision results in many HS-related tasks (e.g., Gambäck and Sikdar 2017).
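The evaluation metrics mentioned above can be made concrete with a small sketch. The labels and predictions below are invented for illustration; this does not reproduce any of the cited systems.

```python
# Toy illustration of Precision, Recall, and Accuracy for a binary
# hate-speech classifier. 1 = constitutes HS, 0 = does not.
def precision_recall_accuracy(gold, predicted):
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of flagged posts, how many were HS
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual HS, how much was caught
    accuracy = correct / len(gold)                    # overall proportion correct
    return precision, recall, accuracy

gold      = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical human annotations
predicted = [1, 0, 1, 0, 0, 1, 0, 1]  # hypothetical system output
p, r, a = precision_recall_accuracy(gold, predicted)  # → 0.75, 0.75, 0.75
```

A system optimised for Precision alone can simply flag very little, which is why Recall must be reported alongside it when comparing the classifiers discussed above.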
Nonetheless, despite the many technical advances, there remain persistent difficulties concerning the annotation of HS-related training data. As noted above, different researchers focus on different tasks (e.g., anti-Semitic language [i.e., a specific subset of HS], abusive language [i.e., a superset of HS]). Therefore, they use different datasets, and it is often very difficult to compare and contrast the performance of the various systems. This lack of commonly-accepted training and test datasets has significantly hampered system development. There are also problems when it comes to labelling the data. The task of classifying millions of offensive tweets is usually crowdsourced (for practical reasons), yet it is hard to guarantee quality control using that method. The subjectivity of annotators remains problematical, and it arises from diverging perceptions of what constitutes HS. This divergence is a factor even when particular definitions of HS are specified, since the perceived tone and style of a given social media post (e.g., humorous, satirical) can vary greatly.

Further, there has been a growing interest in the manner in which users respond to HS. The various strategies deployed are often grouped together as instances of ‘counter speech’. Crucially, users have been observed to use counter speech of different kinds, including pointing out and correcting misinformation, misrepresentations, and contradictions, warning of consequences, denouncing hateful speech, debunking via humour or sarcasm, deploying a notably positive tone, or using hostile language (Mathew et al. 2018). The important role of counter speech in countering HS is only starting to be understood, yet it clearly influences the spread of online hatred and misinformation. Clearly, more research concerning this topic (and specifically large-scale studies of datasets and the development of improved classification models) is needed.
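The annotator subjectivity discussed above can at least be measured. Cohen's kappa, a standard chance-corrected agreement statistic in annotation work (not one the studies cited here are reported as using), is sketched below with invented labels:

```python
# Cohen's kappa for two annotators labelling the same posts as
# HS (1) or not HS (0). All labels are invented for illustration.
def cohens_kappa(a, b):
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Agreement expected by chance, given each annotator's
    # observed rate of assigning the HS label.
    pa1 = sum(a) / n
    pb1 = sum(b) / n
    expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (observed - expected) / (1 - expected)

ann1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
ann2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
k = cohens_kappa(ann1, ann2)  # ≈ 0.583: moderate agreement despite 80% raw overlap
```

The gap between raw overlap and kappa is exactly the problem crowdsourced labelling faces: two annotators can agree on most posts while still diverging systematically on what counts as HS.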
The preceding paragraphs have offered a succinct overview of some of the recent developments in the task of detecting and classifying HS automatically. As mentioned earlier, though, the ex post facto identification of HS does not undo the harm that such material has already caused when posted online. While effective counter speech can certainly contribute to decreasing that harm (Butler 1997, p. 14), it would be far better to intercept potentially offensive posts at an earlier stage of the process, ideally before they have been read by the intended recipient. With this in mind, the following section will outline a framework for the automated quarantining of HS which extends an automated approach that has been used for several decades to decrease the damage caused by malware.
Quarantining hate speech
As summarised in “The regulation of online hate speech” section, social media organisations such as Facebook, Twitter, and YouTube currently use teams of moderators to determine whether potentially harmful posts should be removed or not. The current systems rely on already-offended users complaining about offensive messages, the content of which is then assessed by teams of people who determine whether or not the messages should be deleted. These approaches will be referred to here as Too Little Too Late (TL2) methods, since they come into effect only after the intended harm has already been inflicted, both on the direct recipient of the message, and on any indirect recipients (including the thousands of human moderators who have to encounter hundreds of examples of disturbing material every day; see Newton 2019; Simon and Bowman 2019). Consequently, TL2 harm-reduction strategies are problematical, especially if we accept that language-mediated online harm is as serious as other sub-types (e.g., physical, financial). Also, in the influential theory of Information Ethics that Luciano Floridi (Floridi 2013, Chap. 4) has elaborated over the last few decades, there is a perceived need for an ethical framework that is primarily patient-oriented rather than agent-oriented. In other words, the moral impact of a given action is at least as important as the decision process the relevant agent followed when electing to take that action. Viewed from this perspective, reactive TL2 approaches are undesirable, since they do not prevent harm being caused. Inevitably, though, any proposed regulation designed to delimit harm raises familiar long-standing tensions between libertarian tendencies (e.g., freedom of expression) and more restrictive authoritarian ideologies/practices (e.g., censorship).
These viewpoints have been prominent, for instance, in the important recent debates about HS legislation involving the legal theorists Dworkin (2009), Waldron (2012, 2017), and Weinstein (2017). The various disagreements have centred on topics such as whether HS bans necessarily undermine democratic legitimacy by depriving certain citizens of a voice in the political process and diminishing their opportunity to speak without fear of criminal sanction. Contrasting views about such matters become vividly apparent in relation to online HS if the only available options are (i) to leave already-posted offensive material in situ, or (ii) to remove it entirely.
Yo if my son comes home & try’s 2 play with my daughters doll house I’m going 2 break it over his head & say n my voice ‘stop that’s gay’.
Examples like this highlight the subtleties involved in identifying HS-related content and implementing forms of textual censorship (e.g., ellipses, strikethroughs) that ameliorate the impact of the problematical content. Given the complexity of the task, it is important to consider an alternative form of (potentially temporary) censorship—namely, quarantining. This approach has been commonplace in cyber security applications since the late 1980s, especially as a form of protection against malware.6 For instance, Exchange Online Protection (EOP) is a spam and malware filter available as part of the Exchange Online email security service owned by Microsoft (Kjierland and Baumgartner 2018). It can be set to assess whether email is spam via EOP’s own Spam Confidence Level ruleset and the detection scores assigned by the relevant email server. Any mail message that is ranked at a value of (say) five or above from either of these checks is sent to a central quarantine area, where it is retained for 15 days before being deleted. This is just one example of how quarantining is regularly deployed to protect users against software specifically designed and intended to cause particular forms of online harm (e.g., data loss, data theft, server failure).
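The threshold-and-retention logic just described for EOP can be sketched generically. The threshold of five and the 15-day retention period follow the description above; the class and method names, and the score scale, are illustrative assumptions, not Microsoft's implementation.

```python
from datetime import datetime, timedelta

QUARANTINE_THRESHOLD = 5  # confidence level at or above which a message is held
RETENTION_DAYS = 15       # quarantined items are kept this long before deletion

class Quarantine:
    """Illustrative quarantine store: messages scoring above a threshold
    are held for a fixed retention period instead of being delivered."""
    def __init__(self):
        self.held = []  # list of (message, expiry-time) pairs

    def filter(self, message, spam_score, now=None):
        now = now or datetime.now()
        if spam_score >= QUARANTINE_THRESHOLD:
            self.held.append((message, now + timedelta(days=RETENTION_DAYS)))
            return "quarantined"
        return "delivered"

    def purge(self, now=None):
        """Delete held items whose retention period has expired."""
        now = now or datetime.now()
        self.held = [(m, exp) for m, exp in self.held if exp > now]
```

The parallel with HS quarantining is that the harmful item is intercepted between sender and recipient, rather than being deleted after the fact.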
By contrast, if quarantining were deployed in these cases, and if a given P were automatically identified as constituting HS, then R would receive an alert such as the following:
R could then decide whether or not to read the post after seeing who has written it (e.g., ‘White Dragon’) and after being informed that it has been specifically flagged up as potentially constituting homophobic HS. R could also receive an indication of the degree of severity of the post by the value specified on the Hate O’Meter graphic. This value can easily be generated from the confidence scores (continuous values in the interval [0,1]) produced by the automated HS detection system. ‘0’ means that the post is not harmful in any way, ‘1’ means that the post is extremely harmful/offensive.
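The step from the detector's confidence score to the alert shown to R could be as simple as the following sketch. Only the [0,1] confidence interval and the Hate O'Meter idea come from the description above; the severity bands, wording, and function name are hypothetical.

```python
def build_alert(sender, hs_subtype, confidence):
    """Compose a quarantine alert for the direct recipient R.
    `confidence` is the HS detector's score in [0, 1]; the severity
    bands and alert wording are illustrative assumptions."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < 0.4:
        severity = "low"
    elif confidence < 0.7:
        severity = "moderate"
    else:
        severity = "high"
    return (f"A message from '{sender}' has been quarantined as possibly "
            f"constituting {hs_subtype} hate speech "
            f"(Hate O'Meter: {confidence:.2f}, severity: {severity}). "
            f"Do you wish to view it?")
```

Displaying the raw score alongside a coarse severity band lets R calibrate trust in the classifier over time, since borderline posts are marked as such rather than presented with false certainty.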
The Bipartite Method: S and R (and no one else) receive an HS alert; if they both consent to the message appearing, then P becomes visible for all other indirect recipients to read.
The Multipartite Method: even if S and R consent to post P, all other users (the indirect recipients) receive the alert when they first encounter P, and they can only access the contents if they give their consent.
The Elective Method: all users can specify the degree of online HS protection they desire. For instance, in a settings file they can specify that they want to be safeguarded from, say, racist HS, and/or homophobic HS, and/or sexist HS, and so on—or simply from all kinds of HS. Consequently, users will receive alerts for any P that falls into one of the HS subtypes from which they have chosen to be protected.
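The three methods differ only in who must receive an alert before P becomes visible to them; that dispatch logic can be sketched as follows (all names are illustrative, and this is a simplification of the consent flow, not a full protocol):

```python
def users_to_alert(method, sender, direct_recipient, all_users,
                   hs_subtype, preferences=None):
    """Return the set of users who receive a quarantine alert for a
    post classified as `hs_subtype` HS. `preferences` maps a user to
    the set of HS subtypes they have opted to be protected from
    (used by the Elective Method only)."""
    if method == "bipartite":
        # Only S and R are alerted; their joint consent makes P public.
        return {sender, direct_recipient}
    if method == "multipartite":
        # Every user who encounters P must consent before reading it.
        return set(all_users)
    if method == "elective":
        # Alert only users who opted into protection from this subtype.
        preferences = preferences or {}
        return {u for u in all_users
                if hs_subtype in preferences.get(u, set())}
    raise ValueError(f"unknown method: {method}")
```

Under this framing, the total friction of each method is simply the size of the returned set, which makes the trade-off discussed below directly measurable.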
Clearly, these options involve different degrees of ‘friction’, where, in IT-related discourse, ‘friction’ denotes any process that prevents users/customers accessing as rapidly as possible the goods or services they require. For instance, having to select answers from pop-up windows, having to fill in sign-up forms, failing to find relevant product specification information, and encountering long load times—these are all examples of online friction. Such experiences may annoy users, and, in extreme cases, cause them to change to other service providers (Facebook 2019c). Consequently, for many AICT developers, zero-friction systems are self-evidently an ideal. However, it should be clear from the preceding discussion that there are numerous situations in which some friction is highly desirable. Although social media platforms have generally tried to facilitate low-friction interactions (e.g., making it as quick and easy as possible to upload and/or share photos, audio files, documents, messages), there are situations in which this can be problematical. Adopting a high-level perspective, the Bipartite Method has the least total friction since only two users encounter the HS-related alert, while the Multipartite Method has the greatest total friction since all users encounter the alert. Crucially, in the case of the Elective Method, the degree of personal friction is chosen by each user individually. In essence, users become ‘voluntary consumers’ of friction, freely opting for the degree they are prepared to tolerate (Sumner et al. 2011, p. 18). This does not prevent them subsequently asking the service provider to remove the HS content (as is currently the case), but it does give them more control over whether or not they access that potentially harmful content in the first place. In all of these cases, the friction itself could function as a deterrent, since it is more tiresome for S to post HS if doing so constantly triggers an alert that S subsequently has to process.
The same procedure could also apply to the retweeting and automatic forwarding of potentially harmful messages, since the content of those messages could also be detected automatically and an alert prompted.
Case 1: S is a neo-Nazi. S seeks to post anti-Semitic HS on his own public social media feed (i.e., S = DR). S receives an alert and consents to the message appearing; therefore, the message is posted and can be viewed by IRs who happen to read S’s feed.
Case 2: S and DR are both neo-Nazis. S seeks to post anti-Semitic HS on DR’s public social media feed. S and DR both receive an alert and consent to the post; therefore, the message is posted and can be viewed by IRs who happen to read DR’s feed.
How a quarantining system handles these cases would depend on the implementation method adopted. If either the Multipartite or Elective Methods were adopted, then IRs who had chosen to be fully protected from HS would receive an alert, and could then decide whether or not to view the posted message. In practice, the system could be implemented in a similar manner as Parental Guidance Locks on internet streaming/catch-up services such as BBC iPlayer (2019). In other words, parents could set up a social media account for their child and specify appropriate HS protection levels. If an alert were triggered, then the parents could enter a password to access and assess the quarantined post, to determine whether or not it should be posted on their child’s social media feed.
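The parental-lock variant just described adds one further gate: a password check before a quarantined post can be released to the child's feed. A minimal sketch follows; the class design and the choice of SHA-256 hashing are assumptions for illustration, not BBC iPlayer's implementation.

```python
import hashlib

class ParentalLock:
    """Release quarantined posts on a child's feed only after a
    parent-supplied password is verified. Illustrative sketch."""
    def __init__(self, password, protected_subtypes):
        # Store only a hash of the password, never the plaintext.
        self._digest = hashlib.sha256(password.encode()).hexdigest()
        self.protected_subtypes = set(protected_subtypes)

    def requires_review(self, hs_subtype):
        """True if posts of this HS subtype are held for parental review."""
        return hs_subtype in self.protected_subtypes

    def release(self, password):
        """True if the parent's password matches, allowing the
        quarantined post to appear on the child's feed."""
        return hashlib.sha256(password.encode()).hexdigest() == self._digest
```

The same two-step structure (automatic hold, then an authorised human decision) mirrors the moderation pipeline described earlier, but moves the human decision from the platform's moderators to the affected household.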
While the emphasis so far in this discussion has fallen exclusively on HS, it is important to recognise that extremist groups often deploy a wide range of linguistic strategies when seeking to attract and recruit followers who may be sympathetic to their ideologies—and HS is only a part of this. Indeed, more subtle and complex techniques may not display explicit HS properties at all; the use of positive, euphemistic, and/or more abstract rhetoric may appeal to potential adherents just as effectively as HS. For instance, the terrorist group ISIS frequently used triumphant terminology like ‘brothers rise up’ and ‘claim victory’ in their recruitment strategies on social media (see, e.g., Awan 2017), while the controversial British far-right activist Tommy Robinson has repeatedly claimed he is attacking the ‘fascist ideology’ of Islam rather than Muslims specifically (Union Magazine 2015). Clearly, the recruitment functions of such discourses need to be examined attentively, and their diverse and multifaceted nature goes far beyond the specific task of HS detection for the purposes of quarantining.
This article has explored the problem of online text-based HS, and the ethical implications of the various strategies for dealing with this problematical phenomenon have been discussed. The current TL2 regulatory frameworks were described, and some of the problems resulting from this kind of reactive self-regulation were outlined. Crucially, it was suggested that they are undesirably ineffectual, especially when viewed from a patient-oriented ethical perspective. State-of-the-art methods for the automatic detection and classification of HS were then summarised, before the main emphasis shifted to the way in which these technologies might eventually be used when their performance has improved. In particular, quarantining has been explored as a viable approach that strikes an appropriate balance between libertarian and authoritarian tendencies. In this framework, HS is treated like a form of malware, and while the senders of the HS are not censored in a crude unilateral manner, the recipients of the HS are given the agency to determine how they wish to handle the HS they have received. This approach potentially preserves freedom of expression, but the harm caused by HS is still controlled in a safe fashion by those most directly affected.
By designing safer and more secure online products and services, the tech sector can equip all companies and users with better tools to tackle online harms. We want the UK to be a world-leader in the development of online safety technology and to ensure companies of all sizes have access to, and adopt, innovative solutions to improve the safety of their users. (HM Government 2019, p. 77)
Given this, the importance of technological infrastructures that can facilitate the development of safer and more ethical online products should be all too apparent. And handling online HS convincingly and effectively is simply one part of a complex whole.
This blatantly misogynistic meme combines the modes of image and writing, and the offensive nature of the whole arises from the juxtaposition of the parts. Taken in isolation, the text is not necessarily problematical: it is not inherently sexist in-and-of itself, and, in certain contexts, it could presumably constitute benignly humorous advice about sexual health and family planning. However, when presented with this particular image, the meaning of the text changes, and the violently misogynistic connotations become apparent. Nonetheless, the multimodal character of the whole means that no current text-based HS-detection systems would classify the meme as being an instance of HS. Despite the conspicuous nature of this problem, research programmes focused on multimodal approaches to HS detection and classification have only just started to emerge (Hosseinmardi et al. 2015; Zhong et al. 2016).8 Clearly, there is much that remains to be accomplished.
The most notable exception being the United States of America, which currently has no HS legislation because of the concern that it would contravene the First Amendment.
Approximately 15,000 content moderators currently work for Facebook (see Newton 2019).
The reporting tools can be found here: https://support.google.com/youtube/answer/2802027.
This example is taken from/pol/: https://archive.4plebs.org/pol/thread/166140233 (April 2018).
This example is a tweet by the comedian Kevin Hart (https://variety.com/2018/film/news/kevin-hart-responds-homophobic-tweets-1203083215/). The controversy surrounding these tweets caused him to step down as host of the 2019 Oscars.
Elementary quarantining methods were developed in the aftermath of the Morris worm in 1988 (see Nazario 2004, pp. 39–40).
For a brief overview, see Schmidt and Wiegand (2017a), especially Sect. 3.8.
Research on this paper is funded by the Humanities and Social Change International Foundation.
- Allan, R. (2017, June 27). Hard questions: Who should decide what is hate speech in an online global community? Facebook Newsroom. Retrieved January 28, 2019 from https://newsroom.fb.com/news/2017/06/hard-questions-hate-speech/.
- Aristotle. (1984). Nicomachean ethics, 5.2.1129b23. In The complete works of Aristotle [4th century BCE], Vol. 2.
- Bates, L. (2013, May 29). The day the Everyday Sexism Project won - and Facebook changed its image. The Independent. Retrieved April 12, 2019 from https://www.independent.co.uk/voices/comment/the-day-the-everyday-sexism-project-won-and-facebook-changed-its-image-8636661.html.
- BBC. (2019). What is the Parental Guidance Lock? Retrieved April 12, 2019 from https://www.bbc.co.uk/iplayer/help/how-to-guides/parental-guidance/parental_guidance_info.
- BBC News. (2019, March 28). Facebook to ban white nationalism and separatism. Retrieved April 12, 2019 from https://www.bbc.co.uk/news/world-us-canada-47728471.
- Benesch, S. (2019). The Dangerous Speech Project. Retrieved April 12, 2019 from https://dangerousspeech.org/.
- Berlin, I. (1969). Two concepts of liberty. In I. Berlin (Ed.), Four essays on liberty (pp. 118–172). Oxford: Oxford University Press.
- Butler, J. (1997). Excitable speech: A politics of the performative. Abingdon: Routledge.
- Dadvar, M., Trieschnigg, D., Ordelman, R., & de Jong, F. (2013). Improving cyberbullying detection with user context. In Proceedings of the 35th European Conference on IR Research: Advances in Information Retrieval. Moscow: European Conference on IR Research.
- Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated hate speech detection and the problem of offensive language. In Proceedings of the Eleventh International AAAI Conference on Web and Social Media (ICWSM) (pp. 512–515). Atlanta: ICWSM.
- Dworkin, R. (2009). Foreword. In I. Hare & J. Weinstein (Eds.), Extreme speech and democracy. Oxford: Oxford University Press.
- European Commission. (2019, March 18). Countering illegal hate speech online #NoPlace4Hate. European Commission Justice and Consumers Newsroom. Retrieved April 12, 2019 from https://dangerousspeech.org.
- Facebook. (2019a). Community standards Part III: Objectionable content. Retrieved April 12, 2019 from https://www.facebook.com/communitystandards/objectionable_content.
- Facebook. (2019b). Community standards: Hate speech. Retrieved February 18, 2019 from https://www.facebook.com/communitystandards/objectionable_content.
- Facebook. (2019c). Zero friction future. Retrieved April 12, 2019 from https://www.facebook.com/business/m/zero-friction-future.
- Gadde, V., Harvey, D. (2018). Creating new policies together. Twitter Blog, 25 September 2018. Retrieved February 18, 2019 from https://blog.twitter.com/en_us/topics/company/2018/Creating-new-policies-together.html.
- Gao, L., Kuppersmith, A., & Huang, R. (2017). Recognizing explicit and implicit hate speech using a weakly supervised two-path bootstrapping approach. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1).
- Google. (2019). YouTube help: Hate speech policy. Retrieved April 12, 2019 from https://support.google.com/youtube/answer/2801939?hl=en-GB.
- HM Government. (2019). Online Harms White Paper, April 2019. Retrieved April 12, 2019 from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/793360/Online_Harms_White_Paper.pdf.
- Hosseinmardi, H., Mattson, S. A., Rafiq, R. I., Han, R., Lv, Q., & Mishra, S. (2015). Detection of cyberbullying incidents on the Instagram social network. Computing Research Repository (CoRR) https://arxiv.org/abs/1503.03909.
- Kjierland, S., & Baumgartner, P. (2018). Anti-spam and anti-malware protection [EOP]. Microsoft Docs. Retrieved April 15, 2019 from https://docs.microsoft.com/en-us/office365/servicedescriptions/exchange-online-protection-service-description/anti-spam-and-anti-malware-protection-eop.
- Kress, G. (2010). Multimodality: A social semiotic approach to contemporary communication. London: Routledge.
- Lawrence, C. R. (1990). If he hollers, let him go: Regulating hate speech on campus. Controversies in Constitutional Law, 39(3), 431–483.
- Mathew, B., Tharad, H., Rajgaria, S., Singhania, P., Maity, S. K., Goyal, P., & Mukherjee, A. (2018). Thou shalt not hate: Countering online hate speech. ICWSM 2019. https://doi.org/10.13140/RG.2.2.31128.85765.
- Matsakis, L. (2018). Twitter releases new policy on ‘dehumanizing speech’. Wired, 25 September 2018. Retrieved February 18, 2019 from https://www.wired.com/story/twitter-dehumanizing-speech-policy/.
- Matsuda, M. J. (1993). Public response to racist speech: Considering the victim’s story. In M. J. Matsuda, C. R. Lawrence III, R. Delgado, & K. Williams (Eds.), Words that wound: Critical race theory, assaultive speech, and the first amendment (pp. 17–52). New York: Routledge.
- Nazario, J. (2004). Defense and detection strategies against internet worms. Boston: Artech House.
- Newton, C. (2019, February 25). The trauma floor: The secret lives of Facebook moderators in America. The Verge. Retrieved April 1, 2019 from https://www.theverge.com/2019/2/25/18229714/cognizant-facebook-content-moderator-interviews-trauma-working-conditions-arizona.
- Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., & Chang, Y. (2016). Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web (pp. 145–153).
- Qian, J., El Sherief, M., Belding-Royer, E. M., & Wang, W. Y. (2018). Hierarchical CVAE for fine-grained hate speech classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 3550–3559).
- Reddit. (2019). Account and community restrictions: Quarantining subreddits. Retrieved August 12, 2019 from https://www.reddithelp.com/en/categories/rules-reporting/account-and-community-restrictions/quarantined-subreddits.
- Schmidt, A., & Wiegand, M. (2017). A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media (pp. 1–10).
- Simon, S., & Bowman, E. (2019). Propaganda, hate speech, violence: The working lives of Facebook’s content moderators. NPR, 02 March 2019. Retrieved April 01, 2019 from https://www.npr.org/2019/03/02/699663284/the-working-lives-of-facebooks-content-moderators.
- Sindoni, M. G. (2018). Direct hate speech versus indirect fear speech: A multimodal critical discourse analysis of the Sun’s editorial “1 in 5 Brit Muslims’ sympathy for jihadis”. Lingue Linguaggi, 28, 267–292.
- Sumner, L. W. (2011). Criminalizing expression: Hate speech and obscenity. In J. Deigh & D. Dolinko (Eds.), The Oxford handbook of philosophy of criminal law (pp. 17–33). Oxford: Oxford University Press.
- Twitter. (2019). Hateful conduct policy. Retrieved April 12, 2019 from https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy.
- Union Magazine. (2015). Tommy Robinson interview for UNION magazine Edition#2. Retrieved August 18, 2019 from https://www.youtube.com/watch?v=5dRiBRM-BD8.
- Uscinski, J. E., DeWitt, D., & Atkinson, M. D. (2018). A web of conspiracy? Internet and conspiracy theory. In A. Dyrendal, D. G. Robertson, & E. Asprem (Eds.), Handbook of conspiracy theory and contemporary religion (pp. 106–130). Leiden: Brill.
- Waldron, J. (2017). The conditions of legitimacy: A response to James Weinstein. Constitutional Commentary, 32, 697–714.
- Warner, W., & Hirschberg, J. (2012). Detecting hate speech on the World Wide Web. In Proceedings of the 2012 Workshop on Language in Social Media (LSM 2012) (pp. 19–26).
- Weinstein, J. (2017). Hate speech bans, democracy, and political legitimacy. Constitutional Commentary, 32, 527–583.
- Weinstein, J., & Hare, I. (2009). General introduction: Free speech, democracy, and the suppression of extreme speech. https://doi.org/10.1093/acprof:oso/9780199548781.003.0001.
- Zhong, H., Li, H., Squicciarini, A. C., Rajtmajer, S. M., Griffin, C., Miller, D. J., & Caragea, C. (2016). Content-driven detection of cyberbullying on the Instagram social network. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (pp. 3952–3958).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.