A crucial aspect of digital platforms, argues Tarleton Gillespie (2018), is that they must all moderate while disavowing moderation. Platforms that rely on users and their information must take steps to ensure that information deemed problematic is identified and controlled. This section begins with a brief discussion of the definition and current understandings of hate speech, the origins of content moderation, and the role of AI. The discussion relies on a critical reading of publicly available materials, including posts, speeches and interviews given by senior platform executives, media articles, and transparency reports. We identify the following issues: a failure to identify racism as a structure of power and to address it as such; the lack of any substantial contribution by racialised people; and the labour conditions of human moderators. We then find that the AI systems used in content moderation repeat what human moderators do, but at scale, while also diverting attention from the problematic definitions of racist hate speech and from the politics of race and racism.
Defining Hate Speech
In the days of ‘move fast and break things’ — Mark Zuckerberg’s famous motto for Facebook, since abandoned — digital platforms rolled out their products with little concern for their potential impact. In these early days, Facebook did not have a moderation policy, other than excluding pornography, and did not even have a mechanism for users to report inappropriate contents (Cartes, in Viejo-Otero, 2021). When such functionality was built, both Facebook and Twitter adopted an ‘operational’ approach to moderation policy, which emerged from the contents that users were reporting. Teams of in-house moderators would meet to discuss the reports, determine which contents should be removed, and then use these decisions to draft and update the relevant policies (Viejo-Otero, 2021). This dynamic approach to content policy, driven by what users reported, eventually led to the formulation of a set of clearly stated community guidelines (YouTube) or community standards (Facebook). These were also informed by existing legal frameworks, especially laws governing illegal contents.
The approach to hate speech is very similar across Facebook, YouTube and Twitter. The platforms’ policies on hate speech are loosely based on the main international legal instruments, conventions and declarations. The two most directly relevant are the tri-partite International Bill of Human Rights, comprising the Universal Declaration of Human Rights (UDHR, 1948), the International Covenant on Civil and Political Rights (ICCPR, 1966) and the International Covenant on Economic, Social and Cultural Rights (ICESCR, 1966); and the International Convention on the Elimination of All Forms of Racial Discrimination (ICERD, 1965).
The current platform policies on hate speech reflect the main points of these instruments and revolve around the notion of protected characteristics. Facebook, for instance, lists race, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity and serious disease as protected characteristics. YouTube also includes age, immigration status, veteran status, and victims of a major violent event and their kin. Content that incites violence, discrimination or hatred against people who fall under these categories is considered hate speech and will be removed.
These definitions provide the framework within which platforms develop their relevant policies. In general, neither the platforms nor other institutions, such as the European Union, wish to expand, change or otherwise modify the definitions. This is clear in the Code of Conduct, a voluntary instrument signed by all the major platforms and the European Commission. The Code of Conduct has four main stipulations: (i) timely removal of illegal hate speech; (ii) hiring more content moderators; (iii) working with civil society on the removal of illegal hate speech; and (iv) implementing a monitoring process to check for compliance. The more recent Digital Services Act (European Commission, 2020) and the Audiovisual Media Services Directive (AVMSD, 2018), which seek to regulate platforms, do not provide any further definitions, relying instead on the existing EC Framework Decision of 2008 for illegal hate speech.
The definition of what constitutes hate speech therefore remains surprisingly unchanged since the mid-twentieth century (Siapera et al., 2017). On platforms, race is listed among many other protected characteristics, with no attention paid to the specificity of the experience of racism and racial exploitation. Indeed, the complete absence of any reference to the concrete historical circumstances that gave rise to race and racism, as discussed by Quijano and many others, is striking. There is, for example, no distinction between races, so that contents directed against black people are treated as equivalent to contents directed against white people. There is therefore no understanding or appreciation of the differences between the oppressed and the oppressors (Siapera & Viejo-Otero, 2021). The direct outcome of this race-blind approach to defining hate speech is that, by equating oppressors and oppressed, it ends up privileging the former. In other words, if the definition of what constitutes hate speech cannot in principle distinguish between racist contents and criticisms of whiteness, it is inevitable that whiteness is protected and the system of power it represents remains intact. Additionally, because there is no focus on sharpening the definition, all the emphasis by the platforms (and the European Commission) is placed on the enforcement of policies for assessing and potentially removing contents, and on the effectiveness of the relevant measures. It is at this level that the question of moderation enters the picture.
Human Moderation
The moderation of content on platforms is a key activity, as Gillespie (2018) has argued, because it effectively creates and maintains a product that is then sold to users. It is part of platforms’ brand identity, which they use to attract and retain users and engagement. The decisions by both Facebook and YouTube to exclude sexually explicit/pornographic contents or graphic violence, for example, created platforms meant to have a wide appeal. Content moderation is therefore an essential and structural feature of platforms. It began as human labour, with teams of human moderators considering contents flagged as potentially containing hate speech or other categories of problematic contents. This human moderation, in turn, raised three important issues: (i) potentially subjective decisions lacking standardisation; (ii) the labour practices and working conditions of moderators; and (iii) the psychological costs of continuous exposure to hateful contents.
When in 2017 The Guardian and other major newspapers published the leaked materials used by Facebook to moderate contents, journalists noted the complexity of the rules and the inconsistencies in the policy. One example used by The Guardian is the distinction between credible and non-credible threats: “Remarks such as ‘Someone shoot Trump’ should be deleted, because as the then head of state he was in a protected category. But it can be permissible to say: ‘To snap a bitch’s neck, make sure to apply all your pressure to the middle of her throat’, or ‘fuck off and die’ because they are not regarded as credible threats” (Hopkins, 2017, non-paginated). Moderators must therefore make decisions on contents based on rules that are difficult to interpret and apply, and that may be modified often to include further examples or clarifications. A decision that a piece of content constitutes hate speech may have to rely on a content moderator’s specific interpretation of the content and the policy. Additionally, Facebook and other platforms’ policies apply globally, regardless of cultural context (Gillespie, 2018). This means that moderators in different countries must understand and apply these policies in the same way, making any standardisation problematic.
These difficulties are further compounded by the labour conditions of moderators, who are almost always externally contracted workers. They have targets to meet, i.e., a set number of contents they must review within a certain period. In The Guardian materials, it is estimated that moderators often have around 20 seconds to make a decision (Hopkins, 2017). Roberts (2019) estimates that there are some 100,000 content moderators across the world, and that they are on low-paid, precarious or short-term contracts, working for companies that provide services to the platforms rather than for the platforms themselves. Their work is to review content that has been flagged as inappropriate and decide as quickly as possible before moving on to the next piece of content. The job is of low status, and moderators are not expected to have any skills other than adequate knowledge of the language of the contents they moderate. A quick search of content moderator adverts on LinkedIn shows that this is an entry-level position. For example, an advert for a Dublin-based content moderator in Arabic lists as mandatory criteria a ‘high school diploma or equivalent’ and at least a B2 level of Arabic, while ‘affinity and cultural awareness of political and social situations regarding the relevant market’ is desirable but not required. These labour conditions, as Roberts (2019) observes, are exploitative. While content moderators constitute the unseen labour required to sustain the platforms, they enjoy none of the prestige or benefits that Facebook, Google or Twitter employees have. Roberts (2019: 70) observes that content moderators engage in “digital piecework” and are not offered any protections or benefits; for example, US content moderators were not given health insurance, in contrast to those directly employed by the platforms.
Roberts (2019) further exposed the toll that dealing with toxic contents takes on the mental health of content moderators. As one of her informants put it: “I can’t imagine anyone who does [this] job and is able to just walk out at the end of their shift and just be done. You dwell on it whether you want it or not” (Max Breen, quoted in Roberts, 2019: 115). Being confronted with violent, explicit, gross and hateful contents hour after hour, day after day, has been linked to post-traumatic stress disorder (PTSD). In 2020, Facebook agreed to pay $52 million in compensation to almost 10,000 content moderators in the US who had suffered PTSD (BBC, 2020). While platforms and the companies that manage content moderation on their behalf (for example, Accenture, CPL/Covalen, Cognizant) provide ‘wellness supports’, content moderators feel that their mental health is not adequately protected, and that companies often evade responsibility for this through non-disclosure agreements and even by getting workers to sign waivers (RTE, 2021).
Yet despite these well-publicised problems, platforms are still bound to moderate their contents. It was in this context of public scrutiny in the media and pressure from international bodies, such as the European Commission, that the shift to AI occurred.
AI Based Automated Moderation
While all platforms use automated content moderation, Facebook has been the most transparent about it and has signalled most clearly its intention to use artificial intelligence. Mark Zuckerberg did this himself in the Blueprint for Content Governance and Enforcement (Zuckerberg, 2018). In this and other documents, Zuckerberg makes the case that accuracy and consistency are two key issues for content governance; that the scale of content posted on Facebook cannot be dealt with exclusively by human labour; and that repetitive tasks are better performed by computers (Zuckerberg, 2018; Newton, 2019). Additionally, AI systems can provide detailed metrics on content that was ‘actioned’, i.e., content that was flagged and led to a decision, an important parameter that fulfils the demands for transparency and efficiency in content moderation. Finally, as digital platforms operate within the ideology and value system of Silicon Valley, providing technologically innovative solutions is their preferred route (cf. Barbrook & Cameron, 1996; Morozov, 2013). In short, AI provides a “desirable, inevitable, unavoidable” (Gillespie, 2020) solution to the problems posed by content moderation and the human labour involved.
Elaborating on these points: firstly, a platform such as Facebook hosts billions of pieces of content per day, making content moderation at this scale a task that cannot be dealt with by humans alone. According to Zuckerberg (2018: non-paginated), because of advances in artificial intelligence but also “because of the multi-billion-dollar annual investments we can now fund”, it is possible to “identify and remove a much larger percent of the harmful content — and we can often remove it faster”. In his 2018 testimony to the US Congress, Zuckerberg argued that “over the long term, building AI tools is going to be the scalable way to identify and root out most of this harmful content” (cited in Harwell, 2018). In the same testimony he referred to AI more than 30 times, indicating the importance given to AI within Facebook. According to Joaquin Quiñonero Candela, Facebook’s director of Applied Machine Learning, the increased importance of AI is also signalled by the physical location of the FAIR (Fundamental AI Research) and AML (Applied Machine Learning) teams next to Mark Zuckerberg’s own office in Building 20, the main office at the Menlo Park headquarters (Hao, 2021).
Secondly, AI moderation is seen as a tool that will help protect moderators from the emotional and mental burden of viewing and acting on toxic contents. Specifically, Mike Schroepfer, Facebook’s Chief Technology Officer, explained that AI tools can “get the appropriate decisions on the content without having the same sort of emotional impact on the person viewing it. So there’s a ton of work that I can’t represent in 30 seconds here, but it is a key focus for all the tools teams to sort of reduce dramatically the human impact it would have by looking at this terrible stuff” (Schroepfer, cited in Newton, 2019). More recently, representatives of content moderators published an open letter stating that “management told moderators that we should no longer see certain varieties of toxic content coming up in the review tool from which we work — such as graphic violence or child abuse, for example” (cited in Foxglove, 2020).
But how are these systems deployed and what do they do? Facebook began using AI systems for proactive moderation — that is, for picking up potentially problematic contents without users flagging them — in 2016. Facebook’s Fundamental AI Research (FAIR) team developed and refined its own in-house systems, such as DeepText and fastText (an open-sourced library for text representation and classification). These systems can be used for text and images, as well as for text on images and in videos, and models have also been developed for different languages. The XLM-R (RoBERTa) system works across 15 languages; Linformer, introduced in late 2020, is a more efficient and precise classifier; and the Reinforcement Integrity Optimizer (RIO) has been developed for optimizing the hate speech classifiers that automatically review all content uploaded to Facebook and Instagram (Schroepfer, 2021).
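To illustrate the kind of text classification that an open-source library such as fastText supports, the minimal sketch below trains a supervised classifier on a labelled file and scores a new post. It is only an illustration of the public fastText API, not Facebook’s production pipeline; the file name, labels and hyperparameters are hypothetical.

```python
import fasttext

# Training data: one labelled example per line, e.g.
#   __label__hate <text of a post previously judged to violate policy>
#   __label__not_hate <text of a post judged acceptable>
# (the file name and label names are hypothetical)
model = fasttext.train_supervised(
    input="moderation_train.txt",  # path to the labelled examples
    epoch=25,                      # passes over the training data
    lr=0.5,                        # learning rate
    wordNgrams=2,                  # use bigrams as well as single words
)

# Score a new piece of content: returns the top label and its probability
labels, probs = model.predict("example post text", k=1)
print(labels[0], float(probs[0]))
```

The supervised classifier here stands in for the far larger multilingual models named above; the point is simply that such systems output a label and a confidence score for each piece of content, which downstream systems then act on.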
In a typical application of AI, Facebook deploys a model to predict whether something is hate speech based on the extent of its similarity to contents previously identified as having violated existing policies; another system then determines the action to be taken, for example, to delete it, demote it, or send it for human review. The newest system, RIO, has improved these processes by training the classifiers not only on the performance of the prediction (how accurately hate speech was detected) but also on how successful the enforcement was (for example, how many people were protected from seeing the content) (Facebook, 2020). While RIO improved the overall performance of the system and the efficiency of the training data, Linformer enabled training to take account of contextual features, and XLM-R extended coverage to additional languages (Schroepfer, 2021). Facebook’s transparency reports indicate that together these systems have an impressive rate of success. In the final quarter of 2020, 97.1% of the hate speech contents that were removed had been proactively detected by Facebook’s AI systems, before they were reported and before they were seen by any users (Facebook Transparency Report, 2021). Overall, Facebook estimates that the prevalence of hate speech on the platform dropped from 0.11% in the third quarter of 2020 to 0.07% in the fourth quarter (meaning that out of every 10,000 pieces of content, an estimated 11 and 7 respectively contained hate speech). Facebook attributes this drop mainly to improvements in proactive moderation through the introduction of the systems discussed above.
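To make the two-stage logic described at the start of this paragraph concrete, the sketch below maps a classifier’s predicted hate-speech probability to an enforcement action. The thresholds, labels and function name are hypothetical assumptions for illustration only; Facebook’s actual configuration is not public.

```python
# Hypothetical thresholds; a real system would tune these per policy and language.
DELETE_THRESHOLD = 0.95   # high confidence: remove proactively
DEMOTE_THRESHOLD = 0.80   # likely violating: reduce distribution
REVIEW_THRESHOLD = 0.50   # borderline: route to a human moderator

def decide_action(hate_score: float) -> str:
    """Map a predicted hate-speech probability to an enforcement action."""
    if hate_score >= DELETE_THRESHOLD:
        return "delete"
    if hate_score >= DEMOTE_THRESHOLD:
        return "demote"
    if hate_score >= REVIEW_THRESHOLD:
        return "human_review"
    return "no_action"

# Example: a post scored at 0.87 would be demoted rather than deleted outright.
print(decide_action(0.87))  # -> "demote"
```

The sketch also makes visible the point developed below: where the thresholds sit, and which categories of content the classifier was trained to recognise, are political decisions, even when they are presented as purely technical ones.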
While Facebook reports astonishing successes, other platforms have had mixed results. Because of Covid-related restrictions and the shift to working from home, content moderation teams were furloughed and their numbers decreased. This led YouTube to rely more heavily on AI systems during this period. According to a report in the Financial Times, these systems proactively removed 11 million videos in the second quarter of 2020, twice the usual rate (Barker & Murphy, 2020). The accuracy of the removals was also lower: about 50% of removal appeals were upheld when AI was responsible for the removal, compared with less than 25% when the decisions were made by human moderators.
While researchers have questioned the extent to which AI can take account of the context and nuances of language (Caplan, 2018), the response of platforms such as Facebook has typically been that the technology is constantly improving and that, in combination with human moderation, these systems will eventually be highly effective in recognizing hate speech (Schroepfer, 2021). However, AI for content moderation has been criticised not only in terms of its accuracy but also on a more conceptual basis. Gillespie (2020) argues that platforms are involved in a discursive justification of AI in content moderation that becomes self-fulfilling and meets the platforms’ own ambitions for further growth: “platforms have reached a scale where only AI solutions seem viable; AI solutions allow platforms to grow further.” Gorwa et al. (2020) argue that AI introduces further opacity into moderation decisions, because the ways in which AI algorithms work are neither clear nor accountable. Additionally, Gorwa et al. argue that AI is presumed to be unbiased, but that this presumption in fact obscures the ways in which certain viewpoints are privileged; they note that classifiers operate on the basis of particular formulations of toxic, racist or misogynist content, and therefore ignore others. A third critical point raised by Gorwa et al. (2020) concerns the de-politicisation of content moderation: in proffering AI as an answer to the problems of content moderation, platforms position themselves as invisible infrastructures and hide the political decision-making behind which types of contents are deemed acceptable or not. These are much more fundamental problems that cannot be addressed through technological improvements. Such arguments open a space for a critique of the role of AI in content moderation and its relationship with existing systems of power. However, they tend to be general points that do not consider the effects of the shift to AI for race and racism more specifically. The next section develops a critique based on decolonial values.