1 Introduction

The internet is a source of information widely available to all, whether as consumers or contributors. Online platforms serve as sources of news updates and general information, amongst other things. Platforms like Facebook and Twitter (X) make it easy for their users to access a range of information, from what a friend had for lunch to the intricate details and nuances of a particular political discourse. As a result, the internet has become a crucial epistemic space for most. Two relatively recent global events illustrate this point – the COVID-19 pandemic and the protests following the killing of George Floyd. Both events unfolded in the physical world and online, particularly on social media platforms. The pandemic led to a surge in online information, including misinformation, while the George Floyd protests sparked global activism amplified by social media. Partly because of the physical restrictions imposed by the COVID-19 lockdowns, most people gathered knowledge about the pandemic and the Black Lives Matter protests primarily online, relying on social media for information from news outlets, friends and family, and online personalities. More than ever, it was apparent how much social media platforms had revolutionised how we connect, communicate, and consume information. These platforms became integral to our epistemic lives.

Limitless access to information is arguably a positive for us as epistemic agents. This highlights one of the putative benefits of the internet – its ability to put vast amounts of information at our fingertips. A simple search allows us to access countless articles, posts, and expert opinions on any topic. This accessibility has democratised knowledge (Pfister, 2011: 220), which is a welcome development (Coady, 2012; Munn, 2012). The democratisation of knowledge can simply be understood as the process of ‘bringing a wider range of people into the exchange of ideas, or as introducing new processes of information dissemination’ (Mößner & Kitcher, 2016: 1).

While this democratising potential of the internet fosters freedom of expression, amongst other epistemic benefits, it also allows misinformation, propaganda, and fake news to spread rapidly. The information we are exposed to on social media platforms can – and often does – negatively impact us as epistemic agents. Without proper verification mechanisms, it can be challenging for users to discern accurate information from falsehoods.

Hence, being epistemic agents within these social media platforms makes us vulnerable to vicious epistemic practices and content (Frost-Arnold, 2016). The more obvious examples include misinformation, hate speech, and trolling. Recognising the pernicious nature of some content online has prompted major social media platforms like Facebook, Twitter, and Instagram to employ various content moderation systems that flag or block information shared on their platforms that is factually incorrect or violates community standards, amongst other considerations. Content moderation is ‘the organized practice of screening user-generated content (UGC) [online] …in order to determine the appropriateness of the content for a given site, locality, or jurisdiction’ (Roberts, 2017: 1).

Employing content moderation is a step in the right direction. However, it has its downsides – the most apparent being that epistemic authority is now placed in companies whose primary aim is profitability. There are other issues associated with the practice of content moderation, ranging from the exploitative conditions that content moderators have to work in to the traumatic nature of the work itself (Frost-Arnold, 2023; Gillespie, 2018a: 111–140; Roberts, 2019). While there are certain practical solutions to these problems (like better working conditions for content moderators), it is not clear that epistemic considerations feature in platforms’ efforts to improve content moderation. Considering the crucial role that online platforms play in our epistemic lives, this seems like a fatal oversight.

My aim in this paper is twofold. First, I explore the epistemic challenges that current online content moderation models face. I categorise these into challenges faced by moderators and challenges faced by users of the internet. All of these challenges affect our epistemic agency to varying degrees. Attending to them ought to be a crucial consideration in formulating our methods of online content moderation. Second, I argue for an epistemic compass for online content moderation that categorises content and assigns the task of moderation to the best possible type of moderator in a way that mitigates the current epistemic challenges. Two benefits of my proposal are that it enables the internet to fulfil its potential of democratising knowledge and that it minimises the epistemic downside of placing epistemic authority in companies whose motives are not epistemic.

I proceed as follows. In Sect. 2, I go through the various forms of content moderation utilised by multiple platforms – human moderation, automated moderation and community moderation. While there are benefits to each method, they all face many challenges. In Sect. 3, I pin down content moderation challenges in epistemic terms. This epistemic phrasing highlights the epistemic considerations that I argue are important if we care about epistemic ends when we moderate content online. In Sect. 4, I present my argument for an epistemic compass for online content moderation that prioritises mitigating the epistemic challenges I have shown. I conclude in Sect. 5.

2 Content Moderation

At the start of the COVID-19 pandemic, much false information about its source and various curative measures was present online. Notably, the then-president of the United States, Donald Trump, speculated on a televised broadcast that ingesting disinfectants might be an effective cure. Although he eventually walked back his comments, claiming he was being sarcastic ‘just to see what would happen’ (Slotkin, 2020), the damage had already been done. For instance, New York City’s poison control centre recorded 30 cases related to ingesting various household disinfectants within 18 hours (ibid.). The City’s health commissioner, Dr. Oxiris Barbot, had to release a video debunking claims that ingesting bleach can cure the virus (ibid.).

In cases like this, with severe health risks, it is obvious why false information needs to be debunked swiftly. Other socially harmful cases are arguably deserving of similar swift responses. However, from this example, we see that no matter how swiftly we move to debunk false claims, it might still be a little too late. This could be either because people exposed to these false claims are not always privy to the fact that the claims have been debunked or, even if they are, they still believe the false claims because they trust the source of this incorrect information. This trust might be warranted or unwarranted.

As I have stated, one space where false information thrives is on the internet. While the ease of access to information online might be a positive for us as epistemic agents, there are nefarious actors online. These actors ensure that users of various online platforms are regularly exposed to false information and other pernicious content and behaviours like hate speech, violence, spamming, and online abuse. If debunking incorrect information fails to stop its spread and potential harm in cases like the COVID-19 one above, it most certainly fails in online cases. This might be partly because of factors like the sheer volume of information we are exposed to online, the belief-reinforcing echo chambers and epistemic bubbles we get locked into, or the fact that false information is more likely to go viral online (Vosoughi et al., 2018; Menczer & Hills, 2020). Online platforms have sought to curb the spread of incorrect information by utilising various forms of content moderation.

The goal of content moderation is to remove or restrict content that is harmful, offensive, or illegal while preserving freedom of speech and promoting a healthy online community. This is not necessarily new since ‘people have been creating and enforcing rules of engagement in online social spaces since the inception of those spaces and throughout the past four decades’ (Roberts, 2019: 1). However, its importance became more apparent due to relatively recent events like COVID-19 and the Rohingya crisis (Sablosky, 2021), for example, where social media companies were accused of either failing to moderate content effectively or silencing marginalised groups through their moderation choices.

How exactly does content moderation happen? Several approaches and tools are used, each with benefits and challenges. Some platforms rely on algorithms and automated systems to flag and remove inappropriate content, while others employ human moderators. In what follows, I give a quick rundown of three forms of content moderation used by various platforms – human content moderation, automated content moderation, and community content moderation. I show the benefits of each and the challenges they might face.

Human Content Moderation is the traditional method of content moderation, where a team of human workers review and approve or reject user-generated content. These content moderators review flagged material on social media platforms and determine if it violates community guidelines. One of the primary advantages of human content moderation is its potential for a nuanced understanding of cultural and social contexts, allowing moderators to exercise judgment and discretion (Banchik, 2021: 1533). This potential for nuance means that more obscure speech forms like ‘dog whistles’ (Bhat & Klein, 2020: 153–160) can be picked up by human moderators. Human moderators can also provide meaningful feedback to users, educate them on platform policies, and build trust within the online community. However, challenges associated with human moderation include scalability issues due to the vast amount of user-generated content (Roberts, 2018; Gillespie, 2018a: 74–75; Ceci, 2023; Frost-Arnold, 2023: 60), poor working conditions for moderators (Chen, 2014; Gillespie, 2018a: 121; Roberts, 2019), and exposure to graphic and disturbing content leading to mental health issues (Gillespie, 2018a: 123; Roberts, 2019; Frost-Arnold, 2023: 69–70).

Despite the theoretical benefits of human moderation, the challenges presented, especially the problem of scale and working conditions of moderators, make it practically unfeasible. The commercialised content moderation model compromises the contextual understanding offered by human moderators. While improving working conditions, training, and support services can mitigate some of the issues associated with human content moderation, scalability remains a significant obstacle. One way of addressing the problem of scale is through automation.

Social media platforms are designed to be open and scalable, allowing anyone to sign up and post content, leading to a massive volume of information that is challenging for human moderation. To address this, platforms employ algorithms and artificial intelligence (AI) for automated content moderation. Some of the key benefits of this sort of automated moderation include its scalability and ability to handle sensitive material swiftly. Automated moderation can process large amounts of content efficiently (Gillespie, 2018b), removing harmful material before it is viewed, offering a desirable alternative to human moderation.

However, challenges exist in automated content moderation. Firstly, AI struggles with contextual understanding and may produce false positives or negatives, misidentifying harmless content or allowing subtle harmful content to pass through (Gillespie, 2020; Llansó, 2020; Gorwa et al., 2020). Secondly, there is a lack of transparency in how content is moderated (Petricca, 2020), leading to confusion and frustration among users (Díaz & Hecht-Felella, 2021: 10). Moderation policies are often unclear and inconsistently applied (West, 2018: 4370), making content moderation subjective and open to interpretation. Lastly, algorithms are inherently biased due to the data they are trained on, leading to potential discrimination and the perpetuation of inequalities (Sinders, 2018). The power of platforms in choosing datasets and shaping public discourse raises concerns about accountability and legitimacy (Sap et al., 2019: 1668; Frost-Arnold, 2023: 95). Despite potential improvements, automated content moderation remains fundamentally limited, and human moderators remain essential in certain areas. Many platforms employ a hybrid approach, combining human and automated moderation, yet challenges persist in addressing the individual issues highlighted. Community content moderation is also utilised to varying degrees by specific platforms.

Community moderation involves empowering platform users to moderate content themselves by reporting violations, providing feedback on flagged content, and participating in moderation programs. This can potentially foster a sense of user participation and responsibility. Contextually appropriate moderation is a significant benefit of community moderation as users shape community guidelines and rules, building trust and accountability. Platforms like Reddit utilise self-governance, allowing users to upvote or downvote posts, ensuring the visibility of relevant content.

However, challenges exist. Abuse and manipulation can potentially arise from false content reports, particularly through flagging. Flags, often with limited information, undergo review by moderators or AI, introducing bias and manipulation risks (Frost-Arnold, 2023: 50). Bad actors may exploit flagging to remove innocent content, especially targeting marginalised communities. Unintentional biases, influenced by stereotypes, also impact whose speech is flagged (ibid.). Additionally, reliance on unpaid and unskilled labour for moderation poses scalability issues. Volunteers face overwhelming workloads, leading to difficulties in reviewing and responding to reports, as well as facing emotional strain from exposure to harmful content. This brings us back to the problem of psychological trauma that human content moderators are exposed to.

Before moving on, a quick recap of this section is helpful. I started by briefly motivating why content moderation happens and why it is important: we are constantly exposed to harmful content, and content moderation aims to mitigate this exposure. Platforms have used three methods to moderate content – humans (individuals), automated systems, and communities. While these forms of content moderation are beneficial to varying degrees, they all face significant challenges that have been discussed in the literature. I have highlighted some of these benefits and challenges. Some challenges cut across different forms of content moderation, and so do the benefits. However, no single form of content moderation is a worthwhile substitute for the others on its own, which is probably why different platforms rely on hybrid forms of content moderation. Perhaps the best form of content moderation requires harnessing the best features of all three.

3 Challenges of Content Moderation in Epistemic Terms

In this section, I outline some of the epistemic challenges that accompany current models of online content moderation. Frost-Arnold (2023), in her analysis of online content moderation, sheds light on some of the epistemic injustices that can arise from online content moderation. They can be categorised under epistemic exploitation, ‘epistemic dumping’, and epistemic silencing. In this section, I utilise Frost-Arnold’s arguments as a starting point for my arguments that expand on the epistemic harms associated with online content moderation. I distinguish between when these harms affect the epistemic agency of internet users and when they affect the epistemic agency of content moderators. This is not to say that the epistemic harms I highlight are exclusive to either group; there are overlaps. Drawing this distinction is important for two reasons. Firstly, it situates the primary epistemic harms experienced by content moderators and users of the internet. Secondly, it gives us a clearer roadmap of what to focus on in each case if we aim to ameliorate the harms associated with online content moderation.

Starting with the harms to moderators, I look at two kinds of epistemic harms experienced by online content moderators argued for by Frost-Arnold (2023) – epistemic exploitation and epistemic dumping. These do not exhaust all the possible epistemic harms experienced by online content moderators. I have chosen to go with these two because the first is intuitively close to the more general harms experienced by content moderators that I highlighted in Sect. 2, and the second represents an insurmountable epistemic challenge to online content moderation.

Drawing on Gaile Pohlhaus’ (2017) three categories of epistemic injustice and Ruth Sample’s (2003) notion of exploitation, Frost-Arnold (2023) contends that content moderators face epistemic exploitation when their knowledge and expertise are undervalued and appropriated by the platforms they work for. Content moderation, as stated, involves the examination of user-generated content to ensure compliance with platform guidelines, and moderators play a crucial role in enforcing this. Despite their specialised knowledge, content moderators often receive inadequate compensation, and even if financially compensated, their epistemic labour is obscured by secrecy, as platforms restrict them from sharing their experiences (Gillespie, 2018a: 198, Roberts, 2019: 3, Criddle, 2021).

The invisibility of content moderators’ work, coupled with the lack of acknowledgement or appreciation, contributes to their epistemic exploitation. This exploitation shares similarities with Nora Berenstain’s (2016) concept of epistemic exploitation as unacknowledged and uncompensated emotional labour coerced from marginalised individuals. Content moderators, often from marginalised communities, endure the emotional toll of sifting through distressing content without fair compensation or adequate support systems, perpetuating a cycle of exploitation. Ultimately, content moderators experience epistemic exploitation when their specialised knowledge is undervalued, hidden, and utilised without due credit, leading to significant harm.

We have seen that content moderators are epistemically exploited when we consider the unacknowledged and uncompensated epistemic labour they endure. Their job ensures that when we use online platforms, what we encounter, for the most part, is a sanitised version of the internet. While we enjoy this somewhat sanitised version of the internet, content moderators handle the trash we are spared (Criddle, 2021). Frost-Arnold (2023: 80–86) refers to the harm they face as ‘epistemic dumping’. This occurs when content moderators, often marginalised individuals, bear the unequal burden of sorting, handling, and disposing of toxic epistemic trash. This process is considered a form of epistemic injustice, with privileged groups benefiting from sanitised online spaces without acknowledging the intellectual labour required for content moderation.

Turning to the epistemic harms of content moderation to internet users, how does content moderation, in general, affect our epistemic agency? In other words, how do content moderation practices harm us?

Content moderation involves a delicate balance between protecting users from harmful content and upholding freedom of speech. This is particularly challenging in the context of hate speech, as distinguishing between legitimate political debate and objectionable content can be difficult. While safeguarding users is crucial, fostering open discussion and free speech, especially in political discourse, is essential for a healthy democracy.

However, content moderation policies, when applied perniciously, can silence marginalised voices. An example highlighted by Guynn (2019) reveals that Facebook users identifying as Black reported experiencing ‘getting Zucked,’ where discussions about racism were flagged as hate speech. This disproportionately affected Black users, hindering their ability to address racial discrimination.

Automated content moderation algorithms can exacerbate this issue by inadvertently targeting and silencing marginalised communities. Algorithms may over-identify content from these groups as violating platform policies, leading to the removal of posts discussing racism or discrimination. Biases within algorithms, intentional or not, can disproportionately impact marginalised individuals, contributing to systemic silencing (Gerrard & Thornham, 2020: 1276). Moreover, automated moderation often lacks contextual understanding, misinterpreting cultural references or expressions specific to marginalised groups as hate speech. This results in the removal of content that should be recognised as valid self-expression, contributing to cultural erasure and marginalisation.

Human content moderation also faces issues, given the vast amount of content and the pressure to enforce platform standards quickly. Moderators, even those from marginalised backgrounds, may err on the side of caution, leading to the removal of important content for marginalised communities. This overzealous enforcement hinders discussions about systemic discrimination, social justice, and identity.

Frost-Arnold (2023: 51–59) argues for specific epistemic virtues, such as testimonial justice and hermeneutical justice, for content moderators. However, the working conditions and commercial content moderation models make attaining these virtues challenging or impossible (ibid.).

Beyond the potential silencing, another dimension of harm that users might face online due to content moderation is paternalism – specifically epistemic paternalism. Epistemic paternalism involves interfering with someone’s epistemic inquiry for their own good (Ahlstrom-Vij, 2013). It encompasses the idea that some individuals or institutions may restrict or control access to certain information or ideas to promote what they consider more valuable knowledge. This process involves making decisions on behalf of others based on the assumption that the paternalistic party knows what is best for the person being influenced (Ahlstrom-Vij, 2013: 61). There are two key features of epistemic paternalism here: the curation of information by the paternalistic party, and the control of an individual’s epistemic activities. For the first, imagine a teacher deciding what material the students of a geography class must read to learn what is relevant. For the second, imagine a parent forcing their child to go to school. In the first case, the teacher prevents the students from encountering information they consider irrelevant while directing them to more valuable knowledge. In the second case, the parent decides on their child’s behalf. In both cases, the paternalistic party acts under the assumption that they know what is best and that the other party cannot manage without their interference.

In the context of online platforms, epistemic paternalism could manifest in various ways. For instance, online platforms might use algorithms and content curation techniques to prioritise or filter certain types of information, aiming to shape users’ experiences and guide their exposure to specific viewpoints. This happens whenever we go online, and the personalisation algorithms curate our online experience, from the targeted ads we see to the personalised search results we get. This is supposed to benefit us since it aims to make the internet more navigable and relevant to each user. This personalisation can be seen as paternalistic as the platform providers decide what content suits users and what should be hidden or de-prioritised, all for the user’s benefit.

This sort of paternalism online can be beneficial under certain circumstances. Castro et al. (2020) argue that epistemically paternalistic alterations to certain aspects of our online landscape can be beneficial in dealing with epistemically vicious situations (like exposure to fake news and misinformation) we might be subjected to online. They use the case of Facebook introducing a new metric called Click-Gap to combat fake news by demoting low-quality content in users’ news feeds (Castro et al., 2020: 34). This measure aims to prevent misinformation from going viral on the platform by assessing the gap between a website’s traffic from Facebook and its overall internet traffic. Despite the paternalistic nature of this intervention, it is considered permissible for several reasons. Firstly, it does not harm users. Rather, it protects individuals from significant harms associated with fake news (like the false stories about possible cures for COVID). Additionally, demoting fake news supports user autonomy by protecting them from misinformation that might influence their beliefs (Castro et al., 2020: 35).
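To make the Click-Gap example concrete, consider a minimal sketch of how such a signal might work. Facebook has not published the actual formula, so the function names, the 0.8 threshold, and the demotion penalty below are purely illustrative assumptions; the sketch only captures the idea of demoting (rather than deleting) content from sites whose readership comes almost entirely from the platform.

```python
def click_gap_score(platform_referrals: int, total_site_traffic: int) -> float:
    """Illustrative 'click-gap' signal: the share of a site's overall traffic
    that arrives via the platform. A value near 1.0 means the site is reached
    almost exclusively through the platform, which the intervention treats as
    a low-quality signal."""
    if total_site_traffic <= 0:
        return 0.0
    return min(platform_referrals / total_site_traffic, 1.0)


def demotion_factor(score: float, threshold: float = 0.8, penalty: float = 0.5) -> float:
    """Demote rather than delete: items from sources above the hypothetical
    threshold have their ranking weight multiplied by a penalty < 1, so they
    surface less often in users' feeds."""
    return penalty if score > threshold else 1.0


if __name__ == "__main__":
    # A site with 90,000 of its 100,000 visits coming from the platform is
    # demoted; a site with broad, independent readership is left alone.
    for referrals, total in [(90_000, 100_000), (5_000, 100_000)]:
        s = click_gap_score(referrals, total)
        print(referrals, total, round(s, 2), demotion_factor(s))
```

The point of interest for my argument is that this is a ranking intervention rather than a removal: users can still find the content, but the platform quietly decides how visible it is.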

So, on the one hand, epistemically paternalistic interventions like the one above (and content moderation more generally) can arguably be justified by their potential to protect individuals from harmful or misleading information, reducing the risk of manipulation and safeguarding the quality of knowledge circulating online – limiting exposure to specific content can shield users from false information, conspiracy theories, or harmful ideologies. On the other hand, epistemic paternalism through online content moderation raises concerns of epistemic importance: the legitimacy of platforms’ epistemic authority, harm to individuals’ epistemic autonomy, and the risk of bias. Let me go through these in more detail.

Firstly, when platforms decide to curate our online spaces and determine what content is allowed and what needs to be deleted, they assume a position of epistemic authority. However, it is not apparent that these assumed positions of authority are justified or justifiable. Think of Facebook’s pivotal role in our current knowledge economy and recall its initial purpose – a directory for college students. Nothing in Facebook’s origin suggests that it is aimed at or suitable for the pivotal epistemic role it plays today. Yet, it plays this role. Also, ‘a whistleblower report from inside Facebook revealed that they knew that their content algorithm pushed users into further radicalisation and promoted the growth of QAnon but did not ameliorate the issue’ (Aird, 2022: 9). Facebook’s priority is keeping people on the platform.

Additionally, relying on online platforms to regulate our epistemic activities raises concerns over privatising public discourse, with private entities having significant control over the flow of information and ideas. Platforms are incentivised to remove any content deemed controversial or offensive to advertisers or governments, even if it is not harmful or offensive to users. This is dangerous, allowing powerful entities to control the narrative and silence dissenting voices. In instances when this happens, content moderation can amount to censorship. This is not ideal for us as epistemic agents, to say the least.

It could be argued that in the absence of an appropriate body to regulate our online environments, social media companies have developed significant expertise in moderating content and have various mechanisms in place that make them best placed to do this job. I agree with the argument. At the moment, social media companies represent our best shot at effectively moderating content on their platforms. This justifies their paternalistic interventions in certain cases (like Facebook’s Click-Gap example above). However, one of the central points of my argument is that these interventions through content moderation are epistemically significant to internet users. As such, epistemic (not just social or political) considerations must be central to moderation decisions.

Secondly, by illegitimately limiting information through content moderation, social media platforms may undermine individuals’ autonomy and hinder the free exchange of ideas. By exerting control over the content that users can access and share, platforms effectively disempower individuals, treating them as passive recipients of knowledge rather than active epistemic agents. When certain viewpoints or information are suppressed, users are deprived of the opportunity to engage critically with diverse perspectives and form their own informed opinions. This limitation not only stifles individual autonomy but also hampers innovation and the collective progress of society.

In cases where the moderated content is political speech, social media platforms not only undermine individual autonomy but also compromise the foundational principles of a democratic society. In a democratic framework, the free exchange of ideas is important for fostering informed public discourse and holding institutions accountable. When social media platforms unduly moderate content, they run the risk of creating filter bubbles in which individuals are insulated from dissenting viewpoints and diverse perspectives. Thus, by curating and controlling the information landscape, social media platforms risk becoming gatekeepers of knowledge, wielding disproportionate influence over public opinion and undermining the democratic ideals of transparency and openness.

Thirdly, as shown at various points above, content moderation policies may be biased, reflecting the perspectives and values of those who created them, and these biased moderation decisions disproportionately affect members of socially marginalised groups, whose voices and experiences are often dismissed or suppressed. This raises critical epistemic questions about whose knowledge is being prioritised and whose perspectives are being silenced in the name of content moderation. When content moderation silences in this way, it is akin to censorship.

It is important that we note the distinction between censorship and content moderation. In a recent piece, Moses (2024) succinctly spells out some of these differences. While government censorship is often wielded as a tool to enforce political conformity and control, content moderation on social media platforms operates within a different framework (ibid.). Its primary aim is to strike a delicate balance between upholding the principles of free expression and minimising potential harm to users. Unlike censorship, which seeks to impose control, content moderation endeavours to foster inclusivity within digital spaces (ibid.). It is essential to recognise that users have alternatives in the digital realm. Unlike the blanket silencing effect of government censorship, being banned from one social media platform does not equate to being silenced entirely (ibid.).

So far, I have argued that current models of online content moderation harm moderators and users of the internet in various ways that affect their epistemic agency. If we aim to moderate online content in a way that cares about epistemic agency, we should look to ameliorate these worries. As stated in the introduction, these are not intended to encompass all the epistemic concerns for online content moderation. I have selected these partly due to their proximity to certain intuitive social problems often associated with content moderation. Let us now turn to what a positive picture of online content moderation might look like.

4 Epistemic Compass for Online Content Moderation

From the epistemic challenges of online content moderation I have just highlighted, we see that current models present insurmountable problems that affect our epistemic agency in varying ways. For example, the epistemic dumping faced by content moderators will remain if we keep moderating content in the commercialised way we do currently. The issues associated with epistemic paternalism will remain if companies with financial interests continue to dictate what the internet – an epistemic space, as I have argued – should look like. Current models of content moderation can be digital equivalents of gatekeeping that prevent the internet from attaining its potential as a space for the democratisation of knowledge. Theoretically, there are benefits to content moderation since it could act as a way of filtering out harmful or inappropriate content to enforce community guidelines on various online platforms. In a world teeming with misinformation, disinformation, hate speech, and offensive material, content moderation that does this job is essential. However, when content moderation becomes detrimental to individuals’ epistemic agency with the kinds of insurmountable challenges that I have shown, how can we moderate content to preserve its benefits and minimise its harms?

In what follows, I sketch the beginnings of a pluralistic account of moderation that preserves the benefits while minimising the harms of such moderation. I first consider an argument for the total abolition of content moderation. While this move solves most of the epistemic challenges that are directly related to content moderation practices, there is a glaring problem with it – the toxic and harmful content that already pervades the internet will flourish, along with its epistemic and moral harms. These sorts of content make the internet an undesirable epistemic space – hence the need for content moderators in the first place. The solution, I argue, is a form of content moderation that could reduce our exposure to toxic content online while mitigating most of the risks associated with current models of content moderation. This form of content moderation categorises content online and distributes the task of content moderation between human moderators, automated moderators, and community moderators in a way that plays to the strengths of each model of content moderation. This model, at the very least, provides us with the best starting point for effective content moderation that avoids epistemic injustice while enhancing the internet’s potential to democratise knowledge.

4.1 No Moderation

One might think that given the insurmountable challenges associated with content moderation that have been highlighted, our best bet would be to do away with these forms of content moderation entirely. In this section, I look at three epistemic benefits of such a move – its potential to help democratise knowledge, to resolve the epistemic challenges directly related to content moderation highlighted above, and to foster our epistemic agency. While these are desirable reasons to eliminate content moderation, the glaring problem with this move is that it allows toxic content to persist online. For now, let me go through the benefits of this move.

4.1.1 Democratising Knowledge

Eliminating content moderation would have the benefit of preserving a diverse range of perspectives online. When users are allowed to express themselves without stringent censorship, a wide spectrum of viewpoints can emerge. This diversity is essential for the democratisation of knowledge, since a plurality of ideas enriches these platforms and marginalised voices can be heard and make meaningful contributions to online spaces. As already argued, content moderation and algorithmic personalisation, if carried out too rigorously, can inadvertently contribute to the formation of echo chambers – insulated online spaces where individuals are exposed only to ideas that align with their existing beliefs (Nguyen, 2020). Thus, by allowing a broader range of content, even if contentious, online platforms can encourage users to engage with differing perspectives and challenge their preconceived notions. This exposure to dissenting viewpoints fosters a more rounded understanding of complex issues. As a result, online platforms can facilitate more authentic and genuine online interactions. Users can express their thoughts and opinions without fear of being censored, potentially leading to honest conversations that reflect the complexities of real-world debates.

4.1.2 Resolving Epistemic Challenges

Creating an avenue for diverse voices to be heard spares marginalised people from the epistemic injustices that they experience because of current models of content moderation. Without content moderation, the epistemic challenges caused by content moderation simply do not arise. Marginalised users of the internet who have suffered from overzealous moderation policies that silence them will no longer face this issue and will have an avenue to share their experiences online. The issue of epistemic paternalism raised earlier is also eliminated since, without content moderation, our internet space will no longer be curated by for-profit companies. Eliminating content moderation likewise eliminates the epistemic challenges faced by content moderators: trivially, the epistemic dumping and epistemic exploitation that content moderators experience become a nonissue if there is no content moderation in the first place.

However, there is the obvious problem of misinformation that can accompany this kind of freedom. That is where the third potential benefit of eliminating content moderation kicks in.

4.1.3 Fostering Epistemic Agency

While misinformation is bad in itself, its inevitability might create an opportunity for internet users to develop epistemic skills and virtues (‘epistemic sorting’, Worden, 2019; ‘online epistemic virtues’, Heersmink, 2018) that enable them to navigate the internet more responsibly. This can enhance responsible digital citizenship. When individuals are entrusted with greater agency over their online interactions, they take more responsibility for their online activities rather than deflecting blame onto the agendas of various platforms.

One way that internet users can do this is through ‘prebunking’. Prebunking refers to the practice of proactively addressing and debunking misinformation or false claims before they gain widespread acceptance or traction (Roozenbeek et al., 2020). In other words, it involves providing accurate information and evidence to counteract potential misconceptions before they have a chance to spread widely. It is rooted in the idea that people are more likely to believe false information if they encounter it without any context or corrective information (ibid.). By pre-emptively presenting accurate information and debunking false claims, prebunking aims to reduce the likelihood of misinformation taking hold and gaining credibility. Where it has been tried, participants showed a lower susceptibility to false information online (ibid.). Overall, prebunking is a strategy to promote critical thinking and media literacy, empowering individuals as epistemic agents to make informed decisions and resist the spread of misinformation.

4.2 Targeted Moderation

As positive as the above picture of no content moderation sounds, we know content moderation was introduced for a reason – to prevent online platforms from becoming breeding grounds for cyberbullying, hate speech, graphic violence, and other forms of offensive material. The presence of online content moderators aims to help mitigate our exposure to these harmful contents online. So, while some virtues of no content moderation might seem beneficial in tackling online vices, we also need a way of handling this other class of content we are exposed to online. How can we moderate content online to preserve the epistemic benefits of no content moderation while mitigating the challenges it faces?

In this section, I argue for a pluralistic model of content moderation that first categorises online content that needs moderating and, second, utilises the model of content moderation that best suits that content. So, rather than have a single method of content moderation for a particular platform, we have multiple methods of content moderation that address specific content on that platform. This involves classifying online content and allocating the task of moderation between human moderators, automated systems, and community moderators. This distribution leverages the unique strengths of each moderation model. Let me now turn to what this model might look like by focusing on community, automated, and human content moderation and the sorts of content that can be assigned to them.
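Rendered schematically, the proposal is a routing decision: content is classified into broad categories, and each category is handed to the moderation model best placed to deal with it. The sketch below is only illustrative – the category names, enum values, and routing table are my own assumptions rather than any existing platform API – but it captures the division of labour defended in the rest of this section.

```python
from enum import Enum, auto


class ContentCategory(Enum):
    GRAPHIC_VIOLENCE = auto()  # clearly identifiable violent or abusive imagery
    HATE_SPEECH = auto()       # nuanced, context-dependent harmful speech
    CONTEXTUAL = auto()        # news, satire, culturally specific material
    BENIGN = auto()


class ModeratorType(Enum):
    AUTOMATED = auto()
    HUMAN = auto()
    COMMUNITY = auto()
    NONE = auto()


# Hypothetical routing table reflecting the division of labour argued for in
# this section: automation for clearly identifiable violent content, trained
# human moderators for nuanced hate speech, and the community for
# contextually relevant material.
ROUTING = {
    ContentCategory.GRAPHIC_VIOLENCE: ModeratorType.AUTOMATED,
    ContentCategory.HATE_SPEECH: ModeratorType.HUMAN,
    ContentCategory.CONTEXTUAL: ModeratorType.COMMUNITY,
    ContentCategory.BENIGN: ModeratorType.NONE,
}


def route(category: ContentCategory) -> ModeratorType:
    """Assign each category of content to the moderation model best placed to handle it."""
    return ROUTING[category]


if __name__ == "__main__":
    print(route(ContentCategory.HATE_SPEECH))  # ModeratorType.HUMAN
```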

4.2.1 Community Moderation

By community moderation, I mean a system of content moderation that empowers platform users by giving them the authority to moderate content on that platform. This is the kind of moderation model utilised by platforms like Reddit through upvoting and downvoting mechanisms. Users can collectively evaluate and rank content based on its accuracy and relevance. High-quality contributions receive more upvotes and visibility, while misleading or incorrect content is naturally pushed down. There are at least two key benefits to this model of content moderation: it ensures contextually relevant moderation, and it democratises knowledge. I go through these benefits below, after a brief sketch of the voting mechanism itself.
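The following is a deliberately minimal illustration of vote-based ranking, assuming a plain net-score ordering; platforms like Reddit use more elaborate, time-sensitive formulas, so this is a sketch of the idea rather than any platform's actual algorithm.

```python
def net_score(upvotes: int, downvotes: int) -> int:
    """Plain net-vote score: contributions the community endorses rise,
    misleading or low-quality contributions sink."""
    return upvotes - downvotes


def rank(posts: list) -> list:
    """Order posts by their net score, highest (most visible) first."""
    return sorted(posts, key=lambda p: net_score(p["up"], p["down"]), reverse=True)


if __name__ == "__main__":
    posts = [
        {"id": "accurate-report", "up": 120, "down": 4},
        {"id": "misleading-claim", "up": 15, "down": 90},
    ]
    print([p["id"] for p in rank(posts)])  # ['accurate-report', 'misleading-claim']
```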

First, community involvement in content moderation increases the chances for content moderation to be contextually relevant. Certain content may be innocuous in one context but offensive or harmful in another. By allowing community members to evaluate content within its appropriate context, platforms can reduce the likelihood of overzealous content removal or the inadvertent promotion of inappropriate material. Contextually relevant content moderation is especially crucial in promoting inclusivity and cultural sensitivity – something that bolsters the social nature of the online epistemic space.

Second, one of the fundamental aims of democratising knowledge is to ensure that a wide range of perspectives and voices are represented. Traditional gatekeepers of information may unintentionally exclude marginalised groups, perpetuating inequalities. Community content moderation lends itself to the attainment of the internet’s potential for democratising knowledge. Community content moderation provides an avenue that encourages users to actively participate in curating and validating information that dominates their epistemic space. This moves away from the current centralised form of content moderation that relies on commercial entities that run social media platforms to curate our epistemic spaces.

Looking at the benefits of no content moderation – democratising knowledge, resolving the epistemic challenges caused by content moderation, and fostering our epistemic agency – community content moderation attains these benefits with the added advantage of community members curating their own online spaces. This ensures that community members can discern what is relevant to their community and what is not. For community content moderation to work effectively, its primary class of content should be the sorts of content that require contextual understanding, such as news stories and culturally relevant materials. This creates a shift from the universalised models that dominate most commercialised content moderation practices to a model that prioritises the interests of the community over the interests of the corporations. When utilised properly, community content moderation empowers users to actively engage in shaping the online spaces they inhabit. By involving the community in the moderation process, platforms can tap into the diversity of their user base, ensuring that content moderation is rooted in real-world contexts, cultural nuances, and shared values.

The benefit of community content moderation for contextually relevant material over centralised and automated moderation is that it avoids, respectively, the enforcement of a single viewpoint that human moderators in the centralised model are prone to, and the static nature of algorithms. Outsourced individual moderators following arbitrary guidelines will not be able to moderate contextually relevant content as well as members of a community. Similarly, an algorithm that lacks the fluidity of humans will not be able to moderate content effectively in the ever-changing online landscape. However, while community content moderation might bring the nuance necessary for moderating contextually relevant content, as it stands it also means that internet users will still be exposed to violent imagery and other toxic content that they must then moderate. For these classes of content, I turn to automated methods of content moderation.

4.2.2 Automated Moderation

AI algorithms can be utilised to moderate clearly identifiable violent content. This includes content depicting physical violence and content like child pornography. There are two key benefits to utilising automated moderation for this class of content.

The first is that automated content moderation has the ability to process vast amounts of content in real time. This rapid analysis ensures that potentially harmful material is identified and addressed promptly, minimising the window of exposure for users. Moreover, automated moderation systems operate without fatigue, reducing the risk of psychological distress faced by human moderators. By swiftly identifying and removing violent content, platforms can create a more welcoming environment where users are protected from violent images.

The second benefit is that automated moderation systems can be programmed to provide users with instant feedback on the appropriateness of their content. This proactive approach encourages users to self-regulate and think twice before posting potentially harmful material. In the long run, this could lead to a reduction in the overall creation of violent content as users become more aware of the moderation guidelines.

For these reasons, automated moderation is best suited for clearly identifiable violent content. Community moderation would be somewhat ineffective here, since moderating this class of content would require community members to be exposed to it in the first place. Individuals in the commercialised model of content moderation are also not suitable for moderating this class of content, for reasons I have highlighted at various stages in this paper. While automated moderation might be effective in moderating clearly identifiable violent content, there is another class of harmful content online where it might fail.

4.2.3 Human Moderators

AI algorithms are ineffective at moderating harmful content like hate speech in the nuanced forms it often takes. They struggle with understanding context, sarcasm, and cultural nuances, leading to both false positives and false negatives in hate speech moderation. Additionally, hate speech can evolve rapidly, adapting to language changes and cultural shifts, making it challenging for automated systems to keep up. So, for this class of harmful content, we need a model of content moderation that understands nuance (something we get with community content moderation) but also ensures that internet users are not exposed to this class of harmful content.

This means that, regardless of its undesirability, some level of commercialised human content moderation is necessary for moderating hate speech, as it introduces a vital layer of nuance and context that automated systems often lack. Trained human moderators can accurately discern the intention behind a statement, differentiate between legitimate expressions of opinion and hate speech, and consider the historical and cultural context in which a piece of content is posted. They are better equipped to handle complex cases that involve satire, irony, or subtle forms of hate speech that might elude algorithms. To identify what counts as hate speech and how platforms should moderate it, we can adopt a dignity-based framework. That is, platforms should employ an ethical framework that emphasises the inherent worth and value of every individual. Applying this principle to content moderation requires acknowledging the humanity of all users and recognising their entitlement to a safe and respectful online environment. Hence, we can pick out hateful content that needs to be moderated while still maintaining an online space that does not further marginalise members of already marginalised groups.

The benefit of utilising (commercialised) human content moderators for this class of content over community or automated moderators is that they have the nuance necessary to identify this kind of content, and relying on them rather than the community ensures that internet users are not exposed to harmful content online. If we are to do this, it needs to be done in a way that minimises most of the harm to human content moderators that I identified in Sects. 2 and 3. This includes making sure that moderators are appropriately trained, properly remunerated, have the necessary support structures to do their work, and are not required to moderate so much content that they prioritise speed over quality and lose the nuance they provide.

* * *

It is worth noting that while my proposal aims to leverage the unique strengths of each method of content moderation, it does not eliminate their weaknesses. Rather, it creates a more fluid method of content moderation that puts epistemic considerations at the fore. By doing this, even when one method of content moderation fails, the overarching epistemic considerations guide us towards a more appropriate model. As a rough example, if a post might contain nudity (on a platform where nudity is against community guidelines), the first port of call should be automated moderation. This should not move straight to deleting the content but should first inform the person, before their content is posted, that it violates community guidelines for stated reasons. If they think this is an erroneous evaluation of their content, they can request that their content be reviewed by human moderators with appropriate contextual knowledge. This way, we avoid overzealous automated moderation, internet users are more aware of the rules and understand better why their content might be removed, and we leverage the contextual nuance that human moderators bring. This represents a process different from how platforms currently use the various models of content moderation available to them. This potentially beneficial shift is possible when we centre epistemic considerations in our content moderation practices.
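The escalation flow just described can be rendered as a short sketch. The function names, the confidence threshold, and the decision labels below are hypothetical; the point is only to show the ordering the example insists on – automation first, notification before removal, and human review with contextual knowledge on appeal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str   # "publish" or "remove"
    reason: str


def automated_first_pass(flag_confidence: float, threshold: float = 0.9) -> bool:
    """Hypothetical automated check: True if the classifier is confident
    the post violates the nudity guideline."""
    return flag_confidence >= threshold


def moderate(flag_confidence: float, author_appeals: bool,
             human_upholds_flag: Optional[bool] = None) -> Decision:
    """Automation runs first; the author is told why before anything is
    removed; a contested decision goes to a human moderator with
    contextual knowledge."""
    if not automated_first_pass(flag_confidence):
        return Decision("publish", "no guideline violation detected")
    if not author_appeals:
        return Decision("remove", "author notified of the violation and did not appeal")
    # Appeal path: a human moderator with contextual knowledge reviews the post.
    if human_upholds_flag:
        return Decision("remove", "human review upheld the automated flag")
    return Decision("publish", "human review overturned the automated flag")


if __name__ == "__main__":
    # A post flagged with high confidence, appealed, and overturned on human
    # review ends up published rather than silently deleted.
    print(moderate(0.95, author_appeals=True, human_upholds_flag=False))
```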

What I have emphasised so far in this section is the importance of centring moderation decisions around users of the internet in a way that enhances their epistemic agency. Community content moderation provides us with the best starting point to do this. However, certain challenges mean that we will still have to rely on automated and human content moderators. While the parameters of automated content moderation can easily be set to target graphic images, we will still experience some level of epistemic dumping when we inevitably use human content moderators. At the very least, we can utilise human content moderators in a way that minimises their exposure to the epistemic harms they currently face. This includes practices like lifting the veil of opacity that currently accompanies their work and, more importantly, acknowledging their epistemic labour and having them work in less epistemically exploitative conditions. There are also potential challenges with community and automated moderation. Community moderation might be susceptible to manipulation, where members of one social group collectively downvote posts that are relevant to members of another social group. Automated moderation can still produce false positives in moderating violent content, where images of police brutality, for example, might be taken down as graphic content.

Despite these possible challenges, the benefit of this model of online content moderation is that it centres the role of the epistemic agent in curating their online space while having the potential to diminish our encounter with harmful content online. It also addresses the majority of the epistemic challenges linked to existing content moderation methods. At a minimum, this framework offers an optimal foundation for proficient content moderation that prioritises epistemic agency, preventing epistemic injustices, and further amplifying the internet’s capacity to democratise knowledge.

5 Conclusion

Content moderation is complex and faces several challenges. The vast swaths of user-generated content mean that moderators must sift through enormous amounts of material to identify objectionable content. This content can be posted in various formats, including text, images, and videos, further complicating the moderation process. And that is just the practical problem. There are still enormous harms associated with content moderation: the psychological trauma that content moderators experience and the epistemic harms that accompany it present even more reasons for concern. Then there is the issue of silencing marginalised groups when content moderation is done wrong. These problems are baked into current methods of content moderation. Even algorithms and AI have proven ineffective. Yet our reliance on the internet for information is becoming ever more inescapable. We need to revamp how we think about content moderation and who has the authority to moderate online content. At the very least, members of these epistemic communities must play a crucial role in determining how their content is moderated. Whatever the case, the internet is an epistemic community, and no method of content moderation is sufficient if it ignores epistemic ends. I have argued here for a model of content moderation that can set us on the path to prioritising epistemic ends in our content moderation practices.