1 Introduction

From ChatGPT to DALL-E, contemporary AI models have recently dominated headlines as revolutionary new tools, capable of penning witty poetry, drafting scientific articles, or fabricating fantasy landscapes on the fly. But often overlooked in this wide-eyed rhetoric is the human work that contributes to these models.

This chapter examines the human labor behind several foundational AI models. Such labor is the open secret behind our technical marvels, the “artificial artificial intelligence” (Stephens, 2023) that prepares and powers them. In this framing, computational technologies are ascendant, but not yet smart enough. Human laborers are needed for this temporary transitional stage, to fill in the cognitive, affective, and physical gaps. Filling in the gaps means cleaning data by hand, manually labeling tricky scenes, or moderating ambiguous content. This is the invisible labor behind so-called automated systems (Munn, 2022; Gray & Suri, 2019), what some have referred to as “fauxtomation” (Taylor, 2018).

How might we delimit this labor from other types of labor? Platform labor is a broad term that can refer to ride-share drivers (Munn, 2017), digitally mediated care workers (Ticona & Mateescu, 2018), and freelance designers and professionals (Carlos Alvarez de la Vega et al., 2021), along with a wide array of other roles, in a diverse range of industry sectors. This chapter focuses on a certain form of platform labor often referred to as “crowd work” or “click work.” While there has been significant research on crowd work, much of it has concentrated on older, general-purpose crowd work platforms like Amazon Mechanical Turk (Ross et al., 2010; Paolacci et al., 2010).

However, an array of newer platforms have emerged in recent years specifically geared toward labor for AI technologies. Appen, Remotasks, Scale AI, CrowdWorks, and Defined.ai are some of the key players to enter this space. There has been less research on these platforms and particularly on how they interface with very new generative AI models such as ChatGPT, released in November 2022, and DALL-E 2, released earlier that year. This chapter thus acknowledges insights and commonalities from prior crowd work research while focusing more specifically on crowd labor for contemporary AI systems.

By all accounts, this human labor behind AI is swelling, forming a vast army. While obtaining precise demographics for these platforms is difficult, one study estimated that there were 19 million active users on these platforms at the end of 2020 (Kässi et al., 2021). Other figures can be obtained from the websites of individual providers. The Japanese platform CrowdWorks (2023), for example, boasts that it has 4.7 million workers in its ranks. These workers come from the USA, but also India, Venezuela, Russia, the Philippines, and dozens of other countries (Posch et al., 2022). And the global pandemic has only deepened the draw and uptake of this digitally mediated remote work (Braesemann et al., 2022). As AI technologies proliferate across industries and countries, this labor will only expand in scale and scope.

For technology pundits, AI technologies and platforms are a positive step, accelerating innovation and ushering in progress and prosperity (Brynjolfsson & McAfee, 2011, 2014). But more critical research has highlighted the social fallout of AI-driven shifts, its ability to extract capital in novel ways while increasing precarity and inequality. As the chapter will discuss, platform labor for AI is often low paid, operating on a piece-work model, leaving workers with little recourse or power, and targeting workers from Global South countries.

The result is that exploitative forms of labor are at the heart of our contemporary AI systems. This makes such labor a key issue for any serious development of digital humanism. Digital humanism is concerned with making our technologies more humane, more egalitarian, more just. Yet rather than benefiting humanity, the current conditions benefit a handful of elite technology companies while harming those who are already marginalized and disenfranchised. Workers pour their time and energy into these platforms, contributing their cognitive, affective, and physical skills, but are then discarded, without sharing in any of the financial, cultural, and intellectual capital that accompanies cutting-edge AI models. There can be no ethical AI without ethical crowd work.

2 Key Concepts in Platform Labor for AI

The origins of labor for AI are closely linked to crowd work platforms. Amazon Mechanical Turk is widely considered to be the first crowd work platform. At the turn of the millennium, the Amazon marketplace was plagued with duplicate products. While Amazon engineers attempted many automated solutions to identify and remove these duplicates, they eventually gave up, deeming the problem too difficult. However, in 2001, one Amazon manager, Venky Harinarayan, devised a solution and published it as a patent. The patent (Harinarayan et al., 2001) described a “hybrid machine/human” arrangement which broke jobs down into “subtasks” and distributed them to a large pool of human laborers. While the platform was originally an in-house tool, its success led the company to release it to the public in 2005. Clients quickly adopted the platform, farming out “human intelligence tasks” to hundreds of workers.

While Amazon Mechanical Turk is still a major player in the crowd work space, it has recently been joined by an array of platforms specifically aimed at providing the labor for AI technologies. Samasource (2023) offers “human in the loop validation” for AI developers, claiming to offer “quicker time to accuracy” and “faster time to market.” CrowdWorks (2023) stresses that AI requires “massive amounts of training data” and presents data labeling as a “remote, part-time job where you can earn money wherever you want, whenever you want.” “80% of your AI efforts will be spent managing data,” Appen (2023) cautions developers and promises that its massive labor force will take care of this requirement, delivering “quality data for the AI lifecycle.”

Such rhetoric provides a way of understanding how crowd labor is framed in AI production—work that is at once absolutely necessary and deeply devalued. The creation, curation, and cleaning of training data is certainly understood as key to AI success. But this meticulous and monotonous labor can take thousands of hours. That kind of grunt work is beneath developers, who focus instead on the “innovative” work around machine learning architectures, techniques, and testing. Data annotation is “dirty work” (Rosette, 2019), outsourced to others who are deemed cheaper and less talented. Their labor, largely invisible and quickly forgotten, sets the stage for experts to build the AI engines of the future.

2.1 Digital Piecework

Crowd work for AI platforms draws on a longstanding form of labor: piecework. Piecework, as its name suggests, is work which is compensated by a fixed rate per piece. Piecework emerged in the late nineteenth century in astronomical calculations and manual farm labor, where workers were paid for each result. But piecework achieved its greatest hold and attention in the domain of garment production. In Britain, workers toiled at home in poor conditions for subsistence-level wages, a practice known as the sweating system (Earl of Dunraven, 1890). Thanks to the struggles and organization of labor activists, this exploitative form of labor largely disappeared in developed countries after the mid-twentieth century. However, as Veena Dubal (2020) notes, Silicon Valley companies have resurrected this notorious model of compensation. Indeed, for Alkhatib et al. (2017), there is a clear link between historical forms of piecework and contemporary forms of crowd work. By chopping jobs into microtasks and farming them out to global workers, this digital piecework extracts long hours of computational labor at poverty-level rates.

Crowd work is low-paid, descending at times to pennies per task (Simonite, 2020). One study found that the average hourly wage for this work was $2 (Hara et al., 2018). A more recent meta-analysis suggested the average was more like $6 per hour (Hornuf & Vrankar, 2022) but highlighted the difficulty of measuring unpaid labor in this analysis. Workers’ own accounts echo these findings: before any payment arrives, they must identify jobs, read instructions, complete each task, and then wait to get paid. Far from being a source of steady income, then, crowd work is highly fragmented—intense bursts of microtasks interspersed with long periods of down time. This makes it difficult for workers to precisely calculate their earnings—and when they do, the figures are typically lower than anticipated or desired (Warin, 2022).

2.2 Unpaid Labor

Significant amounts of work on crowd platforms are not compensated at all. One study found that the most time-intensive task for workers was managing their payment, a form of labor that is totally unpaid (Toxtli et al., 2021). The same study discovered that hypervigilance, or watching for and identifying jobs, was another form of invisible and uncounted labor (Toxtli et al., 2021). In a survey of 656 online workers, participants said they spent 16 hours per week on average browsing for jobs, reading about jobs, and applying for jobs (Wood et al., 2019, p. 943). For some workers, this means literally refreshing the webpage over and over again. For more tech-savvy workers, it means setting up a script that raises an alert whenever tasks with certain keywords come through. Yet whether manual or automated, workers must be ready to claim these tasks instantly; desirable jobs on these platforms disappear in a matter of seconds. One worker “didn’t feel like she could leave her apartment, or even her computer, lest she miss out on an opportunity to work on good tasks” (Schein, 2021, p. 412). This is high-pressure labor that is not counted as labor.
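The alert scripts mentioned above vary by platform, but their core logic is simple: repeatedly check a job feed and flag any listing whose title matches a watched keyword. The following is a minimal sketch of that logic; the feed format, field names, and polling approach are all hypothetical illustrations, not any real platform’s API.

```python
import time

# Hypothetical sketch of a worker's task-alert script: poll a job feed,
# filter listings against watched keywords, and collect matches.
# Listing structure and field names are illustrative only.

WATCHED_KEYWORDS = {"labeling", "transcription"}

def matching_tasks(listings, keywords=WATCHED_KEYWORDS):
    """Return the listings whose titles mention any watched keyword."""
    return [t for t in listings
            if any(k in t["title"].lower() for k in keywords)]

def poll(fetch_listings, interval_seconds=5, rounds=1):
    """Check the feed repeatedly; in practice workers run this indefinitely."""
    hits = []
    for _ in range(rounds):
        hits.extend(matching_tasks(fetch_listings()))
        time.sleep(interval_seconds)
    return hits
```

Because desirable tasks vanish within seconds, real scripts poll aggressively and trigger an audible or on-screen notification the moment a match appears, rather than simply collecting results.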

Another variant of unpaid labor is training. On crowd work platforms, workers often need to qualify for particular jobs. This typically entails completing batteries of test questions or undertaking sample tasks that approximate the real work (see Posada 2022 for one account). Such training can take hours or even days to complete but is not remunerated. These preparatory tasks are typically framed as an upskilling opportunity or a quality control measure, ensuring that workers can deliver a professional product to clients. For platform providers and their clients, this is the work needed to be “work ready”—the unpaid labor before the “real” labor begins.

Finally, unpaid labor takes place through the rejection mechanism built into crowd work platforms. Clients are able to provide ambiguous instructions to workers and then “mass reject” completed tasks for trivial deviations from these guidelines (Altenried, 2020). This dynamic is exacerbated by the fact that click work is extremely low paid (see above). Clients can request far more samples or tasks than they actually require, select their preferred data, and then reject the surplus. Rejection means that workers are simply not paid for this task. Workers may protest, but, as many testify, platforms overwhelmingly side with the client, the paying customer, in these disputes.

2.3 Toxic and Exhausting Labor

While the remuneration of this labor is bleak, it is worth looking beyond the economic conditions to consider the psychological and subjective impact of this work on the worker. For instance, the hypervigilance required to constantly monitor jobs and snap up good ones (Toxtli et al., 2021) suggests intense pressures on workers, which may be internalized as anxiety or stress. Drawing on two large surveys of two thousand workers, Glavin et al. (2021) found that both gig workers and crowd workers were more vulnerable to forms of loneliness, a finding that perfectly tracks with a form of labor which is highly individualized and often highly competitive. In addition, workers have little control over platform mechanisms, and this powerlessness can often produce annoyance or anger. One study found that technical problems in conducting tasks, platform competition, and the inability to disconnect from work all led to significant levels of frustration amongst workers (Strunk et al., 2022).

If this work can be psychologically damaging, it can also be simply exhausting. One study of 263 workers on a crowdsource platform found they became technologically overloaded, leading to burnout (Bunjak et al., 2021). Burned-out workers leave the platform, creating significant churn. But as Van Doorn (2017) notes, the turnover created by these accelerated conditions is designed into the model: as workers become exhausted and leave, a new set of precarious individuals comes on board to take their place. Baked into platforms, then, is the logic of obsolescence. Workers are driven to the point of breakdown and then quit in disgust at the conditions or are discarded when their performances falter.

Of course, such a logic is far from novel. Marx (1977, p. 348) diagnosed this same dynamic when he noted “capital’s drive towards a limitless draining away of labor-power.” Just as the soil was exhausted of nutrients, the body of the worker was exhausted of its productive potential. In this sense, the extraction of labor for AI technologies is not some unprecedented condition, but a repetition of what has come before. The exploitation of vulnerable or even desperate workers by elite technology companies is a pattern that seems all too familiar. This means that insights from the history of labor and from analyses of capital can still be fruitful, offering resources for understanding these conditions and recommendations for improving labor practices.

2.4 Colonial Circuits

Platform labor for AI is not equally distributed across the globe but is instead arranged in particular patterns. These patterns tend to follow long-established patterns of labor, where work is extracted from Global South locations and funneled to Global North actors. Cheap labor is coordinated and captured in the colonies and then transmitted to startups, developers, or tech titans. This is the well-known global division of labor (Caporaso, 1981), an imperialist appropriation of resources and labor which provides a boon to these so-called advanced economies (Hickel et al., 2022).

Labor for AI models does not so much upend this division as twist and extend it in various ways. These technologies enable new forms of flexibilization resulting in a division of labor between innovation-intensive production sites and their counterparts in the so-called periphery (Krzywdzinski, 2021). Companies leverage new digital technologies—together with a large informal sector and very limited regulation—to build an instant low-cost workforce in a marginal economy (Anwar & Graham, 2020).

Fragments of work are farmed out to laborers in the Global South who are essentially hired and fired with each task. Scaling up and down as necessary, this just-in-time workforce offers a model to companies that is lean, flexible, and above all, cheap. AI systems depend upon this work to function correctly and to “learn” rapidly, but this key labor from the Majority World is often rendered invisible (Amrute et al., 2022).

This is an asymmetric power relation. In one sense, data production hubs such as Ghana, Kenya, or South Africa mean that Africans are technically participating in the development of AI technologies. However, as Chan et al. (2021) stress, such “participation” is limited to work which is low level and low paid—and there are systemic barriers that prevent more meaningful or collaborative forms of participation. The result is a new form of extractivism (Monasterio Astobiza et al., 2022), reminiscent of the colonial plundering of resources, where valuable raw materials are harvested and built into lucrative AI products by powerful digital empires.

2.5 ChatGPT as Case Study

ChatGPT provides a case study that exemplifies many of these issues. This AI large language model can replicate human-sounding text in many genres and has been widely celebrated as an important innovation. The model was developed by OpenAI, a high-profile startup based in San Francisco with $11 billion in capital. However, as a high-profile report for TIME (Perrigo, 2023) documented, the firm came up against a crucial issue during development. The model had great potential but also major problems, regularly churning out responses that were racist, sexist, misogynistic, or toxic in various ways.

To remedy this issue, the firm turned to Samasource (Sama), an outsourcing platform whose data workers are based in Kenya. The mission was straightforward but fraught: provide labeled examples of violence, hate speech, and sexual abuse so that an AI model could be trained to detect them (Perrigo, 2023). Sama was sent hundreds of snippets of abhorrent text describing child sexual abuse, bestiality, murder, suicide, torture, self-harm, and incest in graphic detail. Sama workers had to read through each of these samples and manually label each one. Workers testified to the trauma and psychological fallout of reading through such depictions, over and over again, day after day. In this sense, these individuals are the haz-chem workers of the digital world, carrying out dirty and dangerous work in order to construct safe environments for the privileged (Munn, 2022).

The report found that workers were paid between $1.30 and $2 per hour, depending on their seniority and performance levels (Perrigo, 2023). Compare this with the average salary for a software engineer at OpenAI, which is at least $250,000 USD per year, not including typical developer add-ons such as signing bonuses, stock options, and performance bonuses. In this single case study, then, we see low wages, harsh work conditions, psychological damage, and the extraction of disposable Global South labor by a wealthy Global North company (Table 1).

Table 1 A summary of key issues (among many) in AI labor discussed in this chapter

3 Possible Solutions and Interventions

As the chapter has suggested, there are a number of significant issues with the current state of labor for AI systems. These issues are diverse, ranging from the financial (extremely low remuneration or nonpayment) through to the social (isolation, alienation, and sense of powerlessness) and the political and racial (exploitation of global division of labor and Global South workers), among others. Even from this cursory list, it is clear that there is no “silver bullet” solution for AI labor issues, no single technical fix that would address these multifaceted problems. However, there are several more modest suggestions which aim to improve the situation for workers.

Mutual aid is one possible intervention. After interviewing and surveying many workers on Amazon Mechanical Turk, Lilly Irani and Michael Silberman (2013) found that many experienced frustration at the lack of information on clients and the lack of accountability for misbehavior. As a result, the duo established Turkopticon, a forum where workers can rate clients, share stories, and exchange information. These kinds of spaces and forums also exist for other platform laborers, such as ride-share drivers. Such an intervention, while imperfect, disrupts the profound isolation and informational asymmetry that tend to characterize platform labor. This intervention allows workers to come together, share their experiences, warn others, and generally offer forms of support. It is one manifestation of mutual aid, a form of solidarity and support that workers have long used to improve their conditions and enhance their prospects. Such communality and support need to be extended into the context of AI labor, which is individualized and invisible. Indeed, in his book on AI labor, Dan McQuillan (2022) considers mutual aid to be a significant and strategic response to the brutal conditions that these technologies often impose.

Best practices and guidelines can also provide concrete recommendations for companies engaging in this form of work. The Allen Institute for Artificial Intelligence (2019), for instance, released its guidelines for AI labor as a public resource. The guidelines give an hourly rate for US-based and international work; they establish a rubric for pricing work; they highlight the importance of worker privacy; they champion transparency and setting expectations with workers; and they caution against rejecting work. Other research has suggested that companies commissioning crowd work for natural language technologies adopt three ethical principles to mitigate harm and improve conditions (Shmueli et al., 2021). These principles and best practices have potential if companies seriously engage with them and uphold them. However, ethical principles in the context of AI are nonbinding and can easily be ignored (Munn, 2022), with high-minded ideals effectively acting as window-dressing while the real business of technical development continues apace.

For this reason, “soft” principles and norms must be accompanied by “harder” regulations and legislation. The application of laws to platform labor, which is globally distributed across many territories, is by no means trivial. However, as Cherry (2019) notes, precedents can be found in the EU’s GDPR scheme for data, in the laws applied to maritime workers, and in multinational codes of conduct, all of which have “extraterritorial” applicability. In the case of maritime workers, for instance, there are international laws, conventions, and standards that have been ratified by member states, forming a regulatory regime that is largely understood and followed across the globe. A similar scheme might be drafted specifically for AI crowd workers that recognizes their needs, establishes key protections, and defines a set of penalties for non-adherence. Such a scheme for “socially responsible crowd work” is possible, stresses Cherry (2019), but it requires creative thinking and buy-in from platforms, workers, and regulators (Table 2).

Table 2 A summary of potential interventions (among others) to improve labor for AI

4 Conclusions

Contemporary AI models are highly dependent on high-quality data for training, accuracy, and functionality. Producing such data often means annotating fields, labeling images, cleaning duplicates, or even developing new datasets for a particular domain or use case. Such production does not happen magically but instead requires vast amounts of human labor. This labor has typically been organized through crowd work platforms, where large jobs are broken into microtasks and distributed to a massive labor pool of workers. However, there are numerous problems with this approach, as this chapter has discussed. Workers are paid poorly or not at all, much of the work is invisible and uncounted (e.g., finding jobs), the tasks themselves can be taxing or even toxic, and the labor form is extractive, transferring labor from vulnerable populations to elite tech companies in ways that repeat colonial patterns. While there are no easy solutions to this situation, an array of interventions, from mutual aid to industry norms and harder regulation, could lead to incremental improvements in terms of work conditions, worker well-being, and more equitable forms of organization. Thoughtfully engaging with these issues—and carrying out the difficult negotiation and implementation of responses in real-world work situations—must be central to any institution or organization committed to digital humanism.

Discussion Questions for Students and Their Teachers

  1. Why is human labor necessary for contemporary AI models, systems, and technologies?

  2. How did crowd working emerge as a distinct form of labor? What similarities does it have with older, or historical, forms of labor?

  3. How is crowd work paid, and what issues emerge around remuneration of this work?

  4. What kind of labor conditions characterize this crowd work? What kinds of impacts does this have (not just economically but socially, psychologically, etc.)?

  5. How is labor for AI organized globally? Describe how this distribution of labor perpetuates colonial patterns and power relations.

  6. What kind of interventions could be made to AI labor in order to increase remuneration, improve labor conditions, and support the well-being of workers?

Learning Resources for Students

  1. Altenried, Moritz. 2022. The Digital Factory. Chicago: University of Chicago Press.

     This book does an excellent job of showing the historical links between the Taylorist rationalization of work, where gestures were measured and optimized to boost production, and the control mechanisms (gamification, algorithmic management) embedded in crowd work, the gig economy, and other contemporary forms of labor.

  2. Perrigo, Billy. 2023. “Exclusive: The $2 Per Hour Workers Who Made ChatGPT Safer.” TIME Magazine. January 18. https://time.com/6247678/openai-chatgpt-kenya-workers/.

     The case study in my chapter draws from this excellent piece of investigative journalism. This example showcases in a clear and powerful manner many of the underlying issues with labor for AI systems, including very low pay and poor working conditions, exposure to toxic material, and the exploitation of precarious labor pools in the Global South by Global North tech titans.

  3. Amrute et al. 2022. “A Primer on AI in/from the Majority World: An Empirical Site and a Standpoint.” New York: Data & Society. https://www.ssrn.com/abstract=4199467.

     This report aims to reframe the conversation on technology and the Global South, focusing on its dynamism, its ingenious interventions, and its pools of potential labor, rather than what it lacks. The authors present an array of fascinating readings, arranged thematically, which augment typical viewpoints on AI and labor in a challenging and productive way.

  4. Munn, Luke. 2022. Automation is a Myth. Stanford: Stanford University Press.

     This book is a short and accessible text that lays out key issues around automated technologies, contextualizing technologies, and racialized and gendered labor. Drawing on numerous disciplines and an array of rich stories from workers, it highlights the vast army of human labor that props up so-called “automated” systems.

  5. Gray, Mary, and Suri, Siddharth. 2019. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Boston: Houghton Mifflin Harcourt.

     This book showcases the diverse forms of invisible labor that contribute toward our contemporary technical systems. This “ghost work” is carried out by women, migrants, students, and a range of other people to earn some money but is often underpaid or exploitative. Gray and Suri highlight the importance of this work, show how tech companies adopt it as a strategy, and discuss how it might be altered and improved for the better.