1 Introduction and background

Over the last few years, we have witnessed substantial growth in the capabilities and applications of Artificial Intelligence (AI) in citizen science (CS). In 1995, Irwin used the concept of citizen science to describe initiatives aimed at making science policy more responsive to people’s “understanding” and “concerns”, and thus more “democratic”. Almost simultaneously, Bonney (1996) used the notion of CS to describe scientific projects in which “amateurs” provide observational data (such as bird sightings) and acquire new scientific skills in return. Over the years, these two streams have been synthesized into contrasting ideal-type views of CS: a “productivity view” focusing on scientific output, and a “democratization view” considering scientific as well as non-scientific goals (Sauermann et al. 2020). Some projects focus on social change, inclusion, or advocacy rather than on generating scientific knowledge in the traditional sense (Ottinger 2010). However, most citizen science projects consider the goal of knowledge production essential (e.g., Bonney et al. 2009).

Members of the public—which we call here “citizen scientists,” or simply “citizens”, a term to be taken to mean not citizens of nation states but “members of a broadly construed community” (Eitzel et al. 2017, p. 6)—can participate in different types of citizen science and associated initiatives across several research fields. Citizen science projects have been set up in astronomy and astrophysics, ecology and biodiversity, archaeology, biology, and neuroimaging, among other fields. For example, in ecology, citizens use sensors to contribute to data collection programs and to monitor air or water quality, while in astronomy they classify galaxies. Citizen science projects often create large-scale observational datasets, including citizen-generated images crowdsourced through smartphone apps or galactic data collected by astronomers with telescopes. These data can benefit both science and society. For example, large-scale data can complement official data sources to improve reporting on the Sustainable Development Goals (Fritz et al. 2019). CS is now an expanding field and a promising arena for the creation of human–machine systems with increasing computational abilities, since many CS projects generate large datasets that can be used as training material for subfields of AI such as machine learning (herein ML) (Lotfian et al. 2021; Wright et al. 2019). ML is achieved through adaptive algorithms that use large quantities of labelled data to autonomously detect patterns, make predictions, and recognize technical rules (Popenici and Kerr 2017, p. 2). The different types of ML are discussed in great detail in the literature, and readers are directed to relevant sources (e.g., Takano 2021; Lotfian et al. 2021).

1.1 The impact of ML on tasks in CS

Humans and ML have the potential to work together in new ways in CS, making data collection, processing, and validation more efficient (Lotfian et al. 2021; Franzen et al. 2021; Ceccaroni et al. 2019). However, the use of ML raises the question of which tasks will be most affected and which will be relatively unaffected (Brynjolfsson et al. 2018). Distributing tasks and defining their content according to comparative advantage (who does what best at a given time), so as to maximize the effectiveness of specialization and increase the efficiency of the system (Kelling et al. 2013), can raise concerns. For example, by making citizen scientists’ contributions either too simple or too complex, or by reducing the range of what they can contribute, there is a risk of disengaging them (Leach et al. 2020). The question of a hypothetical ML takeover of citizen science was also raised by participants at the 3rd European Citizen Science 2020 Conference (https://www.ecsa-conference.eu/) during a panel discussion intended to initiate a dialogue on how citizen scientists interact and collaborate with algorithms. As mentioned during the conference, the current rapid progress in ML for image recognition and labelling, in particular the use of deep learning through convolutional neural networks and generative adversarial networks, presents a threat to human engagement in citizen science; if machines can confidently carry out the work required, then there can be no space for authentic engagement in the scientific process (Ponti et al. 2021). Therefore, as ML and other forms of AI become increasingly used in CS, even more fundamental questions arise as to whether we will continue to need citizen scientists and, if so, how their contributions will change because of automation.

Typically, the allocation of tasks between humans and machines reflects an effort to automate parts of human contributions based on what machines and humans are each recognized to be better at (e.g., Fitts 1951), in order to maximize efficiency and speed in achieving a given goal (Tausch and Kluge 2020). In CS, the use of ML presents opportunities to improve speed, accuracy, and efficiency in analysing massive datasets, monitoring the results, and identifying knowledge gaps (Ceccaroni et al. 2019). We may conjecture that if project organizers have primarily productivity goals, they will replace citizens as much as possible and make their tasks only as meaningful as needed to keep them engaged. In contrast, if project organizers also have “democratization” goals, they will use machines more for the benefit of human engagement and may involve citizens even where machines could do a more efficient job. The distinction may not matter much now, because AI is still not capable of replacing people completely. Nonetheless, it can become critical once AI grows more powerful: project organizers will then have to decide whether they want to maximize efficiency by replacing citizens, or maximize engagement by keeping citizens in the loop while using machines to make tasks more interesting for them. The distribution of epistemic agency would then have to be taken into account.

1.2 Distributing tasks and epistemic agency

In performing tasks, humans and machines do things; as a result, we can call them actors (those who act). More specifically, they are epistemic actors, because they do things to pursue specific epistemic goals (Ahlstrom-Vij 2013). In this paper, both attaining knowledge and providing solutions to specific problems qualify as epistemic goals. Processes of knowing and problem-solving take place in increasingly entangled systems of human and non-human actors, systems in which data from multiple sources are processed, accepted, rejected, and modified in various ways by these different actors. The notion of epistemic agency therefore needs to be examined to account for such socio-technical processes. Humans and algorithms are not seen here as self-contained epistemic actors in their own right, and a key empirical question is how agency dynamics play out in hybrid settings combining humans and machines. In this paper, we seek to answer the following questions:

RQ1:

What is the distribution of tasks between humans and ML in CS classification projects?

RQ2:

Based on this distribution, how is epistemic agency acted out in terms of whose knowledge shapes the distribution of tasks, who decides what knowledge is relevant to the classification, and who validates it?

Through the analysis of the narratives reported by organizers (e.g., project leaders, researchers) in the documentation of a sample of projects, we sought to gain descriptive insights into how citizens, experts, and ML participate in specific classification tasks and how task distribution affects their epistemic agency. Narratives are cultural artefacts that tell stories, which offer particular points of view or sets of values (Bal 2009). We consider academic papers and other non-fictional material used in this study as forms of “narratives” offering written accounts of the combination of CS and ML in classification projects.

We will now clarify the terms used in these questions, except for epistemic agency, which will be treated separately in the next section. For a proper understanding of the term task, we refer to Hackman’s (1969, p. 113) definition of a task as a job assigned to a person (or group) by an external agent, or self-generated. A task includes a set of instructions that specify which operations need to be performed by a person concerning an input, and/or what goal is to be achieved. We used Hackman’s (1969) conceptualization of tasks as behaviour descriptions, that is, descriptions of what an agent does to achieve a goal. Thus, the emphasis is placed on the reported behaviour of the task performer. This conceptualization applies to both humans and machines performing tasks. We chose to use the notion of task for two reasons. First, prior work on CS has also focused on it (e.g., Crowston et al. 2019; Franzoni and Sauermann 2014). Second, as Brynjolfsson et al. (2018) pointed out, the impact of ML on different jobs is likely to depend on the suitability of ML for specific tasks within jobs. Therefore, human participants in CS will be affected differently based on how suitable their tasks are for automation. In this paper, we focus on data-related tasks, such as data collection, processing, and analysis. We therefore excluded from our analysis other tasks that experts commonly perform in CS projects, such as securing funding, developing materials and methods, and writing papers. Regarding the term “expert”, we use it to include only professional scientists and the professionals responsible for developing algorithms and for setting up and running the projects. For the sake of our analysis, we do not use the term to refer to “expert citizens”, although we recognize that citizens can develop expertise in CS projects (e.g., Epstein 1995; Collins and Evans 2007). Citizens have been shown to develop expertise and perform tasks that extend beyond those for which scientists mobilized them (Kasperowski and Hillman 2018).

The paper is structured as follows. We begin by using a perspective from Science and Technology Studies to define “epistemic agency” as a construct that allows exploration of what it means for actors to participate in socio-technical endeavours such as CS classification projects. We then present the methodology used to collect and analyse our sample, followed by the results and discussion. The final section presents conclusions from this study and suggests future research directions.

The contribution of this paper is threefold: (1) to give scholars studying CS and human–machine integration a synthesis of results providing descriptive insights into the distribution of tasks and epistemic agency in CS classification projects; (2) to draw potential broader implications for the role of citizen scientists that are associated with the division of labour between the three actors; and (3) to point to relevant questions for future research.

2 What is epistemic agency?

A standard notion of agency used in the social sciences refers in a very broad sense to the capacity of an agent—usually a human—to act intentionally to influence or control social relationships or structures (Davidson 1980). In this study, we needed a more encompassing concept that goes beyond human intentions to include the agency of technologies and focuses on how different actors interact to influence the course of events. The notion of agency we use follows the influential conceptualizations developed in Actor-Network Theory (ANT) by Latour (2005) and Callon (1986).

Our study is influenced by two aspects of these conceptualizations. First, material objects exercise agency much like humans, although unlike humans they do not have intentions, only effects. Second, humans and non-humans do not possess agency, but exercise it by interacting with each other. According to Latour (2005), an actor can be anything—such as artifacts, tools, animals, and ideas—that modifies other actors through a series of actions. An actor makes others act. A network of actors makes room for epistemic agency, since the activity of knowing does not emerge only from the effort of one individual actor but from the efforts of several actors woven together in a “program of action” (Latour 1992, p. 226), bringing together both the intentions of humans and the functions of artefacts. In such a program, humans and machines form an assemblage and work together to pursue their epistemic goals.

This approach to agency reframes the role of ML and the way it relates to humans as presented in several accounts. For example, it helps us to go beyond the conception that, once ML is successfully implemented and trained to perform tasks, it can become autonomous, self-standing, and black-boxed in terms of epistemic agency (cf. Glaser et al. 2021, p. 3). A similar view of ML as a discrete tool underlies the suggestion that citizen science projects will require fewer volunteers thanks to the efficiency and speed afforded by computational technologies once a training dataset has been successfully developed (McClure et al. 2020). Arguably, these accounts of AI rely on a concept of agency located in discrete computational tools, which prevents us from considering both humans and non-humans as interwoven participants.

Consonant with the definition of agency used here, we conceptualize epistemic agency as the capacity of different actors to intervene in, facilitate, or control the ways scientific knowledge is produced. From a relational perspective, the distributed epistemic agency of algorithms comes to the fore: algorithms are no longer islands of automation but “assemblages” of hands tweaking and turning, swapping parts, and experimenting with new arrangements (Glaser et al. 2021, p. 5; Seaver 2019, p. 419; Pollock and Williams 2009). Algorithms are created in relational networks that exceed merely technical domains. Thus, investigating epistemic agency in CS classification projects means disentangling these relational networks to examine how algorithms play a role that extends beyond the conditions under which they are developed and implemented (Glaser et al. 2021, p. 5). Human and non-human actors thus take turns performing the epistemic work required to achieve epistemic goals. Using ANT to examine epistemic agency in projects that employ ML to sort through large datasets provides an analytic sensitivity that can reveal both the continuity and the simultaneous singularity and multiplicity of this phenomenon. Epistemic agency takes multiple forms, depending on the material–semiotic network in which it is entangled.

Considering the distributed nature of agency, we need to examine how it functions in hybrid settings, which are being developed in CS to perform certain research tasks using ML. It remains to be seen how the distribution of work between experts, citizens, and machines will affect their epistemic agency and, ultimately, scientific knowledge production. Science increasingly addresses the issue of managing large datasets by spreading the research process across various disciplines, machines, and actors outside academic science, thereby also distributing control and command over the research process. This raises the concern of how epistemic agency is distributed over time, and how effective relationships between humans and technology are shaped (Knorr-Cetina 1999, 2007; Reyes-Galindo 2014).

3 Methodology

This study consisted of three main steps. In Step 1, we searched for documents about CS classification projects using ML. When we started this study, there was, to the best of our knowledge, no repository of CS classification projects using ML. Our two main options were Internet searches and snowballing. We selected documents using three main criteria: (a) there must have been an implementation, or a proof-of-concept, of the ML application; (b) the texts must have been produced by personnel directly involved in the design and development of the classification projects; and (c) the texts had to be suitable to address our questions. In total, this study includes 38 published sources (27 journal articles and 11 other sources, including reports and blog posts), retrieved between January and July 2020. All the documents are in the public domain and can be obtained without the authors’ permission. The sources used are referenced in Online Appendix 2.

We selected a purposive sample of 12 classification projects: Galaxy Zoo AI, Virus Spot, Multiple Sclerosis, Human Atlas, Plantsnap, MAIA (ML Assisted Image Annotation), iNaturalist, Milky Way, Twittersuicide, Mindcontrol, Observation.org, and Snapshot Serengeti. Our sample was drawn from Ceccaroni et al. (2019) and the Citizen Science Salon in Discovermagazine.com (Ischell 2019). We chose a homogeneous sample in order to identify important common patterns that cut across variations and to simplify our analysis. A homogeneous sample usually requires a smaller number of cases (Patton 2002).

In Step 2, we used document analysis, a qualitative research approach for analysing documents that were produced prior to, and independently of, the present study (Bowen 2009). For each project included in the sample, we produced meta-summaries (Online Appendix 1) of documents containing information relevant to our research questions. A qualitative meta-summary is defined as a “form of systematic review or integration of qualitative findings in a target domain that are themselves topical or thematic summaries or surveys of data” (Sandelowski and Barroso 2003, p. 227). Adapting the process for creating qualitative meta-summaries proposed by Sandelowski and Barroso (2003), we created a spreadsheet to summarise information from each source about the following aspects: the “data” tasks performed by citizen scientists, experts, and algorithms, respectively; the types of algorithms used; the sequence of tasks between humans and machines; and the reasons why the project combined humans and machines. One author reviewed all the sources. To ensure the trustworthiness of the review and provide direct evidence from the sources, we created anchor codes—e.g., [GZA-KA19-7]—in the meta-summaries to link the extracted information to the original statements in the sources (Online Appendix 3).

In Step 3, to analyse the meta-summaries, we used qualitative content analysis (QCA) (Hsieh and Shannon 2005, p. 1278). Content analysis assumes that texts can provide valuable information about a particular phenomenon (Bengtsson 2016). The unit of analysis was the individual classification project. NVivo 12 software (QSR International 2020) was used to code the collected secondary material. Through repeated examination and comparison, we identified themes in the data inductively. Two authors open-coded the meta-summaries, describing the tasks performed by citizens, experts, and machines, and categorized the codes based on their conceptual similarity. We chose to conduct a manifest analysis, which means that we remained close to the text, describing what is visible, such as the words in the text, without trying to infer latent meanings (Bengtsson 2016). The coding structure is in ESM Appendix 5.

4 Results

In this section, we report the results from the data analysis of the 12 classification projects. The following two subsections address our two research questions: distribution of tasks (RQ1), and enactment of epistemic agency (RQ2).

4.1 Distribution of tasks between humans and ML

We begin by providing an overview of the main characteristics of the sampled projects with examples of tasks performed by citizens, experts, and machines, respectively (Table 1).

Table 1 Synopsis of project characteristics and tasks

Table 2 presents a summary of the three tasks most frequently performed by each actor across the projects. A complete description of all the tasks is included in ESM Appendix 4, along with examples of data for each. The table highlights the role of experts in checking model predictions and validating the results to ensure accurate model outputs; the role of citizens in labelling data to develop the training datasets that enable machines to make correct predictions; and the role of machines in inferring patterns from new data after training on a labelled dataset.

Table 2 A comparison of the three main tasks performed by each actor

We now summarize the dataset according to the major categories and codes, aggregated by the number of references (portions of coded text) across the 12 projects. In Figs. 1, 2, and 3, we present the distribution of tasks performed by citizens, machines, and experts, respectively, across the projects.

Fig. 1 Citizen tasks, aggregated and sorted by number of references

Fig. 2 Machine tasks, aggregated and sorted by number of references

Fig. 3 Expert tasks, aggregated and sorted by number of references

Finally, the multiple bar chart in Fig. 4 displays the tasks across the three actors, making the distinct roles of machines, citizens, and experts visible side by side.

Fig. 4 Comparison of tasks across the three actors

4.2 Epistemic agency

The best example of epistemic agency in the data is provided by Human-in-the-Loop (HITL). Developing and improving ML models without human assistance is not yet possible; therefore, HITL (Shih 2018) is the prevailing approach, which requires human interaction when algorithms encounter problems. Typically, HITL is used to combine human and machine knowledge to create a continuous circle in which ML algorithms are trained, tested, tuned, and validated. In this loop, with the help of humans, algorithms become better trained and make more accurate predictions. In other words, at their current stage, ML algorithms cannot learn and improve entirely on their own through trial and error.
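To make the shape of this loop concrete, the sketch below shows one plausible minimal implementation in Python, assuming a scikit-learn-style classifier; the confidence threshold and the `request_human_labels` helper are hypothetical illustrations and do not describe any of the sampled projects.

```python
# A minimal human-in-the-loop sketch: the model handles what it is
# confident about, and humans label the cases it cannot resolve.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def hitl_round(model, X_train, y_train, X_new, request_human_labels,
               threshold=0.9):
    """One turn of the loop: train, predict, route uncertain cases to humans."""
    model.fit(X_train, y_train)                  # train on the current gold standard
    proba = model.predict_proba(X_new)
    uncertain = proba.max(axis=1) < threshold    # cases the algorithm cannot resolve
    idx = np.where(uncertain)[0]
    if idx.size:
        # Humans step in where the algorithm encounters problems, and their
        # answers are appended to the training data, closing the loop.
        human_labels = request_human_labels(X_new[idx])
        X_train = np.vstack([X_train, X_new[idx]])
        y_train = np.concatenate([y_train, human_labels])
    return model, X_train, y_train

model = RandomForestClassifier(random_state=0)
```

Calling `hitl_round` repeatedly enacts the “continuous circle” described above: each round retrains the model on a gold standard enlarged by human answers.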

Our analysis indicates a type of interaction in which humans and algorithms are interdependent and take turns to solve a task together, while the feedback loop allows continuous improvement of the system. Experts are the humans mainly involved in the HITL described in the analysed narratives. They train, test, and validate models to improve accuracy, scoring the outputs when algorithms are not able to make the right decisions. They create a continuous feedback loop, allowing the algorithm to give better results over time. However, citizens are also involved at various stages of the process. We present three main examples of what we call citizens-in-the-loop, showing how citizens assist algorithms when the algorithms encounter difficulties.

When algorithms provide incorrect suggestions. In Observation.org, a free tool for field observers to record and share their plant and animal sightings, citizen scientists upload images of flora and fauna, and if the recognition algorithm fails to identify the species correctly, citizens can edit the wrong suggestion on the observation screen. Based on this, the system shows whether citizens have accepted or rejected observation data. Citizens thus contribute to creating a sort of gold-standard database used to train the ML model (Fig. 5).

Fig. 5 ML provides incorrect suggestions
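A minimal sketch of this accept-or-edit flow is given below; the data structures are hypothetical stand-ins, since our sources do not document Observation.org’s implementation at this level of detail.

```python
# Sketch of the accept-or-edit flow: every citizen decision, whether it
# confirms or corrects the algorithm's suggestion, enriches the
# gold-standard data used to retrain the recognition model.
from dataclasses import dataclass

@dataclass
class Submission:
    image_id: str
    suggested_species: str   # the recognition algorithm's suggestion
    final_species: str       # the species the citizen accepted or edited to

gold_standard: list[tuple[str, str, bool]] = []

def record(sub: Submission) -> None:
    accepted = sub.final_species == sub.suggested_species
    gold_standard.append((sub.image_id, sub.final_species, accepted))

record(Submission("img-001", "Quercus robur", "Quercus petraea"))  # citizen edits the suggestion
```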

Similarly, in Snapshot Serengeti, machines may misclassify animals in the collected pictures. Citizen scientists are then tasked with identifying the animals and training the algorithms based on their observations. First, the algorithm classifies the picture; if the animal is detected only with a certain probability, citizens come onto the scene. The AI offers a primary classification (animal recognition) to the spotter (the trapper who uploaded the records can also pre-classify the image). A citizen scientist then validates or invalidates the pre-classification, and an image is not considered validated until there is at least a 75% consensus (a level that can be adjusted in the specific project) among all the citizens involved. This validated classification is the input for retraining the algorithms.
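The consensus rule lends itself to a compact illustration. The sketch below implements a simple majority-fraction check with the 75% default mentioned above; the actual Snapshot Serengeti aggregation pipeline is more elaborate.

```python
from collections import Counter

def consensus_label(votes, threshold=0.75):
    """Return the agreed label if at least `threshold` of citizen votes
    coincide; otherwise None, meaning the image is not yet validated.
    The threshold is adjustable per project, as noted above."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= threshold else None

consensus_label(["wildebeest"] * 8 + ["zebra"] * 2)  # -> "wildebeest" (80% agree)
consensus_label(["lion", "leopard", "lion"])         # -> None (67% < 75%)
```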

When algorithms do not yet know how to perform a classification. In Milky Way, a system leveraging citizen science and ML to detect interstellar bubbles, citizens identify bubble patterns that machines cannot yet detect and thereby contribute to building a database. Researchers then use the citizens’ identifications to train ML and build an automatic classifier (Fig. 6).

Fig. 6 ML cannot identify outliers

When algorithms pose queries. In MAIA, an ML-assisted image annotation system for the analysis of marine environmental images, an algorithm poses queries to citizens in the form of training data images. Citizens review these images and determine whether or not they contain objects of interest for classification. They then manually refine each image with a circle marking the object of interest, adjusting the circle’s position or size so that it closely fits the object (Fig. 7).

Fig. 7 ML posing queries
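This query-and-refine interaction can be sketched as an uncertainty-sampling step plus an editable circle annotation. The code below is an illustrative approximation under the assumption of a model exposing `predict_proba`; it is not MAIA’s actual pipeline.

```python
# Illustrative "algorithm poses queries" pattern: the model selects the
# candidates it is least sure about, and citizens review and refine a
# circle annotation around each object of interest.
from dataclasses import dataclass
import numpy as np

@dataclass
class CircleAnnotation:
    x: float                      # centre, adjustable by the citizen
    y: float
    radius: float                 # resizable to fit the object closely
    is_object_of_interest: bool   # the citizen's review decision

def pose_queries(model, X_pool, n_queries=10):
    """Return indices of the pool items the model is least confident about."""
    proba = model.predict_proba(X_pool)
    sorted_p = np.sort(proba, axis=1)
    margin = sorted_p[:, -1] - sorted_p[:, -2]   # small margin = uncertain
    return np.argsort(margin)[:n_queries]
```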

5 Discussion

In this study, we examined twelve classification projects combining CS and ML in several scientific fields. This combination results in socio-technical epistemic systems consisting of human and non-human actors, each equipped with different amounts of knowledge and power. Our analysis suggests that no task performed by either citizens or experts can be handled fully by ML at present. The suitability of a task for ML depends on the task characteristics and the level of knowledge required to perform it. However, machine learning may yield an epistemically stratified organization, as it requires more expert knowledge and skills while still soliciting citizen scientists’ contributions. Due to the temporality of tasks in a CS project, epistemic agency will be ascribed to different actors at different times. This means that, at different times, the epistemic agency of certain actors will come to the fore, while that of others will be obscured. Arguably, this means that narratives of ML, AI, and CS (biography, success, optimization, ideals of science, etc.) must be understood in terms of how tasks are delegated and distributed among actors and their epistemic agency as these change over time. Whether one asks “if” or tells the story of “how” a project succeeds, or of when machines take over, the answer will differ based on when and where one examines the project.

Our descriptive results raise three main issues, which we explore through lenses informed by Science and Technology Studies (STS) and Information Systems. Specifically, we discuss the emphasis on optimization and the ideal of science it implies, the problem of induction, and the role of citizen scientists.

5.1 Emphasis on optimization to achieve greater scale, accuracy, and speed: which ideal of science?

We examined projects in various scientific fields that use CS in conjunction with ML, such as neuroscience, sociology, oceanography, environmental science, botany, life sciences, astronomy, microbiology, and medicine. Despite the variety of scientific fields, the narratives about these projects in terms of project goals, human–machine integration, use of ML, and distribution of tasks between citizens, experts, and machines, all reveal a story of optimization, scaling up, increasing accuracy, and speeding up. These stories underpin a model of data-driven science propelled by ML affordances that make it easier to sift through massive datasets to infer patterns, while renewing inductive reasoning, and making research less hypothesis-driven (Mazzocchi 2015; Hochachka et al. 2012). At the same time, there is also a growing interest in using AI in research as a means of enabling new methods, processes, management, and evaluations (Chubb et al. 2021).

The diverse projects included in this study converge on a stereotypical narrative of how ML enables epistemic agency. Why is this narrative repeated in various scientific fields? Data-driven science seems to unite projects rather than separate them when they are reported for stakeholder audiences, embedding programs of action (Latour 1992) and envisioning a desired imagined future state (Glaser et al. 2021, p. 13). In this way, a scientific future is enacted, namely, one in which high-powered computing capacities enable the utilization of largely inductive ML applications that require less theoretical pre-processing of data (Glaser et al. 2021, p. 10). The shared narrative reflects a unity of science that extends beyond the epistemic agency that shapes these hybrid formats of science. As science opens up to outsiders to train the automated systems of high-capacity machines, outsiders are encouraged to participate as inductive actors, taking part in a program and performing an ideal of science that accommodates them in the research process. This does not imply that the observations made, classified, and used to train machines are not used to test hypotheses. Citizens’ access points, however, enact a process of induction, which tends to unify the sciences more than separate them. When scientists open the way for volunteers to train machines to speed up scientific processes, this is in line with inductivism’s ideals (Kasperowski et al. 2019).

Science has been associated with at least two main epistemic uncertainties known as the problem(s) of induction. Popper (1934) suggests that although some “problems” cannot be solved, they can be more or less successfully managed. First, observations are uncertain, which is usually addressed and managed through different technologies, standards, and protocols. Second, finite observations cannot yield universal conclusions. Observers cannot observe everything, and protocols themselves suffer from being based on finite observations; inundated with data, they cannot cope. Nowadays, ML can learn to recognize patterns in classified data that were not integrated into its original design, and the hybrid use of CS and ML, coupled with an abundance of data, makes it possible to manage epistemic uncertainty (Popenici and Kerr 2017). Protocols are expected to evolve and maintain their functions as datasets become larger and the epistemic agency of machines increases. However, when outlining an algorithm’s biography (Glaser et al. 2021), it is critical to consider how much data an algorithm can handle without losing its explanatory power and its ability to make additional classifications beyond the original gold standard. Once such new data are gathered, it will be time to retrain the machine with the help of experts or citizens.

Science that incorporates outsiders such as citizen scientists in the research process is more likely to unify sciences than to divide them. Citizen science would then reflect a scientific ideal resembling some aspects of the discipline of observations through protocols, which is reminiscent of the logical empiricists’ argument for the purification of science (Kolakowski 1972). While the 1930s called for a purification of science and research, 90 years later the key terms are computer science, artificial intelligence, machine learning, open science, inclusion, mediation, transparency, democracy, and responsible research. In the 1930s, the epistemological concern was that science must close ranks to save itself; it now appears that science is encouraged to open ranks under a banner of openness and unified investigation.

5.2 Implications of optimization for the role of citizen scientists

The problem of induction. The three examples of citizens-in-the-loop in Sect. 4.2 suggest that ML is used as much as possible at a given point in time or stage of a project’s evolution, with humans stepping in to fill in the missing pieces. Depending on the current strength of ML, the temporal order in which machines and citizens take turns on tasks differs. If ML provides wrong outputs, citizens help retrain the model; if ML is somewhat better and makes suggestions, citizens correct the model; if ML models work well, humans take care of undetected exceptional cases. If ML is perceived as producing optimized outputs, citizens may be made redundant. This is just a conjecture for which there is no direct empirical evidence, but it is worth considering. If the outputs of ML are trusted and relied upon by experts, they may influence the socio-technical assemblage that generates the data on which the same ML is trained (Faraj et al. 2018; Pachidi et al. 2021; Lebovitz et al. 2021). It is unclear whether humans will ever become redundant when ML is applied in the CS domain and, if they do, under what circumstances ML would be considered optimal. At the present stage of development, the capabilities of ML make it a scalable complement to citizens and experts, for example by structuring large amounts of unfiltered data into information or by estimating the probability of the occurrence of an event based on input data (Ponti and Seredko 2022).

However, rather than considering optimization from a technical perspective, we relate it to the way the “problem of induction” from observations is handled. A long-standing belief holds that inductive approaches—now emphasised by ML, which infers general rules from observations—are plagued by inescapable and insoluble problems (Popper 1934). Neither humans nor machines can solve these problems, nor can some sophisticated hybrid approach combining the two. There is an important reason for this: no protocol can handle large amounts of data without encountering anomalies, and these must be controlled for. Therefore, human epistemic agency will be needed for never-before-observed phenomena, because protocols and gold standards have a limited epistemic reach in those cases.

In this study, we focused on technologies using ML classification methods that infer patterns from training datasets consisting of labelled input–output pairs and classify new inputs into predefined output classes. These ML methods are rule-based systems that allow ML to represent experts’ know-what knowledge (Lebovitz et al. 2021). This type of knowledge is captured, for example, in the gold-standard labels used to train and validate ML models. Experts are key in the data-intensive projects we examined. Consonant with the relational notion of agency adopted in this work, we use a relational notion of expertise. Rather than being something experts possess, we define expertise as the expert’s ability to mediate between the production of knowledge and its application (Grundmann 2017). In this sense, in our sampled projects, experts define and interpret situations and set priorities for action. As mentioned in Sect. 4.2, experts are the humans mainly involved throughout the research process and in the loop when models fail or are unreliable. It has been argued that, under conditions of epistemic uncertainty, official expertise and lay expertise should not be seen as antagonistic but as complementary (Funtowicz and Ravetz 1990). It is these different types of expertise, including that embedded in ML algorithms, which interact with each other to form an assemblage in a “program of action” (Latour 1992, p. 152). Therefore, we suggest viewing ML optimization as a constructed assemblage in which citizens, experts, and machines play different roles and exert epistemic agency at different points in time to pursue the epistemic goals of CS projects.
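As a minimal sketch of how such gold-standard labels enter a supervised classifier, consider the following; scikit-learn is used purely for illustration, and none of the sampled projects is implied to use this exact code.

```python
# Gold-standard labels (y_gold) carry the experts' know-what knowledge;
# a held-out portion of them is also what validates the trained model.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def train_on_gold_standard(X, y_gold):
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y_gold, test_size=0.2, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return model, accuracy_score(y_val, model.predict(X_val))
```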

Trust in ML outputs and redundancy of citizens. It is worth considering that if ML tools are trusted and taken for granted, and experts rely on seemingly accurate ML outputs over volunteers (including expert volunteers), these outputs may influence the socio-technical assemblage that generates the data on which the tools are trained (Faraj et al. 2018; Pachidi et al. 2021; Lebovitz et al. 2021). Therefore, to consider machines as unproblematic and their output as immutable mobiles (Latour 1990) implies that citizen scientists will play only a minor role in the long run.

One can speculate that ML in CS will allow citizens to focus on higher-level tasks by automating boring tasks. However, as Franzoni et al. (2021) argued, “to the extent that such work is limited in volume or requires additional knowledge and resources that pose barriers for crowd participants, there is a risk that CS becomes less inclusive by focusing primarily on expert volunteers” (p. 17). While this may not be a concern from a ‘productivity’ perspective, it may limit CS’s potential to advance the non-scientific goals highlighted by the “democratization view” (Franzoni et al. 2021). Relying mainly on expert volunteers could reduce the diversity of current and future citizen scientists by narrowing their range of motivations and disengaging those citizens who want to contribute to science in their spare time and have fun, help science, or spend time outdoors (Geoghegan et al. 2016). Deriving personal meaning and value from participating is important to citizen scientists, who typically volunteer time and effort driven by intrinsic or social motivations and not for financial compensation (Sauermann and Franzoni 2015). However, how CS projects should be designed to actually cater to diverse needs and expectations remains largely conjectural (Kasperowski and Hagen 2022). Suggestions to avoid disengagement and redundancy include allowing participants to contribute to their task of interest even if the task can be fully automated, so that their contribution can help improve ML performance, or incorporating new forms of citizen contributions to fill the gaps created by automation (Lotfian et al. 2021).

While some may regard these suggestions as motivational, they can also be seen as “over-engaging”, bordering on the unethical. As part of the debate over the ethical problems raised by citizen science, there is an issue of “over-engagement”, that is, being expected to be available for free scientific work indefinitely (Kasperowski and Hagen 2022). Therefore, we argue that the opposite of becoming redundant can actually happen with the growth of ML in CS: the problem of induction may repeatedly call upon humans, both experts and citizens, in the loop of the process. The fear of AI and ML creating “undemocratic”, hierarchical, or epistemically stratified projects must of course be closely observed. However, from the perspective of epistemic agency, hierarchy or epistemic stratification could be said to occur constantly in projects at a micro level, as different actors are endowed with more temporal epistemic power during the course of a project.

5.3 Participation of citizens—or the lack thereof—and the use of ML

The AI industry is showing interest in developing solutions to global problems using AI in combination with citizen scientists. An example is a recent partnership between a team of IBM data scientists and the UN Environment Programme (UNEP) to overcome the challenges associated with citizen data and create a unified, global baseline dataset for measuring plastic pollution, in line with Sustainable Development Goal 14 (Clemente 2020). Sloane (2020) contends that ML extends the agenda of the tech industry, which is focused on scale and extraction. The use of ML may exacerbate an “extractive” approach to citizen participation (Sloane 2020), whereby data collection and classification remain the primary ways for volunteers to contribute to the scientific goals of CS projects. The increasing use of ML in CS classification projects must therefore be related to issues of power dynamics and inequalities in the engagement and retention of volunteer participants.

CS aims to be inclusive in terms of age, gender, ethnicity, geography, and social class. However, CS participation is currently skewed demographically and geographically, with biases in age, gender, ethnicity, and socioeconomic status (Pateman et al. 2021). Participants in long-term projects, such as eBird at the Cornell Lab of Ornithology, have been shown to be predominantly highly educated, upper-middle class, middle-aged or older, and white (Purcell et al. 2012). The results concerning gender composition are mixed; however, some projects show a strong bias towards men (Hobbs and White 2012; Crall et al. 2012; Raddick et al. 2009; Wright et al. 2015). Other studies indicate that some projects offer opportunities for disadvantaged groups that are not otherwise available (Khairunnisa et al. 2021). There is an ethical imperative to involve a diverse group of participants to inform CS projects and provide access to their benefits (Mor Barak 2020).

CS participation is skewed not only in terms of sociodemographics but also in terms of actual contributions: most contributions are made by a few (Seymour and Haklay 2017). This lack of diversity in CS reflects different motivations and capacities, and it raises concerns about the representativeness of data and whether individual, societal, and environmental benefits are evenly distributed (Pateman et al. 2021). Diversity in participation remains a challenge, which the use of ML may exacerbate. The involvement of ML seems to be a case of what we call “designing for”, where citizens are not integrated into the design process from the beginning but are relied on to make the model (ML design) successful ex post. It has been suggested that the involvement of citizens through CS, particularly during the research design phase, may help reduce bias in data and training annotations for AI, enable public shaping of AI, and foster a lifelong interest in science (Shanley et al. 2021). The long-standing interest among STS scholars in whether new technologies solve problems or rather manage and displace them, making some problems, actors, and their agency invisible or redundant, seems to reappear as ML and AI are combined with CS (Glaser et al. 2021).

We are left to wonder whether inequality in participation detracts from the promise to make science more democratic, both by including more diverse people in doing science and by making science better aligned with the public interest (Strasser and Haklay 2018). However, it would run contrary to some current proponents of CS to claim that there are “objective” public interests that science can tap into continuously (cf. Brown 2009). This standpoint seems all too often to inform voices from both policy and science when expectations and social imagery of the availability and readiness of citizens to be mobilized into CS projects are produced. Our suggestion is that the pursuit of such interests must be viewed as acts of performance: they are made and cannot be taken for granted.

The non-linearity of the HITL. In our ANT-inspired view, the constructed assemblage of citizens, experts, and ML is a complex effect resulting from mutual interaction and feedback loops, as exemplified in Figs. 5, 6, and 7. These assemblages can be seen as complex triangular systems of citizens–experts–technology in which relationships and loops need to be repeatedly “performed” by all the actors involved, or the assemblages dissolve. ANT is not the only framework that attends to such connections and interactions. Other theoretical endeavours, such as cybernetics (Wiener 1948), have explored complex feedback processes within the networking and self-organization of systems. Both ANT and cybernetics aim to conceptualize complexity; both are sensitive to the hybrid nature of phenomena, and both emphasize system effects. However, as pointed out by Fenwick and Edwards (2010), a core difference between cybernetics and ANT is the latter’s orientation toward contingent practices and multiplicity. ANT provides a conceptual framework for analysing how the diverse entities of a classification project—including technologies—take on a role through specific situated material–semiotic redistributions of expertise and epistemic agency. Uncertainty characterizes contingent practices. Humans are unlikely to act as “controllers or processors” of classifications in a linear way in the loops, as they may not repeat or reproduce exactly the same actions given the same input. Even such fixed things as standards and protocols can be uncertain in practice. The loops we exemplified are not expected to be seamless, whereby ML fails to identify a pattern, citizens identify it, and experts feed the correct answer into the training data. There are expectations and possibilities that ML will perform its tasks, but there is no guarantee that citizens will unquestioningly take on the assigned epistemic role and dutifully engage in checking errors or filling gaps. Nor, for that matter, does it mean that ML itself will comply with experts’ wishes. Checking the correctness of a classification can happen in multiple ways, depending on the material–semiotic network in which this task is entangled. Tracing the material–semiotic assemblage of checking data classification could reveal the continuity from one version to the next, and thus the simultaneous singularity and multiplicity of the assemblage.

5.4 Limitations

This study has two main limitations. First, we relied solely on secondary sources without incorporating other methods (e.g., interviews) that could help reduce bias and compensate for the dearth of documents and their incompleteness. Our study may be limited by the use of the narratives reported in the documents, which represent the authors’ perspectives. Since most research papers tend to report on successful rather than unsuccessful projects, we are likely to have been exposed to mostly successful divisions of labour instead of those that did not work. Being aware of this potential bias, we have been careful not to use documentary evidence as a stand-in for other kinds of evidence that we could not produce using this method.

Second, our study may be limited by the small number and type of projects examined. The sample we used is purposive. Note that the selected projects reflect those that were documented at a particular moment in time, rather than being a truly representative sample of the population.

6 Conclusion

AI tools have long been the subject of concerns, such as their effects on human employment and their potential for dehumanization (Boden 1987). As AI is increasingly used in CS, questions should be raised about its impact on citizen roles. ML in CS classification projects, far from being deterministic in its nature and effects, is open to examination. There is no guarantee that these technologies will replace citizen scientists, nor any guarantee that they will provide citizens with opportunities for more interesting tasks. However, to assume that ML and other AI computational technologies can replace humans entirely in CS is to overestimate their current limited autonomy and “smartness”, as they still require the intervention of experts and engaged citizens (Authors 2022).

The use of ML raises the question of which tasks will be most affected and which will be relatively unaffected. This paper offers a descriptive account of the distribution of tasks between humans and ML in CS classification projects. It also describes how epistemic agency is acted out in terms of whose knowledge shapes the distribution of tasks, who decides what knowledge is relevant to the classification, and who validates it. Citizens and experts in CS classification projects are already affected differently by the use of ML, depending on the tasks they perform and the epistemic agency they exercise. However, no task can currently be handled fully by ML. Our analysis leads to the conclusion that the integration of ML into the socio-technical system of a classification project requires some form of relationship with humans at one level or another. Regardless of the advancement of ML, humans are likely to have an active epistemic role to play in certain decision-making loops that will affect ML operations. However, it remains to be seen what the role of citizens will look like in the future, how they will be able to exert epistemic agency, and whether they will work on higher-level tasks. For example, a future study could use ethnographic methods to examine in more depth whether AI and ML technologies empower experts and disenfranchise citizens. Further studies could examine how citizens, experts, and algorithms co-evolve in these projects, and whether the content of tasks assigned to actors early in the design of projects shifts over time. Another topic might be to examine whether human–machine integration leads to skill-biased technological change, such as ML replacing low-skill tasks. As the boundaries and distinctions between humans and machines blur, we may face unexpected obstacles, opportunities, and questions worth exploring in future research.