1 Introduction

Micro-task crowdsourcing platforms, e.g., Amazon Mechanical Turk (MTurk) or Figure Eight (F8) (formerly known as CrowdFlower and currently as Appen), enable researchers to reach large pools of participants, also called contributors or workers, for their experiments (Howe, 2006). The main advantage of using crowdsourcing in research is the access to low-cost digital labour from the platform’s large pool of available crowdworkers (Heer and Bostock, 2010). Researchers, called requesters in the crowdsourcing context, design their experiments to be performed as a batch of atomic micro-tasks. These tasks are commonly called HITs (Human Intelligence Tasks), since they require human intelligence to be performed effectively. Typical HITs include market surveys or image annotation for training AI models (Gadiraju et al., 2014). In essence, a HIT consists of a web page, typically a form, requiring workers to input specific information or perform actions. Based on responses to a survey of 1000 workers on CrowdFlower, Gadiraju et al. (2014) proposed a categorization scheme for the most popular HITs on the platform, identifying task categories including information finding, verification and validation, interpretation and analysis, content creation, surveys, and content access.

The micro-task crowdsourcing process takes place as follows: First, a researcher designs a HIT and deploys it on a crowdsourcing platform, specifying the parameters for its execution, such as the number of workers required and the corresponding payment. Then, the platform allocates the batch of work to several workers according to specific policies. Finally, the platform collects and aggregates the results, sending them back to the requester. In addition to the workers’ payments, the requester also pays the platform a service fee, typically corresponding to 20-25% of the experiment cost.

Crowdsourcing lends itself to being used successfully in different areas, such as psychology (Gosling et al., 2015), social science (Auer et al., 2021), economics (Jacques and Kristensson, 2019), cognitive science (Stewart et al., 2017) and medical science (Petrović et al., 2020). More recently, crowdsourcing has attracted the interest of several AI communities. Vougiouklis et al. (2020) conducted a series of crowdsourcing experiments to evaluate the quality of texts generated by machines. Finin et al. (2010) designed crowdsourcing tasks to collect named entity annotations for Twitter. Deng et al. (2016) improved an image recognition model with crowdsourcing via an online game. In the study by Wan et al. (2019), traffic-related data were collected via crowdsourcing to train vehicle route planning algorithms. Another notable example is Chimera, a large-scale classification model developed with the help of crowdsourcing to classify tens of millions of products at WalmartLabs (Sun et al., 2014).

1.1 Pricing of Crowdsourcing Tasks

While designing a task, requesters have to make several choices. One crucial decision concerns the monetary reward to be given to the workers who complete the task successfully. Indeed, while an exiguous payment might not motivate the workers enough (Auer et al., 2021), a conspicuously large one might attract spammers or reduce the scale of limited-budget experiments. Different factors influence the reward policies of a task, including the amount of time required for its completion, its difficulty, or the worker skill set required (workers with special qualifications have to be paid more). Finally, additional costs, e.g., platform fees, typically consisting of a percentage of the task payment, need to be taken into account when deciding upon the reward strategy. Some efforts to develop tools that assist price negotiation in crowdsourcing can be found in the literature. For example, Horton and Zeckhauser (2010) proposed hagglebot, a bot serving as an automated negotiating agent for requesters.

1.2 Ethical Issues in Crowdsourcing

One topic that has attracted the research community’s interest concerns the ethical issues around workers’ conditions reported over the years. Hara et al. (2018) showed that workers were paid 4-6 USD per hour on the MTurk platform, whereas the US federal minimum wage has been 7.25 USD per hour since 2009. Crowdworkers struggle with extremely low hourly income, and this issue has been used as the motivation for some studies (Whiting et al., 2019; Saito et al., 2019). Moreover, requesters might refuse to pay crowdworkers for a completed job, either maliciously or because of badly-designed tasks and error-prone result assessment systems (Martin et al., 2016; Silberman et al., 2010). Lease et al. (2013) and Silberman et al. (2010) investigated crowdworkers’ vulnerability to fraudulent tasks targeting their privacy and assets. Platforms often provide very limited information to crowdworkers about the identity of requesters and the quality of their tasks. While the amount of payment for a HIT is communicated to the crowdworker, an estimate of the expected time to complete it is missing. Most platforms also lack an estimate of the rejection rate of the requester who published that HIT batch, as well as general feedback from workers who have already completed HITs from the same batch. It is therefore arduous for workers to evaluate requesters’ reputations and to assess the potential hourly wage before starting tasks (Fieseler et al., 2019; Martin et al., 2014). On the contrary, the information available to requesters is more detailed, since they can typically access workers’ performance history and qualifications (Kingsley et al., 2015).

1.3 Research Standards in Crowdsourcing

The academic community has responded to the aforementioned ethical issues by calling for crowdsourcing research standards to be implemented by universities, journals, and grantmakers. In particular, university ethics boards have been encouraged to design guidelines for the use of crowdsourcing in research, considering the specific platform- and labour-related issues of crowdworkers, such as the lack of access to traditional employment protections (Williamson, 2016). Even though ethical guidelines vary by institution and country, the fairness of the payment should be an essential condition that needs to be satisfied (Silberman et al., 2018a). These concerns are significant when using paid micro-task crowdsourcing platforms, where crowdworkers consider monetary rewards as the driving motivation for their participation (Martin et al., 2017), as opposed to, e.g., citizen science or volunteer-based crowdsourcing, where the driving motivation is intrinsic (Leimeister et al., 2009; Hossain, 2012). In particular, to guarantee transparency, the amount of the participant reward should always be clearly specified in the design phase of a research project (Schmidt, 2013). As illustrated in this work, these issues are particularly pronounced in F8, where the compensation scheme seems to be unclear, changing over time, and linked to questionable activities such as gambling.

Several solutions in the existing literature have addressed the aforementioned ethical issues of crowdsourcing. Whiting et al. (2019) proposed Fair Work, a tool that computes payments so as to ensure a minimum wage for the workers. Saito et al. (2019) discussed how crowdworkers struggle with extremely low hourly income and proposed TurkScanner, a machine learning approach that predicts worker completion time to compute a fair hourly wage. Certainly, making crowdsourcing more ethical is not just about monitoring the fairness of rewards, but also about improving the reputation system. Gaikwad et al. (2016) attempted to encourage greater ethical standards among platform members by developing a reputation system that provides more honest private evaluations for both workers and requesters. In addition, reducing the risk of unethical behaviour is another approach. For instance, Fan et al. (2020) helped crowdworkers reach a more ethical hourly wage via a novel crowdsourcing reward mechanism, so that the risk of being underpaid can be shared within a worker group.

1.4 Motivation - Opaqueness of Reward Schemes

The solutions discussed above represent an important first step in raising awareness about the importance of compensating workers fairly and in providing useful tools to achieve this goal. Unfortunately, in practice these solutions present some limitations. Indeed, when choosing a HIT reward scheme, requesters can only rely on the limited information provided by the platform about the actual reward process. Moreover, some platforms make use of recruitment channels, namely third-party services that act as intermediaries between platforms and workers (IG Metall, 2017). The cost of the recruitment channel service is then included in the worker’s cost reported by the platform to the requester. Consequently, the requester cannot know exactly which percentage of the reward will actually reach the crowdworker. For the reader’s convenience, we define in Table 1 the costs that requesters have to cover to run a crowdsourcing job.

Table 1 Cost breakdown of a crowdsourcing task.
Fig. 1 F8 task launch panel. The cost indicated as “contributor judgments” does not consider the channel commissions: the crowdworker might receive a lower reward.

By observing F8, we noticed that the commission amount varies over time and across channels. More importantly, such commissions are not explicitly provided when launching a task (as shown in Figure 1), and the exact amount is only revealed after a task is completed, as discussed in Section 6. Requesters might not be aware of the effects of the reduced rate on the workers’ actual payments, which can fluctuate over time and potentially undermine the fairness of the compensation the workers receive. To the best of our knowledge, no study has analysed channel commissions in crowdsourcing platforms.
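To make the effect of these commissions concrete, the following sketch computes the cost components defined in Table 1 and the worker’s resulting take-home reward. The platform fee and channel commission rates used here are assumed example values, not the actual fee schedule of F8 or of any specific channel.

```python
# Minimal illustration of the cost breakdown in Table 1.
# The fee and commission rates below are assumed example values.

def cost_breakdown(reward_per_hit, n_hits, platform_fee=0.20, channel_commission=0.40):
    """Return the requester's total cost and a worker's actual take-home pay per HIT."""
    gross_rewards = reward_per_hit * n_hits          # "contributor judgments" cost
    platform_fees = gross_rewards * platform_fee     # fee retained by the platform
    requester_cost = gross_rewards + platform_fees   # what the requester is billed
    worker_take_home = reward_per_hit * (1 - channel_commission)  # after the channel's cut
    return requester_cost, worker_take_home

total, per_hit = cost_breakdown(reward_per_hit=0.50, n_hits=100)
print(f"Requester pays {total:.2f} USD in total")
print(f"Each worker actually receives {per_hit:.2f} USD per HIT")
```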

This paper investigates the distribution and the variation over time of channel commissions in F8. Since the amount of such commissions is often not available to the requesters, we ran a survey task asking the workers in F8 how much they would be paid for the completion of the ongoing survey, and compared the reported payment amounts across channels. These responses were then compared with the historical channel information from the marketplace to build a picture of the recruitment fee dynamics. We conducted a comparative analysis of the demographics, channel distribution, and reward scheme of 53065 tasks.

The rest of the paper is structured as follows: Section 2 presents a literature review of the ethical issues around payment in crowdsourcing tasks, as well as of the effectiveness of motivation and incentive rewards. Research questions are elaborated and explained in Section 3. In Section 4.1, a brief summary of the historical metadata is presented. Moreover, the reward schemes of the top five channels are explained in Section 4.2. The demographic information for the workers in the top five channels is discussed in Section 5, including their work experience and task acceptance criteria. The worker reward loss over time for each channel and task payment range is analysed in Section 6. In Section 7, the impact of unethical payment behaviour is explored from the perspectives of the worker and the requester, respectively, in connection with our findings. Section 8 draws the conclusions of this work.

2 Literature Review

In this section, we illustrate related work on the ethical issues concerning rewards in paid micro-tasks. We also review studies on the impact of the reward scheme on workers’ motivation and on the quality of the outcome.

2.1 Ethical Issues with Crowdsourcing Rewards

The globalisation and cross-specialisation of crowdsourcing expose it to complex ethical judgements, as different countries and domains have their own unique regulations and ethical policies. Specifically, the US and India dominate the crowd labour force (Difallah et al., 2018; Kazai et al., 2013); workers from these two countries have different subjective perceptions of the fairness of micro-task payments due to their local economic conditions, and crowdsourcing practices in each country are governed by different laws (Martin et al., 2016; Gellman, 2015). In addition, crowdsourcing projects from academic institutions are subject to higher ethical standards than those from commercial institutions because they are reviewed by ethics committees and governed by strict ethics guidelines (Shmueli et al., 2021; Gleibs, 2017; Martin et al., 2017).

Using crowdsourcing to collect data in social science studies brings together the benefits of a broad demographic distribution and fast responses. However, many researchers have raised the issues of improper payments and low rewards, especially in studies where researchers do not have an extensive track record in crowdsourcing task design, and particularly in studies that regard crowdsourcing as a means to collect data rather than as the main subject of study. For example, Callison-Burch (2009) claimed to have paid crowdworkers “a grand total of 9.75 USD to complete nearly 1,000 HITs”. Andersen and Richard (2018) discussed the pay rate in social science experiments that had been carried out using crowdsourcing. Two experiments with different task lengths were conducted to measure the effect of payment rate on the quality of the workers’ output. The findings of this study confirmed that the variance in pay rates did not have a significant effect on workers’ output quality, but it did affect other aspects, such as the completion time. A similar study by Haug (2018) explored ethical issues in collecting data for survey research using crowdsourcing. Several studies considered crowdsourcing a fast, cheap, and effective tool for managing data for social science. However, others (Borromeo et al., 2017; Williamson, 2016; Fort et al., 2011) have raised concerns, claiming that low pay rates could challenge the ethics of the data collection process for such studies. Haug (2018) pointed out that in their experiments raising the payment increased the risk of attracting workers who were used to doing the same type of tasks, potentially biasing the collected data.

Paul and Lars (2018) developed a model to test the fairness of the payment during the task execution and after the task submission. Goel and Faltings (2019) discussed fairness and the workers’ trust in crowdsourcing platforms. They proposed a mechanism that uses peers’ answers to verify workers and reduces the number of gold questions needed in the task. Archambault et al. (2015, pp. 27-69) discussed the ethical issues around the use of crowdwork in academic research. The authors recommended following the guidelines provided by the Dynamo project (Salehi et al., 2015) and the Crowdworking Code of Conduct (Graham et al., 2020) as a guide for researchers planning to use crowd tasks in their work.

In most of the studies around payment issues, researchers strive to pay fair rewards when using crowdsourcing tasks (Brawley et al., 2016; Ipeirotis, 2010a). Silberman et al. (2018b) noted the ethical responsibility of paying workers fair wages and discussed the importance of money as a motivating factor for most of the workers, as it had been considered in previous studies (e.g., Ross et al., 2010; Ipeirotis, 2010b; Ho et al., 2013; Ye et al., 2017; Finnerty et al., 2013). Moreover, they pointed out that fair payment led to high-quality performance from the crowd. Researchers have tried to develop models or implement criteria for calculating a fair payment depending on task type and expected completion time. However, even when requesters pay a rate in accordance with the minimum wage, workers might still consider the payment unfair: we refer to Section 7.1 for a discussion of this phenomenon.

There is also an urgent need for transparency around the platforms’ channel commissions. First, crowdworkers are considered independent contractors by the platforms. In other words, they are not granted the same protections as ‘traditional’ employees, including minimum wage, employer-sponsored health care, or dismissal protection. As a result, their income is typically unstable and below the local minimum wage (Hara et al., 2018). Furthermore, while crowdworkers may legally be paid less than the local minimum wage, there is a growing consensus that paying too little for research-related crowdsourcing tasks is unethical, and that such tasks should pay at least the minimum wage (Shmueli et al., 2021; Qiu et al., 2019; Haug, 2018). In brief, since the platform does not guarantee crowdworkers a basic income, researchers are expected to pay them with the full consideration that they are at far greater risk of low income than employees with guaranteed employment contracts.

In our study, we examine the existence of the above issues and focus on intermediary channels and on the gap between the actual payment made by the requester and the payment received by the workers on the F8 platform.

2.2 Motivation vs. Reward in Crowdsourcing

Mason and Watts (2009) conducted one of the earliest studies that examined the effectiveness of financial incentives on crowdsourcing task outcomes. The authors discussed the impact of increasing the task rewards on the workers’ expectations of the task: high rewards were found to make the tasks more attractive to the workers but did not increase the quality of the outcome. A similar study by Borromeo and Toyama (2016) compared the performance of an unpaid crowdsourcing task (self-hosted) with a paid one (via F8). The results were highly similar in the paid and unpaid conditions, but the unpaid tasks took longer to complete. In contrast, Kost et al. (2018) defined incentive rewards as one of the four sources of experience meaningfulness for the workers. Their experiments showed that the extent to which payment affects workers is dictated by their real-world employment status and by how much they rely on crowdsourcing work.

In summary, the impact of payment cannot be ignored, even if it may have only a slight effect on workers’ performance. Ye et al. (2017) investigated the impact of the payment amount on the workers’ performance in two types of crowdsourcing tasks. They introduced the concept of Perceived Fairness in Pay (PFP) and measured it in their experiments. This study aimed to clarify the relationship between fair payment and the quality of the results.

Other studies investigated extensively the effect of fair payments and the loss of time in crowdsourcing tasks. Researchers discovered a significant mismatch between earnings and the amount of time and effort required to accomplish a task. They warned academics, and requesters in general, that disregarding these details could threaten the attractiveness of crowdsourcing jobs. Hara et al. (2018) discussed workers’ earnings on MTurk and considered the unpaid time, including the time spent finding a task and working on tasks that are later rejected. The authors expressed their concerns about such wasted time, which ultimately affects the hourly wage.

Borromeo et al. (2017) discussed the implementation and evaluation of transparency and fairness principles on a crowdsourcing platform. On the one hand, the authors discussed fairness in task assignment, completion time, and payment. On the other hand, they recommended a dedicated framework to encourage a more transparent process for requesters and platform developers. Ho et al. (2015) suggested different payment schemes, such as payment per unit and a bonus for achieving a specific target.

Furthermore, other researchers showed that workers can be motivated to work on a task with low or unfair payment, or even work as volunteers, if the task has deep meaning to them. Some researchers claimed that workers respond to good humanitarian causes, such as tasks for the World Health Organisation (WHO) or disaster responses. For example, Spatharioti et al. (2017) pointed out that workers tend to do more work in what the authors refer to as a “meaningful task”, such as a disaster response task.

Most studies used MTurk to analyse the correlation between the quality of the results and payments. Most of the work on MTurk is “performance-based”, which means workers tend to submit high-quality work because they are afraid of rejection if their work does not meet the task criteria or the requesters’ expectations (Ho et al., 2015). On the other hand, on the F8 platform low payment could affect the workers’ performance differently. Since workers know that they are getting paid regardless of the requesters’ job acceptance decision, low payment might not motivate them to expend effort to submit high-quality results. Our focus in this study is the F8 platform and the variation of payment due to the different commission rates taken by the channels. Based on an analysis of more than 53k HITs from previous crowdsourcing projects over four years, we identify the most common channels and explain how they operate.

3 Research Questions

In this paper, we focus on the transparency of crowdsourcing marketplaces and of the channels used to recruit crowdworkers. The business model of traditional crowdsourcing platforms relies on fees applied to the amount paid by requesters to workers for task completion. These fees are fixed by the platforms and are stated clearly to the requesters. In F8, recruitment channels are part of the value chain as well, and this has led to their unique business model. Such business models are not always transparent, and requesters struggle to comprehend how these channels operate and, consequently, how to fairly compensate crowdworkers for completing their tasks. Our contribution aims to shed light on the policies of these channels, including recruitment rules, rates, and methods of payment. Our investigation focuses on F8 as a widely used crowdsourcing platform that offers its own in-house channel, called Elite, and a number of external recruitment channels. Requesters can decide which recruitment channels to include for completing a task during its configuration; by default, all channels are included.

Our research questions are:

RQ1: What is the recruitment and reward model of such channels?

RQ2: What is the demographic composition of such recruitment channels?

RQ3: How do the recruitment commissions change over time and over the different channels?

RQ4: What is the impact of the recruitment channel choice on working conditions, e.g., on the hourly wage?

The first research question is addressed in Section 4.2, which provides a description of the reward scheme across the top five recruitment channels. The second research question, concerning the demographic composition of the channels, is critical in understanding the reliability of sampling when conducting experiments via crowdsourcing, and it is explored in Section 5. The third research question is addressed in Section 6 and is of fundamental importance to assess potential ethical issues in academic research. The fourth research question is also answered in Section 6, where we further explore the impact of different levels of reward transparency in recruitment channels on the labour conditions of workers in F8.

4 Data Collection and Research Methodology

The F8 platform was chosen as the primary focus of the study because it is one of the most popular and widely used crowdsourcing platforms/marketplaces currently utilised by academics. In order to identify the most popular F8 channels and collect reliable information from them, two issues need to be tackled: (i) the fluctuation of channel commissions and (ii) the potential inconsistency of F8 channel commission reports over time. To address (i), we built a metadata archive from a historical collection of F8 tasks. To address (ii), we cross-checked the F8 metadata against an ad-hoc survey, which allowed us to validate the consistency of the reported data. This survey also allowed us to collect additional information on the workers’ per-channel demographics and working profiles.

4.1 Historical Metadata

We analysed a collection of 53065 tasks from 133 different jobs carried out across 38 months (from June 2015 to August 2018) by 6803 unique workers from 110 different countries. To create this archive, we put together job results collected from multiple requesters. The vast majority of these results were collected from tasks that did not have any restrictions on worker expertise level or geographic location. It is important to keep in mind that this collection was compiled opportunistically, as a form of meta-analysis, from all tasks that were available to us at the time; as a result, sampling discontinuities over time may be present. While we refer to other works for a systematic approach (e.g., Difallah et al., 2018), it is worth noting that the primary objective of this study is to identify issues and anomalies in the recruitment channels’ payment schemes as reported to requesters by the platform, thus requiring less stringent statistical requirements on the underlying population. As described in detail in Section 5, the presence of such anomalies was revealed by validating the channel commissions reported by F8 against an ad-hoc survey.

Some of the metadata, like the channel commission, was not available directly from the task output. For this reason, we built a web scraper to download additional metadata that was only presented in the F8 requester web interface. This allowed us to collect the F8-reported channel commission over time. From the scraped channel commission metadata we derived the actual rewards received by workers for completing tasks, reported in Table 4. In addition, the scraped channel commission metadata helped us to study the fluctuation of channel commissions over time (Figure 13) and the variation in the size of rewards (Figures 14 and 15). A summary of the information contained in the Historical Metadata dataset is shown in Table 2.
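As a rough sketch of how the scraped metadata can be turned into a commission-ratio time series, the snippet below aggregates per-channel monthly medians; the file name and column names (`channel`, `created_at`, `requester_payment`, `worker_reward`) are hypothetical placeholders for the fields listed in Table 2.

```python
import pandas as pd

# Hypothetical file and column names standing in for the Historical Metadata fields (Table 2).
tasks = pd.read_csv("historical_metadata.csv", parse_dates=["created_at"])

# Share of the requester's payment that actually reaches the worker.
tasks["reward_ratio"] = tasks["worker_reward"] / tasks["requester_payment"]

# Monthly median ratio per channel, i.e. the kind of fluctuation plotted in Figure 13.
ratio_over_time = (
    tasks.groupby(["channel", pd.Grouper(key="created_at", freq="M")])["reward_ratio"]
         .median()
         .unstack(level=0)
)
print(ratio_over_time.tail())
```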

Table 2 List of variables in the Historical Metadata dataset.

4.1.1 Most Popular Channels

Our main focus here is the popularity of the channels among workers in the Historical Metadata dataset. We computed the number of unique workers per channel, shown in Figure 2. To facilitate further analysis and ensure reliable statistics, in the remainder of this work we focus on the five most used channels, NeoBux, Elite, Clixsense, InstaGC, and Swagbucks, since together these channels account for 47998 units (93.6% of the total) and 6274 unique workers (97.2%).
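These counts can be reproduced with a simple aggregation over the same dataset (again, the file and column names are assumed):

```python
import pandas as pd

# Same hypothetical file and column names as in the previous sketch.
tasks = pd.read_csv("historical_metadata.csv")

per_channel = tasks.groupby("channel").agg(
    units=("unit_id", "size"),
    unique_workers=("worker_id", "nunique"),
).sort_values("units", ascending=False)

top5 = per_channel.head(5)
print(top5)
# Note: a worker active in more than one channel is counted once per channel here.
print("Top-5 share of units:   {:.1%}".format(top5["units"].sum() / len(tasks)))
print("Top-5 share of workers: {:.1%}".format(
    top5["unique_workers"].sum() / tasks["worker_id"].nunique()))
```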

Fig. 2 Number of unique workers per channel.

Table 3 Third-party channels’ reward scheme summary.

The recruitment and reward models of the top five channels are explained in Section 4.2. In addition, ethical issues of the Paid To Click (PTC) reward structure of some channels are discussed in Section 7.2.

4.2 Reward Model

In this section, we illustrate the reward model of the top F8 channels; the reward schemes of the third-party ones are summarised in Table 3. We obtained this information by registering as workers in the channels and observing their (often opaque) reward models.

Some of these channels are Paid To Click (PTC) services, in which users are paid for clicking on ad banners, reading and interacting with advertising emails, or watching video ads. In addition to traditional PTC services, other ways to earn money in these channels include playing games, cash-back systems, e-shop sign-up offers, and micro-tasks/online surveys. This study focuses only on the latter, because it is the only activity that is exposed to F8.

4.2.1 Elite Channel

The Elite channel is the official channel of F8, making it the most straightforward way to access a task in F8 directly from the platform web page. Unlike the other four channels considered in this study, Elite does not require any channel commission: workers are rewarded with the full amount paid by the requesters. The Elite channel has a qualification system that works as follows: first, workers have to successfully complete at least 100 test questions without reward to get qualified to work on paid tasks; then, they are assigned a qualification level based on their accuracy. The qualification levels are: level 1 for workers with at least 70% accuracy, level 2 for workers with at least 80% accuracy, and level 3 for workers with at least 85% accuracy. This effectively creates an access barrier for workers who struggle to complete such test questions (e.g., non-native English speakers).
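A compact way to express this qualification rule is the sketch below; it is only an illustration of the thresholds described above, not Elite’s actual implementation.

```python
def elite_qualification_level(accuracy: float) -> int:
    """Map a worker's accuracy on the unpaid test questions to an Elite qualification level.

    Illustrative only: returns 0 when the worker does not reach the
    minimum 70% accuracy required for level 1.
    """
    if accuracy >= 0.85:
        return 3
    if accuracy >= 0.80:
        return 2
    if accuracy >= 0.70:
        return 1
    return 0

assert elite_qualification_level(0.82) == 2
assert elite_qualification_level(0.60) == 0
```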

4.2.2 NeoBux Channel

As shown in Figure 2, NeoBux is the most used channel in terms of both units and workers. NeoBux is a PTC platform established in 2008 which offers free registration for Standard membership. NeoBux pays members for carrying out simple tasks like clicking on ads. The number of clicks a member can do daily is limited, and workers are required to be active daily to avoid suspension or cancellation of their membership. Workers can earn more by upgrading to Golden membership for 90 USD per year, which grants up to 2000 clicks per month at 0.01 USD, as well as rented referrals/subcontracted crowd work (where workers can spend credit to hire other workers’ clicks). Workers can withdraw their earnings to PayPal or Payza accounts, with a 2 USD minimum withdrawal limit for the first withdrawal. After that, they are allowed to withdraw again only when they reach a fixed minimum amount of 10 USD. Crowdsourcing tasks are offered as mini jobs through which workers can earn extra money, but they are typically not the main source of income for NeoBux users.

4.2.3 ClixSense Channel

Established in 2007, Clixsense is one of the most popular online PTC platforms. On this platform, a weekly contest is held in which the top ten workers (those who complete the most tasks) compete for a total prize pool of 100 USD, with 50 USD going to the best worker. The tasks include completing surveys, testing new products, downloading new apps, completing F8 tasks, watching videos, etc. Clixsense offers a Standard and a Premium membership option. The difference between these two options lies in the percentage of money received for completing the daily checklist and in the amount earned from referrals. Members are assigned referral links, and a worker receives a 20% commission on the earnings of their referrals at Clixsense. Payments are issued every Monday if the worker has earned more than 8 USD for Standard members or 6 USD for Premium members. As an additional incentive, this channel offers a 5 USD bonus once the worker earns 50 USD. The minimum reward for a task is 1 cent; if the worker completes a task worth less than 1 cent, they will not get paid in Clixsense unless they complete another task for the same job.

4.2.4 InstaGC Channel

Established in 2011, the InstaGC channel, which is similar to Clixsense and NeoBux in terms of services and referral system, allows free registration. The only benefit of using it over the previous two is that the payout threshold is only 1 USD for 100 collection points. Payment is in the form of a gift card, or a cash payment made through Bitcoin or other electronic money transfers, with a fee associated with the cash exchange process.

4.2.5 Swagbucks

This channel is related to Swagbucks by Prodege, a rewards and loyalty operator that offers cashback and vouchers. Users can earn so-called swagbucks (SBs), a virtual currency that can be used directly for online shopping or exchanged for US dollars. Users can earn SBs by using the Swagbucks search engine, playing games, watching videos, shopping online, answering surveys, and completing F8 tasks (the means of income this study focuses on). The primary ways to redeem SBs are PayPal, Visa gift cards, and merchant gift cards. For every 100 SBs, a worker can redeem 1 USD at the end of the month.

From the point of view of currency stability, as defined by Kumar (2009), SBs can be classified as a gambling asset (Novotný, 2018), meaning that the virtual currency possesses high idiosyncratic volatility, high idiosyncratic skewness, and a low price.

Swagbucks is also associated with additional activities that potentially raise ethical issues when the channel is used for academic research: Swagbucks rewards users for subscribing to and using gambling services. These types of rewards constitute the majority of the offers appearing in the discover section and in the inbox of the Swagbucks platform.

5 Survey

As the main experiment of this work, we restricted our focus to the five most popular channels as per the Historical Metadata dataset, described in Section 4.1. We designed a survey, integrated it into a crowdsourcing task, and collected the answers of 60 workers for each of the five most popular channels. While this number might not be sufficient to reliably draw conclusions on the demographics, it proved sufficient to ensure significance for the analysis of the channel commissions (as discussed for Figure 11 and the related statistical tests), and for a qualitative analysis that achieved saturation on the open-ended questions (Hennink and Kaiser, 2022; Fofana et al., 2020; Rowlands et al., 2016). We ran the survey as an F8 task and paid each respondent 0.5 USD: we chose this amount on the basis of the expected completion time obtained from a pilot experiment (3 minutes), with the goal of providing an equivalent hourly wage of 10 USD, above the UK minimum wage. While we made sure to prevent single accounts from completing the survey multiple times, we cannot guarantee that two accounts in different channels belonged to different crowdworkers. This could be verified with forms of user tracking using permanent cookies (Klein and Pinkas, 2019); however, that is beyond the scope of this work and would require specific ethical safeguards. While the actual time spent on the survey could not be measured accurately (especially for channels where crowdworkers engage in different activities in parallel), the median difference between worker acceptance and submission of the survey was under 3 minutes for the channels InstaGC and Swagbucks, and under 9 minutes for the other channels.
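The reward was derived from the expected completion time as in the sketch below, using the 3-minute pilot estimate and the 10 USD/hour target mentioned above.

```python
def reward_for_task(expected_minutes: float, target_hourly_wage: float) -> float:
    """Reward (in USD) needed to reach a target hourly wage for a task of a given length."""
    return round(target_hourly_wage * expected_minutes / 60, 2)

# 3-minute pilot estimate at a 10 USD/hour target gives 0.5 USD per respondent.
print(reward_for_task(expected_minutes=3, target_hourly_wage=10))
```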

In the first part of the survey, we focused on the demographics of the channels’ users, asking participants about their age, gender, education, experience as crowdworkers, employment, type of device used to perform crowdsourcing tasks, monthly income, and criteria for HIT acceptance. The second part of the survey investigated the rewarding factor, with questions about how, and how much, the workers get paid by each channel. These responses, together with the scraped historical data of F8-reported channel commissions described in Section 4.1, allowed us to reconstruct and validate the actual worker reward over time for each channel. The remainder of this section illustrates the demographic information collected in our survey; please see Section 6 for a detailed analysis of the channel commissions.

5.1 Age

In the first question, we asked the participants to specify their ages by choosing from a list of six age ranges. Figure 3 shows that participants’ ages differ slightly across channels; e.g., 30% of NeoBux users were younger than 25, whereas no participants in that age range were recorded for Swagbucks.

Fig. 3 Survey responses on worker age.

Fig. 4 Survey responses on worker gender.

5.2 Gender

We then investigated the participants’ gender. The responses we collected show different trends in terms of gender distribution over channels. As shown in Figure 4, InstaGC and Swagbucks workers are mostly female (more than 60% in both channels), whereas users of Clixsense, Elite, and NeoBux are mostly male (more than 70% of the users in each of these channels). This difference might be related to the target customer segment associated with the InstaGC and Swagbucks voucher redemption schemes. It is important to consider these differences to avoid unintended sampling biases.

5.3 Education

The third question of our survey concerned workers’ education levels. We identified nine education levels, ranging from no education to doctorate. Figure 5 shows that for all channels the most common educational level is a bachelor’s degree. The vast majority of the users falls between high school diploma and professional degree, and the only noteworthy difference between channels is the ratio between bachelor’s and master’s degrees (e.g., Clixsense vs. Elite). It is worth noting that these results might contradict the narrative of a gig economy meant to provide extra income for workers at the early stage of their career, as also discussed in previous studies where this contested framing from platforms is referred to as “beer money” (Bates et al., 2021; Berg, 2015; Tassinari and Maccarrone, 2020).

Fig. 5 Survey responses on worker education.

Fig. 6 Survey responses on worker experience as crowdworkers.

5.4 Experience as Crowdworker

We asked participants to indicate the number of years of experience as a crowdworker. Figure 6 shows that the vast majority of InstaGC and Swagbucks workers are experts, reporting more than two years of experience. The experience of the workers of the channels Clixsense, Elite, and NeoBux is more evenly distributed.

5.5 Employment

We then asked respondents to specify their employment status. Figure 7 shows the distribution of the results. A general trend, consistent across channels, indicates that the majority of participants reported being “Employed for wages” or “Self-employed”. A noteworthy difference is the case of InstaGC and Swagbucks, where more than 58% of the respondents reported being “Employed for wages”.

Fig. 7 Survey responses on worker employment.

5.6 Desktop vs. Mobile Users

Previous works have shown a trend in micro-task crowdsourcing of offering tasks optimised for desktop rather than mobile devices (Mea et al., 2015). Consequently, workers find it more convenient to access crowdsourcing platforms and perform their HITs through desktop devices. To investigate whether this behaviour varies across the recruitment channels, we asked participants what kind of device they use to perform crowdsourcing micro-tasks. The results confirmed that only 2% to 5% of the workers perform micro-tasks from mobile devices, without noteworthy differences across channels.

5.7 Income

We asked participants to report their monthly earnings from crowdsourcing micro-tasks. Figure 8 shows the reported monthly income for each channel.

Fig. 8 Reported monthly earnings (log scale).

Fig. 9 Survey responses concerning the criteria used to decide whether to begin working on a task.

There was a statistically significant (with threshold \(p=0.05\)) difference between groups as determined by a one-way ANOVA (\(F(4,295) = 14.885\), \(p < 0.001\)). A Tukey post hoc test revealed that the monthly earnings reported for Swagbucks are statistically different from those of all other channels. Additional significant differences were found only between InstaGC and each of Clixsense and Elite.
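The reported statistics correspond to a standard one-way ANOVA over five groups of 60 respondents, followed by a Tukey HSD post hoc test. The sketch below shows this kind of analysis on synthetic earnings data; the real analysis uses the survey responses summarised in Figure 8.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic per-channel monthly earnings (USD), 60 respondents per channel.
rng = np.random.default_rng(0)
channels = ["NeoBux", "Elite", "Clixsense", "InstaGC", "Swagbucks"]
earnings = {c: rng.lognormal(mean=3 + 0.3 * i, sigma=0.5, size=60)
            for i, c in enumerate(channels)}

# One-way ANOVA across the five channels (df between = 4, df within = 295).
f_stat, p_value = f_oneway(*earnings.values())
print(f"F(4,295) = {f_stat:.3f}, p = {p_value:.4f}")

# Tukey HSD post hoc test for pairwise channel differences.
values = np.concatenate(list(earnings.values()))
labels = np.repeat(channels, 60)
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```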

Fig. 10 The F8 monitor shows details about the contributions of workers to a survey’s assignments. The “Amount” column refers to the payment allocated by the platform for the resolution of the assignment, whereas the “Amount (in channel currency)” column concerns the amount due to the workers. We blurred workers’ identifiers and IP addresses for privacy reasons.

5.8 Task Acceptance Criteria

Then, we focused on workers’ task acceptance criteria. Participants could choose more than one answer. As shown in Figure 9, for all channels the reward amount has the greatest influence on the decision to accept a task. Task difficulty, completion time, and the interest aroused by the task are also factors taken into consideration. It is also noteworthy that for NeoBux and Swagbucks some participants reported that the task was provided to them by the channel, without the possibility of choosing another one. The criteria used by the channels to decide which tasks to assign are not publicly disclosed. This approach is in stark contrast with Elite and MTurk, where a search engine is provided to the crowdworker to freely select the HITs to complete.

6 Channel Commission Analysis

In this section, we present the analysis of the actual worker reward over time per channel, obtained by first comparing the worker self-reported values with the amount paid by the requesters, and then using these values to validate the scraped historical data of the F8 reported channel commissions.

6.1 Survey Completion Reward

We paid 0.5 USD (plus the 20% F8 platform fee) for the completion of the survey. Every channel, with the exception of Elite, applied an additional channel commission.

The F8 dashboard, depicted in Figure 10, shows the final amount to be paid to the worker for the completion of a task. The payment includes both the platform fee and the channel commission. It is worth noting that the latter cannot be accessed by the requester in advance, since it is published in the dashboard only after the completion of a task. Moreover, channel commissions change over time. Interestingly, the values reported present some inconsistencies: for example, at the time of the survey the platform reported for Clixsense a worker compensation two orders of magnitude smaller than those of the other channels.

To validate, and possibly correct, these misalignments, in the survey we asked the following question: “How much money will you earn carrying out this task?”. The responses are shown in Figure 11. The black bold line represents the median of the distribution of the workers’ answers. Despite some outliers, the worker-reported rewards match those reported on the F8 dashboard, after a multiplicative correction in the case of Clixsense, where the amount reported in the dashboard was 0.0035 USD instead of the worker-reported value of 0.35 USD. Moreover, the wider interquartile range for NeoBux indicates a slightly higher disagreement in workers’ responses, potentially caused by the more complicated reward procedure of this channel, as described in Section 4.2.

Fig. 11 Payment per channel reported by the survey’s participants who reported the channel currency as USD (231 out of 300 participants). On top of each box, the number of valid responses collected is shown. The red circles indicate the amounts reported by F8 (net of channel commission). The circles overlap exactly with the medians (and thus are well within the IQR), suggesting that the channel commissions reported by the participants reflect those declared by F8. It is also worth noting that the dispersion is very low, as the IQR is equal to zero for all channels except NeoBux, where it is lower than 5 cents.

Table 4 Worker reward (USD) net of the channel commission, also reported as the percentage of requester payment, as obtained from the F8 dashboard for the task corresponding to the survey.

The effective rewards received by the workers per channel are summarised in Table 4. The channel Elite, which does not apply any commission, is the only one where the workers get the full amount we paid, i.e., 0.5 USD. Because of the channel commissions applied, the worker payment in the other four channels is lower: 0.37 USD for NeoBux, 0.35 USD for Clixsense, 0.27 USD for InstaGC, and 0.20 USD for Swagbucks.
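From these figures, the effective channel commission is simply the share of the 0.5 USD payment that does not reach the worker; the sketch below derives the corresponding percentages of the requester payment from the per-channel rewards listed above.

```python
# Effective channel commission implied by the worker rewards reported in Table 4.
requester_payment = 0.50  # USD per respondent, excluding the 20% F8 platform fee
worker_reward = {"Elite": 0.50, "NeoBux": 0.37, "Clixsense": 0.35,
                 "InstaGC": 0.27, "Swagbucks": 0.20}

for channel, reward in worker_reward.items():
    commission = 1 - reward / requester_payment
    print(f"{channel:10s} worker keeps {reward / requester_payment:4.0%}, "
          f"channel commission {commission:4.0%}")
```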

Fig. 12 Workers reported different delays across channels in the withdrawal of their reward after completing a HIT.

6.2 Withdrawal Delays

We focused on the type of reward workers get when using different channels. Around 60% of workers in InstaGC and Swagbucks reported receiving the reward as an instant electronic payment (e.g., PayPal), while over 35% of workers in the remaining channels claimed to receive their money via bank transfer, thus with additional delay.

As shown in Figure 12, the majority of workers in Clixsense and Elite claimed that it took a few days for their withdrawals to complete. In comparison, it took only a few minutes for most workers in InstaGC to withdraw rewards from their accounts. This large difference in withdrawal delays could be due to the different reward methods that workers in each channel prefer (Table 3). The differences in reward delays are compounded by the use of multiple withdrawal methods and currencies (including cryptocurrencies) in some channels.

6.3 Reward Loss Over Time

As shown in Figure 13, channel commissions are subject to fluctuations, in some cases by considerable amounts, e.g., for NeoBux and InstaGC. Nevertheless, the overall trend is confirmed: NeoBux compensates workers with 75-80% of the original payment, Clixsense and InstaGC with about 60%, while Swagbucks is the channel that takes the highest commission, since it pays workers only 40% of the money initially allocated.

Fig. 13 Ratio of the requester’s payment actually received by the worker, observed from March 2015 to September 2019.

Fig. 14 Reward ratio received by the worker vs the amount paid by the requester.

6.4 Worker Loss Per Channel by Reward Size

The channel commissions are also influenced by the amount of the reward. As shown in Figure 14, channels tend to retain a larger share of smaller rewards.

A use case scenario can illustrate the workers’ loss depending on the channel they choose to use and the average reward size. Using the values from the Historical Metadata dataset, we estimated the amount of money lost per year, grouped by channel and requester payment bracket. The results, shown in Figure 15, indicate a sizeable cumulative loss, ranging from about 20% to 60%.
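This estimate boils down to a simple aggregation over the Historical Metadata dataset; the sketch below uses the same hypothetical file and column names as the earlier sketches, and the payment brackets are assumed example bins.

```python
import pandas as pd

# Hypothetical file, column names, and payment brackets, as in the earlier sketches.
tasks = pd.read_csv("historical_metadata.csv", parse_dates=["created_at"])
tasks["loss"] = tasks["requester_payment"] - tasks["worker_reward"]
tasks["bracket"] = pd.cut(tasks["requester_payment"],
                          bins=[0, 0.05, 0.10, 0.25, 0.50, float("inf")])

# Yearly loss per channel and requester payment bracket (cf. Figure 15).
loss_per_year = (
    tasks.groupby([tasks["created_at"].dt.year, "channel", "bracket"], observed=True)["loss"]
         .sum()
         .rename("total_loss_usd")
)
print(loss_per_year.head())
```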

6.5 Workers Feedback

Workers expressed an overall positive sentiment towards the goal of the survey, with remarks about the need to demonstrate the “injustice” and unfairness of the payment process, especially because workers often feel that the relationship between employers and workers is quite “unbalanced”. Despite such issues, some workers expressed their gratitude for the opportunity of micro-task crowdsourcing, which has been a “lifeline” in moments of financial uncertainty.

Fig. 15 Mean ratio of the amount paid by the requester to the actual payment received by the worker, computed on the Historical Metadata dataset from 2015 to 2019.

Moreover, several workers pointed out that, when working on crowdsourcing tasks, the instructions are sometimes ambiguous and unclear, which leads them to abandon the task or submit wrong answers. Workers also pointed out that the number of tasks on the platform has decreased over time and that the platform has been suffering from many technical issues.

7 Discussion

We discuss here our findings from both requester and crowdworker perspectives, consider their implications in the micro-task crowdsourcing ecosystem, and point out potential solutions and connections with related academic work.

7.1 Impacts of Unethical Crowdsourcing

7.1.1 Underpayment, Unfair Workload and Lack of Transparency for Workers

Crowdworkers are exploited by hidden rules on the platforms, such as being encouraged to complete a large number of zero-reward HITs in order to build up a higher reputation (including the number of tasks completed and the task approval rate), which makes them more likely to receive high-reward tasks in the future (Gupta et al., 2014). Our findings also reveal another opaque aspect of crowdsourcing marketplaces, namely the widespread presence of channel commissions. Worse still, crowdworkers are generally unaware of the existence of channel commissions, and thus have been subject to hidden exploitation for a long time. Playing devil’s advocate, it may be reasonable for channels to charge commissions as an incentive for recommending tasks to channel members. However, it is worth thinking more deeply about how much commission is ethical and about its impact on the overall market.

One of the reasons for underpayment is that badly designed tasks, technical errors, and interface design errors (McInnis et al., 2016) made by the job requesters may confuse workers and result in extra time and effort, and even in failed submissions or an increased risk of rejection (Gadiraju et al., 2017). Even when they are paid, crowdworkers may need to spend additional time and effort searching for tasks, learning how to do those they are not familiar with, and waiting for responses to their questions from the requesters. Our findings confirm the existence of payment delays within channels such as Clixsense. This indicates that payment delays come not only from the job requesters who decide whether to accept or reject the results, but also from the channels’ own payment policies. A worker should be paid for the work done as soon as the requester approves their submission. This delay contributes to degrading the quality of the crowdwork experience.

The lack of understanding of channel commissions by crowdworkers identified in this study has led us to pay attention to the lack of transparency of information from the workers’ perspective on the platform. Crowdsourcing platforms have always tried to avoid the traditional ways of human interaction in the work environment, for instance by anonymising members and limiting the ways they can interact with each other (Martin et al., 2017). As a result, the extremely low quality of communication directly undermines interpersonal trust, gradually stripping away the ethical guidelines of traditional work from crowdsourcing platforms. This could in turn contribute to a culture of extremely high channel commissions, such as those of InstaGC and Swagbucks.

7.1.2 Reputation and Data Quality Risks for Requesters

Unethical crowdsourcing can create challenges for the requester or the institutions they belong to. As news of workers’ exploitation spreads through communication and rating systems like Turkopticon and TurkerView, the reputation of the requester is harmed, making it more difficult for them to recruit workers in the future (ChrisTurk, 2022; Hanrahan et al., 2021; Gaikwad et al., 2016; Salehi et al., 2015; Irani and Six Silberman, 2013). Moreover, rating systems to keep crowdsourcing platforms accountable are gradually being developed (e.g., the Fairwork project by Fredman et al., 2020), potentially making virtuous platforms more attractive.

An increasing number of studies is looking at whether the use of data collected via crowdsourcing differs between business, public use, and academia, and whether this difference creates ethical challenges for both the data provider and the demand side (Gleibs, 2017; Vayena et al., 2015). However, the market’s over-reliance on a single crowdsourcing platform leads to an inevitable rise in platform fees, as in the case of MTurk, thus challenging the fairness of payments to the crowdworkers in commercial and academic crowdsourcing projects (Haug, 2018; Gleibs, 2017). Therefore, equal attention needs to be paid to the fairness of the compensation given to workers or participants in commercial and academic crowdsourcing projects, and to the difference in how commercial and academic job requesters treat this ethical challenge. Academic institutions keep a higher degree of ethical standards for worker compensation than commercial institutions (Shmueli et al., 2021; Gleibs, 2017). Moreover, academics have been actively looking for ways to improve payment fairness for crowdworkers (Fredman et al., 2020; Qiu et al., 2019; Whiting et al., 2019). As a result, academic job requesters often pay more than commercial job requesters (Rea et al., 2020).

A consensus on the correlation between crowdworkers’ pay level and data quality has still not been reached (Auer et al., 2021; Litman et al., 2015; Buhrmester et al., 2011). This is probably because data quality is influenced by multiple factors, not just compensation. In other words, even when workers are not paid a fair reward, they might still maintain a high level of performance due to the penalties set by the platform (Auer et al., 2021). However, maintaining ethical rewards can help improve the data quality of those who treat the rewards as a primary source of income (Litman et al., 2015). In addition, the impact of pay level on worker satisfaction and turnover is clear, and low rewards can reduce workers’ willingness to continue working for a requester, or even lead them to refuse to work for that requester altogether (Kees et al., 2017). After being turned down by high-quality workers, job requesters may end up having no choice but to hire workers who provide lower-quality responses, which in turn may reduce the quality of the work.

Based on the findings of this study, we encourage commercial job requesters to be fully aware of the risk of underpayment to workers arising from the floating channel commissions and to maintain a sufficient ethical standard of payment. This will also help to ensure the quality of the data and attract sufficient workers for continuous participation in the project (Auer et al., 2021; Litman et al., 2015).

7.1.3 Potential Solutions to Unethical Channel Commissions

One possible solution to help reduce unethical channel commissions could be to encourage workers and requesters to share the amounts they receive and pay for the same task through a browser plugin or script, similarly to other semi-automated “sousveillance” tools proposed in the literature (e.g., Checco et al., 2018). Designing such a tool would need to take into consideration potential issues in the data collection and reporting process, especially because channel commissions change over time. This tool could calculate and share with its users the percentage of commission charged by a specific channel, which in turn could be monitored over time. It could be used to create a leaderboard of channel commissions based on the monitored data, thus encouraging both workers and requesters to choose the channels with more reasonable commissions and reducing the extent to which they are exploited. The potential of this solution lies not only in helping to facilitate ethical channel commissions, but also in being a useful attempt to promote cooperation between workers, and even between workers and requesters, in sharing information and thus improving unreasonable policies on the platform.

7.2 Ethical Considerations of PTC Services

Another point worth mentioning is the questionable nature of the PTC hierarchical payment scheme: users can make use of rented referrals, which is effectively similar to subcontracting, where they can bet on the productivity of other users by renting their clicks for a set period of time. However, users need to pay a membership fee to gain access to advanced rented referral options, as well as pay their own money in the hope of a potential return, bringing the platform dangerously close to a Ponzi scheme (George, 2018). The functionalities of these platforms are close to those of Traffic Monsoon, where the main source of revenue came from the users who joined the platform as PTC workers. This practice led to legal action by the US Securities and Exchange Commission (SEC) (Penman, 2019).

While we focus on the crowdsourcing revenue source of these channels, we cannot dismiss the potentially unethical nature of these companies, both towards the advertisement systems (by gaming the advertisement statistics with non-genuine clicks, allowing them to market themselves as a successful advertising service) and towards the workers, who inevitably lose money while working under a complex pyramid system (George, 2018).

8 Conclusions

When budgeting for crowdsourcing tasks, requesters need to consider the overall cost of the commissioned task, which is often represented by the crowdsourcing platform as the sum of the platform fee and the contributors’ reward.

However, crowdsourcing platforms like Figure Eight can make use of outsourcing companies (external channels for crowdworker recruitment). Some of these channels withhold part of the reward as a channel commission and impose restrictions on accessing such rewards. While requesters can select which channels to include in their crowdsourcing tasks, no information about the channels’ policies is provided. Even more importantly, the crucial information about the amount of the channel commission is only revealed by the platform after the job is completed, and its value fluctuates over time. These practices make it extremely difficult to provide guarantees on the amount and modalities of compensation provided to the workers. Such guarantees are required in a variety of situations, including data collection that requires compliance with ethical guidelines.

In this paper, we combined four years of historical data and an ad-hoc survey to identify the currently most popular channels, collected information about their policies and demographics, and investigated the gap between the reward paid by the requester and the part actually received by the workers. The survey allowed us to highlight the differences among workers recruited by different channels in terms of gender, years of experience as crowdworkers, and monthly earnings. Such differences should be taken into account when planning a research project, as they could influence the sampling process and may cause unintended biases.

The results of our investigation into channel commissions indicate an imbalance in the treatment of workers due to differences in channel policies. We showed that, out of the top five channels, only one, Elite, does not charge additional commissions, because it is owned by the platform itself. Workers who were surveyed indicated that they were receiving unequal payments and that they were unaware of the discrepancy, due to channel commissions, between the amount intended by the requester and the amount they actually received. We observed that some channels provide a variety of services to the workers, and doing a crowdsourcing task is only one of the extra jobs that they can do to get extra rewards or points. Furthermore, it has been discovered that some of the most common channels are Paid To Click platforms, which have been connected to potentially unethical behaviours towards the workers, such as the use of complex pyramid systems and the rewarding of gambling activities. Regarding worker earnings, our analysis shows a sizeable cumulative loss due to channel commissions, ranging from about 20% to 60% depending on the channel workers belong to. This, in turn, leads us to discuss the potential impacts of unethical payments arising from opaque channel commission schemes.

We can generalise the lessons learned from our study and group them by the three main actors of paid micro-task crowdsourcing: (i) workers, who are the weak link in the chain, should be made aware of the different policies of the recruitment channels. They could increase their participation in dedicated online discussions, forums, and initiatives aimed at identifying unethical channels, boycotting them, and reporting them to the crowdsourcing platforms and to the requesters, who might be unaware of the issue; (ii) platforms should develop public and transparent policies to guarantee that the recruitment channels operate fairly and ethically, excluding those that do not adhere to the stated policies; (iii) requesters should become aware of the problems associated with the various recruitment channels, and prefer official ones (e.g., Elite in the case of Figure Eight) when it is not possible to verify that other channels operate fairly and ethically.

In the future, we will investigate how economic changes in particular countries have affected workers’ willingness to work on crowdsourcing platforms over the last four years. Moreover, some of the workers involved in our study will be interviewed to gather more details on the hypotheses generated by this study. In addition, workers from other channels and crowdsourcing platforms will be surveyed as an extension of this research.