Efficient and adaptive incentive selection for crowdsourcing contests

The success of crowdsourcing projects relies critically on motivating a crowd to contribute. One particularly effective method for incentivising participants to perform tasks is to run contests where participants compete against each other for rewards. However, there are numerous ways to implement such contests in specific projects, that vary in how performance is evaluated, how participants are rewarded, and the sizes of the prizes. Also, the best way to implement contests in a particular project is still an open challenge, as the effectiveness of each contest implementation (henceforth, incentive) is unknown in advance. Hence, in a crowdsourcing project, a practical approach to maximise the overall utility of the requester (which can be measured by the total number of completed tasks or the quality of the task submissions) is to choose a set of incentives suggested by previous studies from the literature or from the requester’s experience. Then, an effective mechanism can be applied to automatically select appropriate incentives from this set over different time intervals so as to maximise the cumulative utility within a given financial budget and a time limit. To this end, we present a novel approach to this incentive selection problem. Specifically, we formalise it as an online decision making problem, where each action corresponds to offering a specific incentive. After that, we detail and evaluate a novel algorithm, HAIS, to solve the incentive selection problem efficiently and adaptively. 
In theory, in the case that all the estimates in HAIS (except the estimates of the effectiveness of each incentive) are correct, we show that the algorithm achieves the regret bound of $\mathcal{O}(\sqrt{B/c})$, where B denotes the financial budget and c is the average cost of the incentives. In experiments, the performance of HAIS is about 93% (up to 98%) of the optimal solution and about 9% (up to 40%) better than state-of-the-art algorithms in a broad range of settings, which vary in budget sizes, time limits, numbers of incentives, values of the standard deviation of the incentives’ utilities, and group sizes of the contests (i.e., the numbers of participants in a contest).


Introduction
Crowdsourcing has emerged as an efficient approach for obtaining solutions to a wide variety of problems by engaging a large number of Internet users from many places in the world [2][3][4][5][6][7]. Crowdsourcing is attractive to organisations and companies not only because it provides cheap labour, but also because it helps to solve problems quickly and effectively [8][9][10]. To cope with this increased
This paper is a significant extension of our paper published as an extended abstract at AAMAS 2018 [1].
A key challenge in these settings is that the success of crowdsourcing projects relies critically on motivating a crowd to contribute [2,10,22]. Given this, contests 1 have been shown to be an efficient approach in these projects to motivate a crowd, as they are effective and cheap. Specifically, by rewarding participants in a contest, task requesters do not necessarily have to pay for every task completed as in other types of financial rewarding schemes, such as paying for performance [23] or using bonuses [24,25]. Instead, they have to pay mainly for a certain number of participants, e.g., the top two who have completed the most tasks or the top participant who has completed the tasks with the highest quality. 99 Designs (www.99designs.com), TopCoder (www.topcoder.com), and Taskcn (www.taskcn.com) are some well-known crowdsourcing platforms that use contests to attract participants.
1 Although the incentives focused on in this paper relate to contests, the problem stated and the algorithms discussed can be used with any other types of incentive in the literature, such as paying for performance or using bonus payments. Hence, to keep the problem general, we use the term incentives instead of contests.
Nevertheless, the effectiveness of the contests is likely to be different between crowdsourcing projects based on specific properties of those projects, such as the project purpose (e.g., building data for scientific studies or collecting data for a company), the task nature (e.g., interesting or boring), or the participant community (e.g., the extent to which they are in contact with each other or the extent to which the information of a participant is exposed to the others in the community). These differences reflect the fact that participants have different motivations [26][27][28]. Additionally, the implementation of the contests can make a significant difference in performance [29]. In more detail, each contest has a certain number of parameters, such as the base payment (a fixed payment for every participant), the group size (the maximum number of participants in a contest) and the amount of prize money for the best participant. For ease of presentation, we refer to a contest corresponding to specific parameter values as an incentive. 2 Furthermore, currently on many crowdsourcing platforms such as Amazon Mechanical Turk (www.mturk.com), Appen (https://appen.com/), and Clickworker (www.clickworker.com), the requesters can manage the tasks (e.g., creating a task with descriptions or uploading related data for a task) and the submissions (e.g., downloading the submissions from the participants or sending bonuses to participants with high quality submissions) in an autonomous manner using a programmable Application Programming Interface (API). This makes it possible to build autonomous agents to monitor and adaptively switch incentives when appropriate. Indeed, it is inconvenient or practically impossible in many cases to switch between incentives manually to identify the best one. Therefore, finding an appropriate way for an autonomous agent to select an effective incentive in a crowdsourcing project is a key problem. We refer to this as the incentive selection problem (ISP) [1,20].
2 We can see that two incentives can be different in the parameters or the parameter values. For example, we might have the following three incentives. Incentive 1 corresponds to contests where the base payment (the first parameter) is £1.00, the group size (the second parameter) is 10, and the amount of prize money for the best participant (the third parameter) is £5.00. Incentive 2 is the same as incentive 1 except the base payment is £1.50. Incentive 3 is the same as incentive 1 except it has one more parameter, the amount of prize money for the second participant is £5.00. In this example, incentives 1 and 2 have the same parameters but correspond to different values of the parameters. Also, incentive 3 has one parameter more than incentives 1 and 2.
As mentioned above, the effectiveness of the incentives in a specific crowdsourcing project is unknown in advance. Thus, in order to utilise the most effective one (i.e., exploit), the agent has to try each incentive several times to evaluate its respective effectiveness (i.e., explore). Given this need to balance exploitation and exploration, budgeted multi-armed bandits (budgeted MABs) are a suitable approach for this problem [30,31]. In more detail, this approach models the problem as a machine with k arms (corresponding to k incentives). Pulling an arm (providing the corresponding incentive to a group of participants) incurs a fixed cost (attached to the arm) and delivers a utility (e.g., the number of tasks completed) drawn from an unknown stochastic distribution. The objective in a budgeted MAB problem is to find a pulling policy (how many times each arm is pulled at each time step) that maximises the expected total utility within a given budget (e.g., £500).
A number of algorithms have been proposed to solve the budgeted MAB problem [18,30,31,32]. However, these algorithms are not designed to work with the time budget (i.e., the deadline) of the ISP. Moreover, they do not consider the group-based nature of the incentives we consider here (i.e., contests); that is, after pulling an arm, we receive the performance of all the individuals in the corresponding contest group (i.e., a number of data points) rather than only the overall performance of the whole group (i.e., one data point). Thus, as we will show in Section 6, they are not efficient when dealing with the ISP. To illustrate the importance of the group-based nature, consider the two cases when the group size is 5 (i.e., 5 participants per contest) and 20 respectively. Current MAB algorithms would not treat these cases differently. However, the latter clearly provides us with more information on each pull (as it has more samples, i.e., participants). As a result, the second case requires fewer pulls of exploration in order to achieve the same level of understanding of the participants' performance (e.g., after 5 pulls of an arm in the second case, we have effectively sampled the performance of 100 individuals, but would require 20 pulls of an arm in the first case to reach that sample size). Hence, it is necessary to consider the group-based nature to determine the appropriate numbers of pulls for the arms. This is a non-trivial extension that requires new bandit algorithms.
In order to address this gap, in this paper we introduce an algorithm to deal with the ISP. The ultimate purpose of this work is to build an autonomous agent that can automatically and effectively select the right incentives, so that we can easily deploy projects on crowdsourcing platforms by using the provided APIs. To this end, our main contributions are:
(1) We formalise the ISP and then introduce HAIS, a novel adaptive algorithm to solve the ISP effectively by utilising the limited financial and time budgets while considering the group-based nature of the incentives. Specifically, HAIS is designed to have (a) a better exploration-exploitation strategy together with (b) an efficient way of using the time budget in the exploitation phase, and (c) an effective approach for spending more of the budget on highly effective incentives in the exploration phase.
(2) We theoretically show that, in the case that all estimates conducted in HAIS (except the estimates of the incentives' effectiveness) are correct, the algorithm benefits from a regret bound of $\mathcal{O}(\sqrt{B/c})$, where B denotes the financial budget and c is the average cost of the incentives.
(3) We empirically demonstrate in synthetic environments that HAIS is more effective than the state-of-the-art approaches in an extensive series of simulations. Specifically, the performance of HAIS is about 93% (up to 98%) of the optimal solution and about 9% (up to 40%) better than state-of-the-art benchmarks.
The paper is organised as follows: next we discuss related work in Section 2. We then describe the ISP as a batched 2d-budgeted group-based MAB problem in Section 3. After that, we introduce HAIS in Section 4. We present a theoretical analysis of the algorithm by inspecting its regret bound in Section 5 and we conduct an empirical evaluation of the algorithm by running simulations in Section 6. We conclude in Section 7.

Related work
Much work has taken a game-theoretic approach to investigate the optimal (or efficient) design of contests in general and crowdsourcing contests in particular. Such work often tries to answer the questions of how to distribute the prizes (number of prizes and their values) in contests [33][34][35][36][37][38][39][40][41]. However, applying this body of research to building efficient contests for real-world crowdsourcing projects is still challenging since (1) these studies are based on the rationality assumption 3 , whereas real participants in crowdsourcing might be partly rational or indeed irrational, as they might lack information, knowledge, or time; and (2) the studies do not consider other factors related to the participants' intrinsic motivation that might affect their behaviour, such as the project purpose, the task nature, or the participant community, as described in Section 1.
An alternative approach to deal with providing appropriate incentives is to design incentives that are thought to be effective based on previous studies and then empirically select the most effective one. Specifically, the abovementioned studies can be used to design several contest implementations (i.e., candidate incentives). Other relevant studies in psychology, sociology, or computer science (e.g., [23,26,27,42,43]) can also be used for this task as they help us better understand possible interactions between the factors (e.g., the incentives, the level of autonomy and interestingness of the tasks, or the purpose of the projects) related to motivations of the participants. Then, based on the proposed candidates, an adaptive approach can be used to identify the most effective candidate efficiently.
Following this direction, as noted in Section 1, multi-armed bandits (MABs) are a promising approach. Some algorithms are shown to be robust in many cases, such as Exp3 [44] and Thompson Sampling [45]. Yet, as they do not consider the time and financial budgets, these algorithms cannot be used to solve the ISP. For example, in each time period, Exp3 and Thompson Sampling draw an arm based on estimated probabilistic models corresponding to the arms. This is effective when the time budget is much larger than the number of arms. However, this is often not the case in the ISP. Therefore, budgeted MABs (MABs where the overall budget is limited [30]) are a better approach. A number of studies considering budgeted MABs have been conducted. Specifically, the (financial) budget-limited MAB was first introduced by [30], where the number of times an arm can be pulled (in both exploration and exploitation phases) is constrained by a single budget B (without a deadline). Their algorithm (Budget-limited ε-first, or ε-first for short) spends εB (where ε is specified in advance, e.g., 0.3) for sequentially pulling the arms in the exploration phase and (1 − ε)B for pulling the arms with the highest estimated outcomes in the exploitation phase. [18] consider the uniform pulling approach of ε-first and argue that it might be inefficient in some cases when some ineffective arms can easily be identified and eliminated. From this argument, they develop three algorithms with three different approaches for eliminating the ineffective arms (l-split, PEEF, and SOAAv). Simulations on the trust problem of a supply chain 4 show these algorithms are effective, especially SOAAv with its adaptive approach. Taking a different approach, [31] use the idea of upper-confidence bounds from [46] to build an algorithm called Fractional KUBE, or fKUBE for short, that combines exploration and exploitation in one process.
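The budget split used by ε-first can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [30]: the function name `epsilon_first`, the `pull` callback, the round-robin exploration loop, and the greedy exploitation rule (repeatedly pull the affordable arm with the highest estimated utility per unit cost) are all choices made for the example.

```python
def epsilon_first(costs, pull, budget, epsilon=0.3):
    """Spend epsilon * budget on uniform exploration, the rest on exploitation.

    costs[i] is the fixed cost of arm i; pull(i) returns one observed utility.
    Both names are hypothetical and only serve this sketch.
    """
    k = len(costs)
    totals = [0.0] * k  # summed utility per arm
    counts = [0] * k    # number of pulls per arm
    spent = 0.0

    def do_pull(i):
        nonlocal spent
        totals[i] += pull(i)
        counts[i] += 1
        spent += costs[i]

    # Exploration: pull the arms in turn until the next pull would exceed
    # the exploration share of the budget.
    explore_budget = epsilon * budget
    i = 0
    while spent + costs[i] <= explore_budget:
        do_pull(i)
        i = (i + 1) % k

    # Exploitation: greedily pull the affordable arm whose estimated
    # utility per unit cost is highest (a simplification for illustration).
    def density(j):
        return (totals[j] / counts[j]) / costs[j] if counts[j] else 0.0

    while True:
        affordable = [j for j in range(k) if spent + costs[j] <= budget]
        if not affordable:
            break
        do_pull(max(affordable, key=density))

    return counts, sum(totals), spent
```

Note that the sketch has no notion of a deadline or of group sizes, which is exactly the limitation of this family of algorithms for the ISP.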
However, as we will show in Section 6, the algorithms developed by [30,31] and [18] are not efficient when dealing with the ISP. First, regarding the group-based nature of the incentives (i.e., arms), the performance of these algorithms can vary significantly in different group sizes (i.e., the number of participants in a contest). For example, in the exploration phase, ε-first pulls the arms evenly with a given budget, so the arms with smaller group sizes are explored less than those with larger group sizes. Indeed, as the total number of participants in each arm is the group size of the arm times the number of times this arm is pulled, when the arms are pulled the same number of times, the arm with a smaller group size has fewer participants. This will affect its performance when the group sizes of the best arms are small compared to those of the worst arms, as the best arms will be explored less than the others in the exploration phase. Second, regarding the time limit, as they are designed to work with an unlimited deadline, their performance can drop significantly when running under strict time constraints. For example, as fKUBE spends only one round to obtain initial estimates of the arms' effectiveness, when the deadline is tight, pulling the current best arm once in a period is too slow to identify the real best arm before the deadline. Third, regarding the exploration-exploitation balance, they do not have an effective and adaptive mechanism for distributing the financial budget across the two phases. For instance, with ε-first, it is not easy to specify an appropriate value for ε in advance, when there is little information about the performance of participants in the projects.
Also, when differences in the effectiveness of the incentives are low (i.e., it is difficult to differentiate the incentives' effectiveness), the elimination mechanism of SOAAv might not be effective and the exploitation-exploration process of fKUBE might be slow in identifying the best arm. Despite these shortcomings, each algorithm also has its own strength, thus they are still good candidates for the ISP. Therefore, we implement these algorithms (with some modifications for the time constraint, when possible) to not only evaluate their performance but also to benchmark our new algorithm.
The work of [32] approaches budgeted MABs more generally by dealing with multi-dimensional bandits (each dimension corresponds to a resource, such as financial budget or time budget). However, their algorithms (PDBwK and BalanceBwK) cannot be applied to the ISP because the resources in their model cannot be shared, whereas in the ISP, the time can be shared, i.e., pulling one or more arms several times (providing one or more incentives to several groups) can happen in a given time period. Recently, [12] have attempted to use MABs to deal with the ISP. However, their model is difficult to use in practice, as they do not consider the time constraint (i.e., the tasks are provided one by one) and their algorithms are not adaptive (i.e., they require tuning appropriate situation-specific parameters).
Finally, [47] approach the incentive problem (finding an efficient way to incentivise participants so as to maximise the overall utility of the requesters) by combining the classical principal-agent model and MABs. In particular, they formalise the incentive problem as a multiple-round process. In each round, one participant (i.e., worker) completes a task based on a contract designed in advance by the requester. From the performance of the participants so far, their algorithm (AgnosticZooming) helps the requester adaptively adjust the contracts to be used in a round so that the requester's utility is maximised. Each potential contract is treated as an arm in their algorithm. However, they only consider financial incentives in the form of monotone contracts where the outcomes are not lower with higher payments. This prevents the algorithm from being used effectively in crowdsourcing projects where the motivation for participation is not only money, but can be human capital advancement or community identification [48]. In these projects, the outcomes might not be proportional to payments [27,43]. This might affect the performance of their algorithm in crowdsourcing projects whose time budgets are critical, as it takes a long time to identify a good contract.
Our preliminary work on the ISP [20] focuses on the case where candidate incentives are many (compared to the budget) and there exist correlations between the incentives. Specifically, as presented in Section 1, some candidate incentives might have the same parameters but the values of the parameters are different, so their performance is likely to be correlated. In fact, these incentives are chosen from the same incentive method (thus they have the same parameters) but are different in the way the method is implemented (i.e., different values of the parameters). Here, an incentive method can be any incentive in the literature, such as paying for performance, using bonuses, or using contests. Also, correlations between incentives mean that the difference in the effectiveness between two adjacent incentives (i.e., the values of their parameters are slightly different) is assumed to be small. Indeed, it is likely that when the parameter values of an incentive method change slightly, the effectiveness of the corresponding incentives also changes gradually [42,49].
In the paper, correlated incentives are grouped into clusters. Also, the proposed model is for the ISP where we have no or very little prior knowledge about the performance of participants in the project of interest and the budget for the project is large compared to the chosen incentives.
Although the algorithm proposed (BOIS) is shown to solve the ISP effectively, in reality, many crowdsourcing projects do not have such large budgets. Also, in some projects, we might have good prior knowledge about participant performance, so in each cluster, we can continue choosing some candidate incentives with a high confidence that they are good. In some other projects, the budgets are not large enough (compared to the chosen candidate incentives).
Thus, we need to continue choosing some incentives in each cluster so that after exploring the incentives, we have sufficient budget to exploit the best incentive explored. BOIS is not effective in these projects, since the incentives are not correlated. This is because BOIS focuses more on the correlations to quickly identify the best incentive in each cluster. So, when there are only a small number of candidate incentives in each cluster, the algorithm does not have a good exploration-exploitation balance compared to the algorithm proposed in this paper (HAIS). Generally, in these projects, BOIS is somewhat similar to Stepped fKUBE. 5 More specifically, Stepped fKUBE spends the first period obtaining initial estimates of the incentives. Then, it spreads a certain portion (specified by 2 ) of the residual budget across the next periods (except the last one) to apply the best incentive so far (identified by their upper confidence bounds). After that, in the last period, it simply applies the best incentive with the remaining budget. As will be shown in Section 6.3, HAIS outperforms Stepped fKUBE in these projects.

The incentive selection problem
In this section, we first describe the incentive selection problem (ISP). Then, we formalise it as a batched 2d-budgeted group-based MAB problem.
Suppose a requester wants to run a crowdsourcing project. The objective is typically to maximise the requester's overall utility with a given budget before a given time. We can include task quantity, task quality, task completion time, or some subset of them in the utility function. 6 For example, [25] consider the quantity and quality of the tasks.
5 The Stepped fKUBE algorithm will be presented in detail in Section 6.1.
6 A note when choosing an aspect to be included in the metric is that it should not only measure the effectiveness of contest implementations (i.e., incentives) appropriately, but it should also be possible to evaluate the effectiveness easily and ideally in an automatic manner. That is because after deploying a contest of a specific implementation in a period of time, the effectiveness of this implementation should be calculated quickly before deploying another contest (to another group of participants) in the next period. One example of such tasks is drawing an evacuation route from a building to the nearest road as described in [29] and the corresponding metric is the number of valid evacuation routes completed. A valid evacuation route can be simply defined as one that really connects a building to a road. This can be checked automatically by testing whether the route starts somewhere inside the outline of the building and ends at a road. We can manage a higher level of quality by defining a valid evacuation route as one that really connects a building through an entrance to a road with a walkway or an open space. However, to do this, we have to spend more time on manually validating submitted routes. We can do this by, for example, majority voting. In detail, we can ask some other participants to check if a route is valid or not and then choose the decision from the majority. This can be done in the form of another task. So, the total time will be expanded significantly.
To achieve this objective, we spend the available budget on providing incentives to encourage participants (referred to as users) to perform tasks. The incentives can be in the form of contests or individual-based (i.e., non-contests). They can be different in terms of the number of users in a contest (note that with individual-based incentives, this value is always one), the performance evaluation method, or the prize distribution. Since the effectiveness of incentives is usually unknown in advance, we are interested in finding an efficient means of selecting candidate incentives (i.e., exploring their effectiveness and then exploiting the most effective one) to maximise the requester's utility. We refer to this as the incentive selection problem (ISP).
Formally, let $\mathcal{I} = \{1, 2, \ldots, I\}$ denote a set of incentives that are being considered for use in a crowdsourcing project. Each incentive has a group size (the number of users in a group that is offered this incentive) and a cost (of offering the incentive to a group of users). The cost of each incentive is deterministic. For example, there may be 3 incentives. Incentive 1 can be contests of five users, where the base payment is £0.50 and the prize for the best user (who performs the most tasks in a contest) is £2.50. That means this incentive has 5 as the group size and £5.00 as the cost (£2.50 for the base payments and £2.50 for the prize). Incentive 2 is almost the same as incentive 1, but the base payment and the prize for the best user are £0.70 and £1.50. Similarly, incentive 3 can also be contests, whereby there are ten users in a contest, the base payment is £0.50, the prize for the best user is £1.50, and the prize for the second best one is £1.00.
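The cost arithmetic in this example can be checked with a few lines of Python. The helper `incentive_cost` is a hypothetical name introduced here, and it assumes that the cost of a contest is simply the base payment for every user in the group plus all prizes. Only the £5.00 cost of incentive 1 is stated in the text; the costs of incentives 2 and 3 follow from applying the same rule.

```python
def incentive_cost(group_size, base_payment, prizes):
    """Cost of offering a contest once: base pay for every user plus all prizes."""
    return group_size * base_payment + sum(prizes)

# The three example incentives (amounts in pounds).
cost_1 = incentive_cost(5, 0.50, [2.50])         # 2.50 base + 2.50 prize
cost_2 = incentive_cost(5, 0.70, [1.50])         # 3.50 base + 1.50 prize
cost_3 = incentive_cost(10, 0.50, [1.50, 1.00])  # 5.00 base + 2.50 prizes

print(cost_1, cost_2, cost_3)
```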
The number of incentives (I) also corresponds to the number of arms in a MAB problem. Pulling arm i corresponds to offering incentive i to a group of $g_i$ users ($g_i$ is referred to as the group size) in a specific time period (or period for short, e.g., five hours or one day). The periods do not overlap and are denoted by t = 1, 2, .... Each incentive can be applied to different groups in the same or different periods. We can only start period t (i.e., applying incentives to other groups in period t) when all groups in period t − 1 are finished. To illustrate this, Fig. 1 shows an example where three incentives are applied to various groups over five periods (corresponding to days here). On the first day (t = 1), we apply incentive 1 to four groups, incentive 2 to two groups, and incentive 3 to three groups. Here, the group sizes of incentives 1, 2, and 3 are 2, 4, and 4 respectively.
Each application of incentive i to a group incurs a cost of $c_i$ and enjoys a utility which is drawn independently from a fixed unknown distribution with an unknown mean (i.e., expectation or expected value) $\mu_i$. Let $r_i^{(t)}$ be the total utility of applying incentive i $n_i^{(t)}$ times in period t. Note that $r_i^{(t)}$ is the total utility of users in all groups of incentive i in period t. The objective is to find an applying policy that maximises the expectation of the overall utility with a given financial budget B and time budget T:
$$\max \; \mathbb{E}\left[\sum_{t=1}^{T} \sum_{i=1}^{I} r_i^{(t)}\right] \quad \text{subject to} \quad \sum_{t=1}^{T} \sum_{i=1}^{I} n_i^{(t)} c_i \le B.$$
From the definition above, we can see that the ISP is a batched 2d-budgeted group-based MAB. Indeed, in each period, each arm can be pulled several times (i.e., an incentive can be offered to different groups) and multiple arms can be pulled (i.e., several incentives can be offered). Hence, there is batched in the name. Also, the pullings are constrained by a financial budget B (a dimension) and a time budget T (another dimension). Thus, the problem is 2d-budgeted, where "d" means "dimension".
Moreover, one characteristic that makes the ISP different from other MAB models studied in the literature is the group-based nature of the arms. For ease of presentation, in some places, when it does not lead to confusion between the two types of budget, we use budget for the financial one.
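To make the two budget dimensions and the group-based nature concrete, the sketch below represents an applying policy as a list of periods, each holding the number of groups offered each incentive. The representation and the function names (`total_cost`, `users_reached`, `is_feasible`) are assumptions made for illustration, and the costs in the usage example are hypothetical; the group counts and group sizes of the first period are those of the Fig. 1 example in the text.

```python
def total_cost(policy, costs):
    """policy[t][i] is the number of groups offered incentive i in period t."""
    return sum(n * c for period in policy for n, c in zip(period, costs))

def users_reached(policy, group_sizes):
    """Number of sampled users: each application reaches one whole group."""
    return sum(n * g for period in policy for n, g in zip(period, group_sizes))

def is_feasible(policy, costs, budget, time_limit):
    """A policy must respect both the time budget T and the financial budget B."""
    return len(policy) <= time_limit and total_cost(policy, costs) <= budget

# First period of the Fig. 1 example: 4, 2 and 3 groups for incentives 1, 2, 3.
day_one = [[4, 2, 3]]
print(users_reached(day_one, group_sizes=[2, 4, 4]))
print(is_feasible(day_one, costs=[1.0, 2.0, 2.0], budget=20.0, time_limit=5))
```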
Also in this work, we only consider the contests where the submissions of all users (not only the best user) in a contest are useful to the requester. In other words, the utility is additive. This assumption prevents the model being applied in projects where the requesters only consider the best submission in every contest, which is normally the case in design contests. For example, in crowdsourcing systems for design tasks such as 99 Designs (99designs.com), Design Crowd (www.designcrowd.com), and Crowd Spring (www.crowdspring.com), only the best submissions are used and the other submissions are discarded. However, in microtask crowdsourcing projects, for example, every task completed is typically useful to the requesters [29,50].
Thus, although the assumption may limit the applications of the model, it is reasonable in many crowdsourcing projects.

The HAIS algorithm
Here we introduce Hoeffding-based Adaptive Incentive Selection (henceforth, HAIS), an adaptive algorithm for the ISP. HAIS uses several heuristics 7 (as will be presented in Section 4.5) to help solve the ISP effectively.
However, we first detail how the algorithm and the benchmarks measure the effectiveness of the incentives (Section 4.1). We then briefly present Hoeffding's inequality and discuss how we will utilise this inequality when dealing with the ISP (Section 4.2). After that, we give an overview of the algorithm (Section 4.3). Finally, we detail how HAIS is built and how it acts in the exploration (Sections 4.4 and 4.5) and exploitation (Sections 4.6 and 4.7) phases.

Measuring the effectiveness of the incentives
To measure the effectiveness of the incentives, we use density (i.e., the utility-cost ratio) [30], as it reflects the average utility (i.e., reward in the context of MABs) obtained per cost unit. The density of incentive i is defined as $\delta_i = \mu_i / c_i$, where $c_i$ is the cost of applying the incentive once and $\mu_i$ is the mean utility. However, as the real densities of the incentives are unknown a priori, we have to estimate them. Right after period t, the estimate of incentive i's density is:
$$\hat{\delta}_i^{(t)} = \frac{\sum_{\tau=1}^{t} r_i^{(\tau)}}{c_i \, m_i^{(t)}} \qquad (1)$$
where $m_i^{(t)} = \sum_{\tau=1}^{t} n_i^{(\tau)}$ is the number of times incentive i has been applied until the end of period t. With all algorithms examined in this work, each arm will be pulled at least once in period 1 and the estimates will be conducted from period 2. So, in (1), $m_i^{(t)} \geq 1$. To keep the presentation simple, we use the best (worst) incentive to denote the incentive with the highest (lowest) estimate, as opposed to the real best (worst) incentive. Also, we use the estimate of an incentive instead of the estimate of an incentive's density (or effectiveness).
7 HAIS is not a metaheuristics algorithm, since it is designed to deal with the ISP specifically, while a metaheuristics algorithm, such as simulated annealing, tabu search, or genetic algorithms, should provide a generic framework to solve many different problems [51]. HAIS has exploration and exploitation concepts which are similar to diversification and intensification as in metaheuristics algorithms. However, the exploration-exploitation trade-off in HAIS stems inherently from the MAB problem itself, which is to identify the best arms. This is different from the diversification-intensification balance in metaheuristics, which is to find the best solution in the combinatorial search space.
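As a small illustration of the density estimate, the sketch below divides the average observed utility per application by the cost of one application, following the definition of density as mean utility over cost. The function name `estimated_density` and its argument layout (per-period utility totals and per-period application counts for a single incentive) are assumptions made for this example.

```python
def estimated_density(period_utilities, period_pulls, cost):
    """Estimate utility per unit cost for one incentive.

    period_utilities[t] is the total utility observed in period t,
    period_pulls[t] is the number of times the incentive was applied in it.
    """
    m = sum(period_pulls)  # applications so far
    if m == 0:
        raise ValueError("the incentive has not been applied yet")
    mean_utility = sum(period_utilities) / m  # average utility per application
    return mean_utility / cost

# Two periods: 3 applications yielding 30 tasks, then 5 yielding 50, at cost 5.
print(estimated_density([30, 50], [3, 5], cost=5.0))
```

The guard against zero applications mirrors the requirement that every arm is pulled at least once in the first period before any estimate is made.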

Hoeffding's inequality
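In general, this inequality is used to determine the number of samples needed to obtain a certain level of confidence for a confidence interval around the expected value of the samples. In particular, let $Y_1, \ldots, Y_n$ be independent random variables with $Y_i \in [y_{min}, y_{max}]$ for all i, where $-\infty < y_{min} \le y_{max} < +\infty$. Then, Hoeffding's inequality [52] states that:
$$P\left(\left|\bar{Y} - E[\bar{Y}]\right| \ge \epsilon\right) \le 2\exp\left(-\frac{2n\epsilon^2}{(y_{max} - y_{min})^2}\right) \quad \text{for all } \epsilon > 0,$$
where $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ is the mean of the samples and $E[\bar{Y}]$ is the expected value of $\bar{Y}$ (Fig. 2).
[Fig. 2: caption only partly recoverable: "... is the real best incentive, incentive 3 is the real worst ..."]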
Applying this to the ISP, we can determine the number of sampled users needed in particular incentives to obtain good estimates of the incentives. And hence, we can identify the best incentive with high enough confidence, i.e., greater than or equal to a certain level of confidence which is specified in advance (e.g., 50%). We call this confidence level L h (h for Hoeffding). We choose Hoeffding's inequality as it helps us identify the appropriate numbers of times the incentives should be applied in the exploration phase dynamically based on the estimates of the incentives so far. Concretely, the inequality is applied to determine the number of additional users needed on each incentive and then based on the group size of the incentive to identify the number of times the corresponding incentive is applied.
For example, in a crowdsourcing project, there are two incentives which have the group sizes of 10 and 5 respectively. And suppose the chosen value of L h is 50% (i.e., the confidence level of identifying the best incentive after applying the incentives so that each incentive has the target number of sampled users is 50%). In order to obtain initial estimates, the target number of sampled users in each incentive is at least 30 (a parameter which is chosen in advance 8 ). Thus, in the first period, incentive 1 is applied three times (to have 3 * 10 = 30 sampled users) and incentive 2 is applied six times (to have 6 * 5 = 30 sampled users). Then, after applying Hoeffding's inequality, suppose the result suggests that to obtain a confidence level of L h = 50% in being sure that the current best incentive is the real best one, each incentive needs to have at least 60 sampled users. 9 Since currently incentive 1 (with group size of 10) already has 30 sampled users, it needs 30 more. That means we need to apply this incentive three more times. Similarly, as incentive 2 (with group size of 5) already has 30 sampled users, we need to apply this incentive six times to have 30 more.
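The arithmetic of this example can be reproduced with a short sketch. The helper name `additional_applications` is hypothetical, and rounding up (so that the target is always reached) is an assumption made here; the sampling step described later rounds to the nearest integer instead, which gives the same result whenever the division is exact.

```python
import math


def additional_applications(target_users: int, sampled_users: int,
                            group_size: int) -> int:
    """Number of extra times an incentive must be applied so that it
    reaches the target number of sampled users (rounded up)."""
    missing = max(0, target_users - sampled_users)
    return math.ceil(missing / group_size)


# Incentive 1: group size 10, 30 users so far, target 60 -> 3 more times.
print(additional_applications(60, 30, 10))
# Incentive 2: group size 5, 30 users so far, target 60 -> 6 more times.
print(additional_applications(60, 30, 5))
```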
In terms of the value of L h , it should be chosen to be high enough (e.g., 50%, rather than only 10%), so that the current best incentive is likely to be a highly effective incentive. However, it should not be too high, as the algorithm might spend the budget on applying less effective incentives. The advantage of using the predefined parameter L h is that we could choose a fixed value for it (e.g., 50%) in all crowdsourcing projects. With each project, based on L h and the estimates of the incentives so far, HAIS will adaptively identify an appropriate number of sampled users needed on each incentive.

Algorithm overview
HAIS splits the application of the incentives into two phases: exploration and exploitation. In the first phase, it has two steps: sampling and Hoeffding. In the second phase, it also has two steps: stepped exploitation and pure exploitation. We next provide an overview of HAIS over the four steps following an illustrative example of how the algorithm works over these steps. The diagram in Fig. 3 shows the connections between the steps in the full process of the algorithm.
The sampling step is conducted in the first period. The purpose of this step is to obtain initial estimates of the incentives in order to apply Hoeffding's inequality in the next period. Specifically, in this step, HAIS applies the incentives so that each incentive has a minimum number of sampled users. This number, referred to as $U_1$, is specified in advance. $U_1$ should be large enough (e.g., 20) to obtain significant estimates of the incentives, yet not so large (e.g., 200) that it takes up a large portion of the budget. After that, the algorithm eliminates clearly ineffective incentives by comparing the confidence intervals of the estimates. Concretely, an incentive $i$ will be eliminated if there exists another incentive $j$ whose lower confidence bound is larger than the upper confidence bound of incentive $i$ (i.e., $d^{(1)}_{i,\text{upper}} < d^{(1)}_{j,\text{lower}}$). Eliminated incentives will not be applied in the Hoeffding step. Besides the estimates of the incentives, calculating the corresponding confidence intervals requires a value for the level of confidence. This value is a predefined parameter of the algorithm, referred to as $L_e$ ($e$ for elimination) $^{10}$, e.g., 95%.
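The elimination rule described above can be sketched as follows. The function name `active_incentives` and the dictionary layout are illustrative assumptions, not part of Algorithm 2, and the confidence bounds are taken as given.

```python
from typing import Dict, List, Tuple


def active_incentives(intervals: Dict[int, Tuple[float, float]]) -> List[int]:
    """Keep every incentive whose upper confidence bound is not below
    the lower confidence bound of some other incentive.

    `intervals` maps an incentive id to its (lower, upper) bounds.
    """
    # An incentive is eliminated exactly when its upper bound is below
    # the largest lower bound among all incentives.
    best_lower = max(lower for lower, _ in intervals.values())
    return [i for i, (_, upper) in intervals.items() if upper >= best_lower]


# Hypothetical bounds: incentive 3 lies entirely below incentive 2.
print(active_incentives({1: (18.0, 26.0), 2: (20.0, 30.0), 3: (5.0, 12.0)}))
```

Comparing against the largest lower bound is equivalent to checking every pair, since an incentive's own lower bound never exceeds its own upper bound.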
In the second period, HAIS applies Hoeffding's inequality as described in Section 4.2 to obtain better estimates of the incentives, in preparation for the subsequent exploitation. One issue that might occur in the exploration phase is that, as the performance of users in each incentive is stochastic, the number of sampled users suggested by Hoeffding's inequality can be very large in some cases, and this might use up a large portion of $B$. This is ineffective: although the incentives are estimated better, not much budget is left to exploit the best incentives found. Thus, we adapt the idea from ε-first of using a limited financial budget for exploration (specified by a predefined parameter referred to as $\varepsilon_1 \in (0, 1)$). This budget bound for exploration (i.e., $\varepsilon_1 B$) is applied to both the sampling and Hoeffding steps. Although both HAIS and ε-first use the same parameter $\varepsilon_1$, they use it for different purposes. In ε-first, $\varepsilon_1$ determines the budget for exploration, so it should be changed appropriately for each situation. In particular, with ε-first, in projects where the financial budget is large, we should choose small values of $\varepsilon_1$, such as 0.02, to avoid spending a large proportion of the budget on exploring the incentives. In projects where the financial budget is small, we should choose large values of $\varepsilon_1$, such as 0.1, to have sufficient budget to explore. In contrast, HAIS uses $L_h$ as the main parameter to control the budget for exploration. $L_h$ (as mentioned in Section 4.2) is the level of confidence in identifying the best incentive. HAIS only uses $\varepsilon_1$ as an upper bound on the budget for exploration. Hence, the parameter $\varepsilon_1$ in HAIS can be chosen intuitively, such as 0.1, and it does not have to be changed across projects.

Fig. 3 High-level overview of the HAIS algorithm. See Algorithm 2 for details of the input, output, predefined parameters, steps 1-4, and decisions a-b. "Time is up" in (b) means all provided ($T_s$) periods for stepped exploitation have been used
In the next periods (except the last one), it conducts stepped exploitation, which takes advantage of the remaining periods to exploit effectively. More specifically, it splits the residual budget ($b$) into two parts based on a predefined parameter.

To illustrate the algorithm, Fig. 2 shows an example of how HAIS acts in a simple case. In the first period of the example (day 1), incentives 2 and 3 are applied four times, while incentive 1 is applied only twice, so that each incentive has $U_1 = 8$ sampled users (Fig. 2a (1)). Note that the numbers chosen in this example (e.g., $U_1$ or $g_i$) are for illustrative purposes only. After this period, the estimate of incentive 3 is significantly lower than that of incentive 1, i.e., $d^{(1)}_{3,\text{upper}} < d^{(1)}_{2,\text{lower}}$ (Fig. 2b (1)). Incentive 3 is therefore eliminated. Hence, in the Hoeffding step conducted in period 2, HAIS decides to apply incentives 1 and 2 so that it has an additional 4 users for each incentive, with the expectation of differentiating the incentives' effectiveness with at least $L_h = 50\%$ confidence (Fig. 2a (2)). After the exploration phase, the estimate of incentive 2 (29) is higher than that of incentive 1 (20) (Fig. 2b (2)). Thus, incentive 2 is applied in the third period (day 3), followed by an update to this incentive's estimate. Note that, on day 3, incentives 1 and 3 are not applied (Fig. 2b (3)). In the next period (day 4), as the estimate of incentive 2 (30) still appears to be the highest, HAIS just applies this incentive with the given budget (£12). In the last period, it applies the best incentive (incentive 2) 12 times with the remaining budget of £24 (Fig. 2a (5) and Fig. 2b (5)).
To summarise, the key novelty of HAIS is that it combines three techniques that together result in an adaptive and efficient way to balance exploration and exploitation.
First, it uses Hoeffding's inequality to identify how much exploration is sufficient to find the real best incentive with a certain level of confidence. This allows HAIS to adaptively distribute the budget for exploration without tuning any situation-specific parameters.
Second, the algorithm applies each incentive several times in the first round to obtain initial estimates of the densities of the incentives, together with using confidence intervals to eliminate clearly ineffective incentives after this period.
Third, it makes use of the time budget to continue exploring while exploiting the incentives by spreading the residual budget across the remaining periods.
In the following subsections, details of the four steps will be discussed. The explanations will be linked to the corresponding parts of the pseudocode of HAIS shown in Algorithm 2.

The sampling step
As discussed above, the objective of this step is twofold: to obtain initial estimates of the incentives and to preclude clearly ineffective incentives from being used in the next step (Hoeffding). Regarding the implementation of this step, it first determines a target number of users that should be sampled on each incentive after this step (i.e., after the first period), $u_1$ (Line 3). If the budget is large enough, this number can be set to $U_1$ (the expected number of sampled users in the sampling step). However, as discussed in the previous subsection, when the budget needed to have $U_1$ users sampled on each incentive exceeds the maximum budget for exploration $\varepsilon_1 B$, $u_1$ will be set to a smaller value so that the budget needed to have $u_1$ users sampled on each incentive is about $\varepsilon_1 B$. If this happens, the Hoeffding step will be skipped, as the budget for exploration is already exhausted. Since the group sizes of the incentives are different, we approximate the limited number of users corresponding to this budget bound by dividing $\varepsilon_1 B$ by the total cost of one user on each incentive, which is $\sum_{i=1}^{I} c_i / g_i$ ($g_i$ is the group size of incentive $i$). The purpose of the budget bound for exploration is to prevent spending too much budget on exploration. With this purpose in mind, the actual cost of exploring does not need to be strictly within the bound; it can be slightly more than this number. Given this, the above-mentioned approximation is acceptable.
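A minimal sketch of how the target $u_1$ could be derived from the description above. The function name `sampling_target`, the argument names, and the truncation of the budget-limited value to an integer are assumptions made for illustration; Algorithm 2 remains the authoritative definition.

```python
from typing import Sequence


def sampling_target(U1: int, budget: float, eps1: float,
                    costs: Sequence[float],
                    group_sizes: Sequence[int]) -> int:
    """Target number of sampled users per incentive in the sampling step:
    U1 if affordable, otherwise the number of users that fits within the
    exploration budget eps1 * budget."""
    # Cost of sampling one user on every incentive: sum of c_i / g_i.
    per_user_cost = sum(c / g for c, g in zip(costs, group_sizes))
    affordable = int(eps1 * budget / per_user_cost)
    return min(U1, affordable)


# Two hypothetical incentives costing 10 and 5 with group sizes 10 and 5.
print(sampling_target(30, 3000.0, 0.1, [10.0, 5.0], [10, 5]))  # large budget
print(sampling_target(30, 400.0, 0.1, [10.0, 5.0], [10, 5]))   # small budget
```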
Based on the target number of users $u_1$ and the group size of each incentive $g_i$, the number of times each incentive should be applied is calculated by rounding the division $u_1 / g_i$ to the nearest integer (Line 5). Then the incentive is applied (Line 5), followed by an update to the estimate of this incentive (Line 6) and an update to the confidence interval of the incentive's estimate (Line 7). The confidence interval of incentive $i$'s estimate, $[d^{(1)}_{i,\text{lower}}, d^{(1)}_{i,\text{upper}}]$, is:

$$d^{(t)}_{i,\text{lower}} = d^{(t)}_i - z_e \frac{\hat{\sigma}^{(t)}_i}{\sqrt{n^{*(t)}_i}}, \qquad d^{(t)}_{i,\text{upper}} = d^{(t)}_i + z_e \frac{\hat{\sigma}^{(t)}_i}{\sqrt{n^{*(t)}_i}} \qquad (3)$$

In this equation:
- $t = 1$, as the calculation is at the end of the first period;
- $z_e$ is the critical value (z-value) corresponding to the confidence level $L_e$;
- $n^{*(1)}_i = n^{(1)}_i g_i$ is the number of sampled users of incentive $i$ at the end of the first period;
- $\hat{\sigma}^{(1)}_i$ is the estimate of the standard deviation of incentive $i$'s density at the end of period 1:

$$\hat{\sigma}^{(1)}_i = \sqrt{\frac{1}{n^{*(1)}_i - 1} \sum_{u=1}^{n^{*(1)}_i} \left(\frac{r^{(1)}_{i,u} - \bar{r}^{(1)}_i}{c^*_i}\right)^2} \qquad (4)$$

where $c^*_i = c_i / g_i$ is the average cost of a user in incentive $i$, $r^{(1)}_{i,u}$ is the utility created by the $u$th user in incentive $i$ in period 1, and $\bar{r}^{(1)}_i$ is the average of the utility received from all users in incentive $i$ at the end of period 1.
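The confidence interval of an incentive's estimate can be computed along the following lines. This sketch assumes a two-sided normal interval and the sample standard deviation (with $n - 1$ in the denominator) of the per-user densities $r/c^*_i$; both choices, as well as the function name `density_confidence_interval`, are assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def density_confidence_interval(user_utilities: Sequence[float],
                                cost: float, group_size: int,
                                confidence: float = 0.95
                                ) -> Tuple[float, float]:
    """Confidence interval of an incentive's density estimate, built
    from the utilities of its individual sampled users."""
    c_star = cost / group_size                    # average cost per user
    densities = [r / c_star for r in user_utilities]
    d = mean(densities)                           # density estimate
    sigma = stdev(densities)                      # sample standard deviation
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * sigma / math.sqrt(len(densities))
    return d - half_width, d + half_width


# Four hypothetical users of an incentive costing 20 with group size 2.
print(density_confidence_interval([8.0, 10.0, 12.0, 10.0], 20.0, 2))
```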
Finally, based on the confidence intervals of the estimates, HAIS determines the set of incentives to be applied in the Hoeffding step, A (Line 8). The incentives that belong to A are referred to as active incentives. The others are eliminated and will not be applied in the Hoeffding step. Although the eliminated incentives will not be applied in the Hoeffding step, these incentives can be applied afterwards (Line 16). This helps us ensure we do not miss the real best incentive which is eliminated in the first period because of a low estimate compared to other incentives.

The Hoeffding step
We now describe how HAIS uses Hoeffding's inequality to calculate the number of times each active incentive should be applied in the subsequent period so that a level of confidence of at least L h can be obtained in identifying the real best incentive.
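Let $X_{i,1}, \ldots, X_{i,u^{(t)}_i}$ be the densities obtained from the $u^{(t)}_i$ users sampled on incentive $i$ until the end of period $t$, each bounded in an interval of width $\beta_i$, and let $\bar{X}^{(t)}_i$ be their average, whose expected value is $\delta_i$. According to Hoeffding's inequality, for any $\gamma_i > 0$ we have:

$$P\left(\bar{X}^{(t)}_i - \delta_i \ge \gamma_i\right) \le \exp\left(-\frac{2 u^{(t)}_i \gamma_i^2}{\beta_i^2}\right) \qquad (5)$$

$$P\left(\bar{X}^{(t)}_i - \delta_i \le -\gamma_i\right) \le \exp\left(-\frac{2 u^{(t)}_i \gamma_i^2}{\beta_i^2}\right) \qquad (6)$$

From (5) and (6), we have:

$$P\left(\bar{X}^{(t)}_i - \delta_i < \gamma_i\right) \ge 1 - \exp\left(-\frac{2 u^{(t)}_i \gamma_i^2}{\beta_i^2}\right) \qquad (7)$$

$$P\left(\bar{X}^{(t)}_i - \delta_i > -\gamma_i\right) \ge 1 - \exp\left(-\frac{2 u^{(t)}_i \gamma_i^2}{\beta_i^2}\right) \qquad (8)$$

Applying (7) to the worst (active) incentive $i_1$ $^{11}$, the resulting confidence level that $\bar{X}^{(t)}_{i_1} - \delta_{i_1} < \gamma_{i_1}$ is:

$$l^{(t)}_{i_1} = 1 - \exp\left(-\frac{2 u^{(t)}_{i_1} \gamma_{i_1}^2}{\beta_{i_1}^2}\right) \qquad (9)$$

Similarly, applying (8) to the best incentive $i_2$ $^{11}$, the resulting confidence level that $\bar{X}^{(t)}_{i_2} - \delta_{i_2} > -\gamma_{i_2}$ is:

$$l^{(t)}_{i_2} = 1 - \exp\left(-\frac{2 u^{(t)}_{i_2} \gamma_{i_2}^2}{\beta_{i_2}^2}\right) \qquad (10)$$

To differentiate the effectiveness of the two incentives, the confidence intervals $\gamma_{i_1}$ and $\gamma_{i_2}$ must be small enough compared to the distance between the expected values of the two incentives' densities:

$$\gamma_{i_1} + \gamma_{i_2} \le \delta_{i_2} - \delta_{i_1} \qquad (11)$$

This is illustrated in Fig. 4. The intuition behind comparing the best and worst incentives is that the purpose of the exploration phase is to quickly identify an incentive which has a high density (compared to the others), not necessarily the real best incentive. Then, in the exploitation phase, the algorithm can gradually find the real best incentive with higher confidence by continuously updating the incentives' estimates. In contrast, if it focused on finding the real best incentive in the exploration phase (by comparing the best incentive to the second best incentive, for example), it would be likely to apply the incentives more often. This means it would waste the budget on the less effective incentives. From (9), (10), and (11), we have:

$$\beta_{i_1} \sqrt{\frac{1}{2 u^{(t)}_{i_1}} \ln \frac{1}{1 - l^{(t)}_{i_1}}} + \beta_{i_2} \sqrt{\frac{1}{2 u^{(t)}_{i_2}} \ln \frac{1}{1 - l^{(t)}_{i_2}}} \le \delta_{i_2} - \delta_{i_1} \qquad (12)$$

We assume that $(\bar{X}^{(t)}_{i_1} - \delta_{i_1} < \gamma_{i_1})$ and $(\bar{X}^{(t)}_{i_2} - \delta_{i_2} > -\gamma_{i_2})$ are two independent events. This is acceptable because we can prevent a user from participating in more than one group in a period. Thus, the performance of users in different incentives is unrelated. In more detail, on crowdsourcing platforms such as Amazon Mechanical Turk, Clickworker, or Figure Eight, the number of users is large, and when submitting new tasks we can easily filter out the users who have already participated in the project (by using the provided APIs). Even in crowdsourcing projects where the potential number of users is not large or where it is difficult to re-recruit users, a small number of users recruited more than once is not likely to change the result significantly. A larger number of such users might, however, and this case is not considered in this work. Therefore, the confidence level of both these events occurring is $l^{(t)}_{i_1} l^{(t)}_{i_2}$, which has to be at least $L_h$. To keep our analysis simple, we choose the same confidence level in (9) and (10), i.e., $l^{(t)}_{i_1} = l^{(t)}_{i_2} = l^{(t)} = \sqrt{L_h}$. Additionally, despite the fact that the numbers of users on the worst and best active incentives after period 1 ($u^{(t)}_{i_1}$ and $u^{(t)}_{i_2}$) might be different (because of different group sizes), the target number of sampled users to obtain in this step (i.e., until the end of period 2) is expected to be the same (i.e., $u^{(2)}_{i_1} = u^{(2)}_{i_2} = u_2$). Thus, from (12) we have:

$$u_2 \ge \frac{(\beta_{i_1} + \beta_{i_2})^2}{2 (\delta_{i_2} - \delta_{i_1})^2} \ln \frac{1}{1 - \sqrt{L_h}} \qquad (13)$$

Since $\delta_i$ ($\forall i = 1, \ldots, I$), $\beta_{i_1}$, and $\beta_{i_2}$ are unknown in advance, we use the estimates after the sampling step to approximate these values:

$$\delta_i \approx d^{(1)}_i, \qquad \beta_{i_1} \approx \hat{\beta}^{(1)}_{i_1}, \qquad \beta_{i_2} \approx \hat{\beta}^{(1)}_{i_2} \qquad (14)$$

Therefore, from (13) we have:

$$u_2 = \left\lceil \frac{\left(\hat{\beta}^{(1)}_{i_1} + \hat{\beta}^{(1)}_{i_2}\right)^2}{2 \left(d^{(1)}_{i_2} - d^{(1)}_{i_1}\right)^2} \ln \frac{1}{1 - \sqrt{L_h}} \right\rceil \qquad (15)$$

Similar to the sampling step, this step is also constrained by the budget bound $\varepsilon_1 B$. Hence, HAIS uses the approach applied in the sampling step to deal with this (Line 10) by approximating the maximum number of users based on the total cost of applying the active incentives ($\sum_{i \in A} c_i / g_i$).

11 To keep the presentation simple, we use $i_1$ and $i_2$ to denote the worst and best incentives respectively at the end of period $t$.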
Based on the new target number of sampled users u 2 , each active incentive will be applied followed by an update to its estimate (Lines 13-15).
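A minimal sketch of how such a target could be computed. It assumes that the Hoeffding-based condition takes the form $u_2 \ge (\beta_{i_1} + \beta_{i_2})^2 \ln(1/(1 - \sqrt{L_h})) / (2 (d_{i_2} - d_{i_1})^2)$, where $\beta$ is read as the width of the range of per-user densities and $d$ is the current density estimate. The function name `hoeffding_target_users`, its arguments, and this reading of $\beta$ are assumptions for illustration, and the budget bound on exploration is not applied here.

```python
import math


def hoeffding_target_users(beta_worst: float, beta_best: float,
                           d_worst: float, d_best: float,
                           L_h: float) -> int:
    """Number of sampled users per active incentive needed to separate
    the best and worst estimates with joint confidence L_h."""
    if not 0.0 < L_h < 1.0:
        raise ValueError("L_h must lie strictly between 0 and 1")
    gap = d_best - d_worst
    if gap <= 0.0:
        raise ValueError("the best estimate must exceed the worst one")
    # Both one-sided events share the same confidence level sqrt(L_h),
    # so that their product equals L_h under independence.
    level = math.sqrt(L_h)
    users = ((beta_worst + beta_best) ** 2
             * math.log(1.0 / (1.0 - level)) / (2.0 * gap ** 2))
    return math.ceil(users)


# Hypothetical values: range widths of 10, estimates 20 and 29, L_h = 50%.
print(hoeffding_target_users(10.0, 10.0, 20.0, 29.0, 0.5))
```

The required number of users grows with the inverse square of the gap between the two estimates, which is why the exploration budget bound matters when the incentives are close.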

The stepped exploitation step
An important benefit of HAIS is that it can consider stopping sooner, i.e., using fewer periods (e.g., 7 days) than the time budget (e.g., 10 days). Specifically, the algorithm stops stepped exploitation when it reaches a certain level of confidence, referred to as $L_s$ ($s$ is short for stepped exploitation). $L_s$ can be set in advance as a predefined parameter, as it is independent of the actual estimates of the incentives when the algorithm is running. $L_s$ should be set close to 1 (e.g., 90% or 95%) so that we have a high confidence that in the last period (period $T$) the current best incentive is the real best one. The confidence level in finding the real best incentive at the end of period $t$ can be calculated from (12). To keep the algorithm simple, we choose the same confidence level $l^{(t)}$ for both incentives. Moreover, we also approximate $\delta_i$ ($\forall i = 1, \ldots, I$), $\beta_{i_1}$, and $\beta_{i_2}$ with the estimates so far:

$$\delta_i \approx d^{(t)}_i, \qquad \beta_{i_1} \approx \hat{\beta}^{(t)}_{i_1}, \qquad \beta_{i_2} \approx \hat{\beta}^{(t)}_{i_2} \qquad (16)$$

Thus, from (12), we have the maximum confidence level in finding the real best incentive at the end of period $t$:

$$L^{(t)} = \left(1 - \exp\left(-\frac{2 \left(d^{(t)}_{i_2} - d^{(t)}_{i_1}\right)^2}{\left(\hat{\beta}^{(t)}_{i_1} / \sqrt{u^{(t)}_{i_1}} + \hat{\beta}^{(t)}_{i_2} / \sqrt{u^{(t)}_{i_2}}\right)^2}\right)\right)^2 \qquad (17)$$

Equation (17) is used before each period in the stepped exploitation step (Lines 17 and 23) to decide whether to continue stepped exploitation or not (the third condition in Line 19). Additional information can also be used together with the condition on $L_s$ to consider stopping stepped exploitation sooner. Specifically, we can use the number of consecutive periods in which the current best incentive has been applied. We refer to the corresponding threshold as $N_s$ (a predefined parameter). If, in this step, an incentive has been applied consecutively in the last $N_s$ (e.g., 10) periods, this incentive is highly likely to be the real best one. Thus, we can immediately move to the last step (pure exploitation), even if the confidence is still less than $L_s$ because the number of sampled users is not large enough. Therefore, HAIS also uses this information (the fourth condition in Line 19) to decide when to stop stepped exploitation. In this condition, $ns^{(t)}_i$ is the number of consecutive periods that incentive $i$ has been applied at the end of period $t$.

Fig. 4 Illustration for (11)

The pure exploitation step
In the last period, HAIS exploits the incentives (with the residual budget) by using the density ordered greedy approach described in [30], as it is simple and efficient. It is referred to as pure exploiting in this work. In detail, it applies the best incentive as many times as it can without exceeding the residual budget. With the remaining budget, it applies the next best incentive, whose cost is not larger than the budget, in the same manner. Note that the incentives to be applied in this step can be the ones which were eliminated after the sampling step (when the residual budget is not enough to apply any other active incentive). This continues until the budget is not enough to apply any other incentives.
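The density-ordered greedy step described above can be sketched as follows. The function name `pure_exploitation` and the dictionary-based interface are illustrative assumptions; the ordering uses the current estimates, and integer costs are used in the example to keep the arithmetic exact.

```python
from typing import Dict, Tuple


def pure_exploitation(budget: int, estimates: Dict[int, float],
                      costs: Dict[int, int]) -> Tuple[Dict[int, int], int]:
    """Spend the residual budget greedily, from the incentive with the
    highest estimate to the one with the lowest.

    Returns how many times each incentive is applied and the leftover
    budget."""
    plan: Dict[int, int] = {}
    # Visit incentives in decreasing order of their density estimates.
    for i in sorted(estimates, key=estimates.get, reverse=True):
        times = budget // costs[i]
        if times > 0:
            plan[i] = times
            budget -= times * costs[i]
    return plan, budget


# Hypothetical case: incentive 2 has the highest estimate and costs 2.
print(pure_exploitation(25, {1: 20.0, 2: 30.0, 3: 10.0}, {1: 3, 2: 2, 3: 1}))
```

An incentive whose cost exceeds the remaining budget is skipped, so a cheaper but less effective incentive can still absorb the last few units of budget.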

A regret bound for HAIS
In this section, we provide a regret bound for the HAIS algorithm. In the development of HAIS, there are several estimates and heuristics (such as (11), (14) and (16)). Hence, in order to analyse the regret bound of the algorithm, we assume all these estimates are correct. Also, without loss of generality, we assume that incentive 1 is the best incentive, i.e., the incentive with the highest density. We consider a normalised version of the ISP where the cost of pulling each incentive is the same, and the mean utility of each incentive is changed accordingly. We adjust the mean values of all incentives so that the incentives have the same cost $c = \frac{1}{I} \sum_{i=1}^{I} c_i$ while the density of each incentive remains unchanged. Specifically, in the normalised ISP each incentive $i$ (with mean $\mu_i$ and cost $c_i$) has the adjusted mean $\mu'_i = \mu_i c / c_i$ and the normalised cost $c$. Thus, the best incentive is now the one with the highest adjusted mean, which is $\mu'_1$. We find a regret bound by measuring the performance of HAIS against the best incentive. We have the following theorem:

Theorem 1 Let the agent follow the HAIS algorithm. Then, the regret of the agent can be bounded by
where u 1 = min U 1 , then the regret of the agent will be Proof The full proof is given in Appendix A.
Remark 1 From the regret bound in Theorem 1, the regret of HAIS depends on the parameter $\gamma^*$. If $\gamma^*$ is chosen optimally as shown in Theorem 1, then HAIS is a no-regret algorithm whose regret bound depends linearly on the square root of the number of times each incentive is applied, which is $\sqrt{B/c}$. Intuitively, when the financial budget $B$ is large, HAIS allows the agent to explore every incentive sufficiently in steps 1 and 2, so the agent can exploit the best incentive in steps 3 and 4 with high probability. When $T$ is large, by using ε-greedy in the stepped exploitation step, our algorithm is guaranteed to find the best incentive by the property of ε-greedy [46]. However, in the ISP, $T$ tends to be small. Thus, applying standard bandit algorithms cannot provide an efficient regret bound. Instead, HAIS maintains the state-of-the-art regret bound while adapting efficiently to the cases of the ISP where the time budget $T$ is small.
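The cost normalisation used in the regret analysis can be checked numerically with a short sketch: every incentive receives the average cost, and its mean is rescaled so that its density is preserved. The function name `normalise_isp` is hypothetical.

```python
from typing import List, Sequence, Tuple


def normalise_isp(means: Sequence[float],
                  costs: Sequence[float]) -> Tuple[float, List[float]]:
    """Return the common cost c (the average cost) and the adjusted
    means mu_i * c / c_i that keep each density unchanged."""
    c = sum(costs) / len(costs)
    adjusted = [mu * c / c_i for mu, c_i in zip(means, costs)]
    return c, adjusted


# Two hypothetical incentives with densities 90 and 80.
c, adjusted = normalise_isp([90.0, 160.0], [1.0, 2.0])
print(c, adjusted, [m / c for m in adjusted])
```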

Experimental evaluation
To systematically evaluate the performance of HAIS, we use simulations in a wide range of controlled settings. Our aim in so doing is to ascertain the key determinants of performance and how they relate to one another. This is a necessary pre-cursor to real-world deployment. This initial evaluation cannot be undertaken in a real crowdsourcing project as we would have to deploy the project multiple times with different financial budgets, time budgets, number of incentives, and group sizes. Even then we could not guarantee that we have explored the main cases in a comprehensive fashion. In the following, we present the benchmarks (Section 6.1), the experimental settings (Section 6.2), and then discuss the corresponding results (Section 6.3).

Benchmarks
As the state-of-the-art algorithms discussed in Section 2 are not specifically designed to deal with the time constraints of the ISP, we make a number of modifications to these algorithms.
(1) ε-first: This algorithm [30] spends $\varepsilon_1 B$ (where $\varepsilon_1$ is specified in advance) in the first period to explore, by applying the incentives evenly until this budget is exceeded. Then, it spends the subsequent period purely exploiting the best incentives with the residual budget, i.e., $(1 - \varepsilon_1) B$, as mentioned in Section 4.7. The purpose of running this algorithm in addition to Stepped ε-first (as described below) is to see how effective the stepped exploitation step is.
(2) Stepped ε-first (or sε-first for short): This algorithm is a modified version of ε-first that is designed to run more effectively under a time limit. ε-first does not make use of the time budget to exploit effectively: after the exploration phase, the best incentive might not be the real best one, and this may only be discovered by further exploration. Thus, we apply the stepped exploitation of HAIS to this algorithm to make use of the periods before the deadline to conduct a more effective exploitation (i.e., exploitation together with further exploration). Like HAIS, it spends the last period purely exploiting. An illustration of how this algorithm works is presented in Appendix B.
(3) Stepped fKUBE (or sfKUBE for short): This algorithm [31] applies all the incentives once to obtain initial estimates of the incentives. This can be considered an initial exploration step. Then, it applies the stepped and pure exploitation techniques as per HAIS. The only difference is that Stepped fKUBE uses the upper confidence bounds (UCBs) of the estimates instead of the estimates that HAIS uses. In each period before the last period, it applies the incentive with the highest UCB once, followed by an update to the estimate of this incentive. The UCB of incentive $i$'s estimate is:

$$ucb^{(t)}_i = d^{(t)}_i + (r_{max} - r_{min}) \sqrt{\frac{2 \ln n^{*(t)}}{n^{*(t)}_i}} \qquad (22)$$

In (22), $n^{*(t)} = \sum_{i=1}^{I} n^{*(t)}_i$ is the total number of users in all incentives until the end of period $t$ and $r_{min}$ ($r_{max}$) is the minimum (maximum) density of the incentives, which is specified in Table 1. We will discuss this table in Section 6.2.1. In this step (stepped exploitation), by using the UCBs of the estimates, fKUBE integrates further exploration into the exploitation phase. In fact, as the estimates are uncertain, instead of judging an incentive based only on the current estimate of its expected utility and the cost ($\hat{\mu}^{(t)}_i / c_i$), fKUBE also takes into account the square root term in (22). More specifically, when an incentive is applied, this square root term (representing the uncertainty of this incentive's estimate) decreases. Therefore, regarding this term, the incentives which are applied less (and hence are more uncertain) have more opportunity to be applied in the next period. $^{12}$ Finally, in the last period, Stepped fKUBE purely exploits.
(4) Survival of the Above Average (SOAAv): This algorithm [18] applies different incentives from round to round. In each round, it applies once each incentive whose estimate is above $(1 + \xi)$ times the average of the incentives' estimates in the previous period. The predefined parameter $\xi$ adjusts the threshold used to eliminate incentives after each period. That is, the algorithm only applies the incentives whose estimates are greater than this threshold. If $\xi = 0$, the threshold is the average of the estimates of the incentives. Note that an incentive eliminated in period $i$ can become active again in period $j$ ($j > i$) if, at the end of period $j - 1$, its estimate is above the threshold. The algorithm then updates these incentives' estimates. This continues until the financial budget is exceeded. In the last period, it conducts pure exploitation as in HAIS to exhaust the residual budget.
(5) Exp3: This algorithm [44] maintains a weighted list where each item corresponds to an incentive. The weights are used to randomly choose an incentive in the next periods. After applying an incentive and receiving a utility, the algorithm updates the weight of this incentive based on the received utility. More specifically, at the beginning the weights of the incentives ($w^{(1)}_i$) are all 1. In the first period, the algorithm applies each incentive once to obtain initial estimates of the incentives. Then, it updates the weights of all the incentives. The way Exp3 updates the weights at the end of period 1 is the same as in the other periods before the deadline, which is shown in (24). In period $t = 2, \ldots, T - 1$, the probability of choosing incentive $i$ ($i = 1, \ldots, I$) is:

$$p^{(t)}_i = (1 - \gamma) \frac{w^{(t)}_i}{\sum_{j=1}^{I} w^{(t)}_j} + \frac{\gamma}{I} \qquad (23)$$

where $\gamma \in (0, 1]$ is a predefined parameter that specifies the level of exploration to be used. Specifically, when $\gamma = 1$, the first term on the right hand side of (23) is 0. Hence, the algorithm ignores the incentives' weights (i.e., it completely explores). When $\gamma$ is closer to 0, this term is greater. That means the probability of choosing an incentive is based more on its weight (i.e., more exploitation). At the end of period $t$, the received utility ($r^{(t)}_i$) is used to update the weights of the incentives to prepare for the next period:

$$w^{(t+1)}_i = w^{(t)}_i \exp\left(\frac{\gamma}{I} \cdot \frac{\tilde{r}^{(t)}_i}{p^{(t)}_i}\right), \qquad \tilde{r}^{(t)}_i = \frac{r^{(t)}_i / c_i - r_{min}}{r_{max} - r_{min}} \qquad (24)$$

where $\tilde{r}^{(t)}_i = 0$ for incentives that were not applied in period $t$, and $r_{min}$ ($r_{max}$) is the minimum (maximum) density of the incentives, which is specified in Table 1. In the last period, Exp3 conducts pure exploitation as in HAIS.
(6) Optimal: It simply applies the real best incentive all the time. To do so, we have to know the utility means $\mu_i$ ($\forall i = 1, \ldots, I$) in advance, which are unknown in practice. Thus, it is unachievable in real-world deployments.
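To make the Exp3 benchmark more concrete, the sketch below implements the textbook Exp3 selection probabilities and weight update, with the observed density rescaled to [0, 1] using $r_{min}$ and $r_{max}$. It may differ in normalisation details from the variant used in the experiments, and the function names `exp3_probabilities` and `exp3_update` are hypothetical.

```python
import math
from typing import List, Sequence


def exp3_probabilities(weights: Sequence[float], gamma: float) -> List[float]:
    """Mix the normalised weights with the uniform distribution."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights: Sequence[float], probs: Sequence[float],
                chosen: int, density: float, gamma: float,
                r_min: float, r_max: float) -> List[float]:
    """Update only the weight of the applied incentive, using an
    importance-weighted reward rescaled to [0, 1]."""
    k = len(weights)
    scaled = (density - r_min) / (r_max - r_min)
    estimated = scaled / probs[chosen]   # importance weighting
    updated = list(weights)
    updated[chosen] *= math.exp(gamma * estimated / k)
    return updated


weights = [1.0, 1.0, 1.0]
probs = exp3_probabilities(weights, gamma=0.3)
# Hypothetical observation: incentive 0 yields the maximum density.
weights = exp3_update(weights, probs, 0, 90.0, 0.3, 60.0, 90.0)
print(probs, weights)
```

With $\gamma = 1$ the probabilities are uniform regardless of the weights, which matches the "complete exploration" case described above.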

Simulation settings
To evaluate the performance of the algorithms we run simulations in seven different settings where the independent variables are financial budget, time budget, number of incentives, standard deviation of the incentives' utilities, and maximum group size. Regarding the latter, we run three settings and in each setting, we draw the group size of each incentive in each simulation from a discrete uniform distribution from 1 to the maximum group size. We will describe these three settings later in the section. The simulations in these seven settings help us compare the algorithms in terms of performance (i.e., the average density). Based on these simulations, we cannot readily see why one algorithm performs better (or worse) than the others. Therefore, we run other simulations on a representative case so that we can better understand the behaviour of each algorithm (other cases give broadly the same outcomes). Specifically, based on the simulations, we want to examine how the algorithms spread the budget across the phases and steps and over the incentives.
Regarding the seven settings, in the simulations of each setting, the related quantities, i.e., B, T , I , g i , c i , μ i , δ i ∀i = 1, . . . , I (except the corresponding independent variable) are generated uniformly in specific ranges. The ranges of the quantities are shown in Table 1 and will be discussed in more detail in Section 6.2.1.
In terms of the maximum group size settings, we run one setting to examine the performance of the algorithms with different values for the maximum group size. Specifically, in the simulations of this setting, the group sizes of the incentives are generated uniformly from 1 to the value of the independent variable. In addition, we also run two more settings for two special cases. Concretely, since the algorithms (excluding HAIS) apply the incentives without considering the group sizes, when the group size of the real best (worst) incentive is the largest, these algorithms have an advantage (disadvantage) over HAIS. For example, if the group size of the real best incentive is the largest, by applying the incentives evenly in the exploration phase, ε-first and Stepped ε-first also partially exploit the best incentive, as they have more sampled users on this incentive. HAIS, however, does not exploit while exploring in this way, as in its exploration phase it tries to apply the incentives so that the number of sampled users on each incentive is almost the same. Additionally, by having more sampled users in the exploration phase, ε-first and Stepped ε-first have a better estimate of the real best incentive and hence they are likely to recognise that this is indeed the real best incentive after exploring. Therefore, we want to investigate how HAIS performs compared to the other algorithms in these two special cases. In the simulations of these two settings, we keep the group size of the real best (worst) incentive fixed at the value of the independent variable ($x$). The group sizes of the other incentives are generated randomly from 1 to $x - 1$ (to ensure they are always smaller).
For each value of the independent variable, we run 20,000 simulations to achieve statistically significant results at the 99% confidence level. In Figs. 5-13 and 16, the confidence intervals are small. So, for better image clarity, the error bars representing the confidence intervals are omitted. To better understand the algorithms' behaviours, we run with six incentives where the densities of incentives 1 to 6 are 90, 80, 75, 75, 70, and 60 respectively. That means incentive 1 is the best, while incentive 6 is the worst. In the simulation, the budgets are £3,000 and 10 periods, and the standard deviation of incentive i is 0.4μ i ∀i = 1 . . . 6 (the mean value of the range presented in Table 1 which will be discussed in the next subsection). The group size of each incentive in each period is generated uniformly in the range from 1 to 10. We also run the simulation 20,000 times as with the above-mentioned simulations.
Next, in Section 6.2.1, we detail the ranges of the quantities used for randomisation in the simulations. Then, in Section 6.2.2, we detail the values of the algorithms' predefined parameters. Finally, in Section 6.2.3, we present how the performance of a group is generated in the simulations (based on the performance of the individuals of the group).

Ranges of the quantities for randomisation
The ranges of the quantities are described in Table 1. The values are chosen to represent realistic settings from a number of real crowdsourcing projects. The projects will be presented in the corresponding parameters. As the crowdsourcing projects found in the literature are not run using MABs, based on the figures in these projects (such as budgets or group sizes), we infer the ranges for the related quantities in our simulations. The papers used for inferring the ranges will be stated when possible. In more detail, regarding the number of incentives, as will be shown later in Section 6.3, the more incentives the worse the performance of the algorithms becomes. This is reasonable because the more incentives the more budget spent on exploring their effectiveness. Hence, in a real crowdsourcing project, the chosen number of incentives should be as small as possible. For this reason, we choose 20 as the maximum value of I . We can have 20 separate incentives or 5 group sizes with 4 payment structures per group size.
Regarding the group sizes, according to the figures from [54], the popular group sizes on Taskcn are from 1 to about 100. However, it is more difficult to recruit many users (for a contest), especially with crowdsourcing projects that are not run on other platforms (such as Amazon Mechanical Turk or Clickworker) and hence they have to recruit users by themselves [29]. Additionally, when users get experience with crowdsourcing contests, they tend to participate in the contests with small group sizes so that they can have a better chance to win the competition [54]. Because of this, the chosen maximum value for group sizes is 50 (instead of 100).
Regarding the densities and utility means, since each crowdsourcing project can use a different way to measure the utility (as discussed in Section 3), the range of densities can be very different. In our simulations, we combine both the quantity and quality of the tasks (i.e., number of tasks completed and their corresponding quality) in the metric. In the simulation settings, we know the real density of the worst incentive (which is 60 utility per £). So, to have a better comparison between the algorithms, the effectiveness of each algorithm is measured by the increase in utility over the worst algorithm. The worst algorithm is the algorithm which simply applies the worst incentive as many times as possible. We refer to this increase in utility as normalised utility.
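The normalised utility described above can be sketched in a few lines. This is a minimal illustration that assumes the increase is an absolute difference and that the worst incentive has a fixed cost per application; the function names and the £7 cost are hypothetical.

```python
def baseline_utility(budget, cost_per_application, worst_density):
    """Utility of the 'worst algorithm': apply the worst incentive as often as the budget allows."""
    applications = int(budget // cost_per_application)
    spent = applications * cost_per_application
    # Density is utility per unit of money, so utility = density * money spent.
    return worst_density * spent


def normalised_utility(total_utility, budget, cost_per_application, worst_density=60.0):
    """Increase in utility over the worst algorithm (assumed to be an absolute difference)."""
    return total_utility - baseline_utility(budget, cost_per_application, worst_density)


# Illustrative numbers: a £3,000 budget and a hypothetical cost of £7 per application.
print(normalised_utility(200_000.0, 3_000.0, 7.0))
```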
So, we choose [60..90] as the possible utility means and [60..90] as the possible density values of the incentives. The maximum difference between the best and worst incentives is 30 but not larger because in real crowdsourcing projects, by using prior knowledge about the projects (if possible) together with existing studies, we can build good-enough incentives. Although some of the designed incentives may be relatively poor (e.g., their densities are 30 or 40), they do not result in a significant difference in the results. Hence, we skip these cases and concentrate on the more challenging settings where performance differences are relatively small. Moreover, to observe the performance of the algorithms more clearly with different values of the independent variables, the density of the best incentive is always 90.
Regarding the utility standard deviations, when these values are too small (e.g., 0.05μ_i), the algorithms can easily identify the real densities of the incentives. Conversely, when they are too large (e.g., 0.9μ_i), it is very challenging for all the algorithms to estimate the incentives, as they need a much higher budget to obtain better estimates. This is infeasible in real crowdsourcing projects, where the budgets are usually limited. Therefore, as the purpose of the simulations is to compare the performance of the algorithms, we use a moderate range of the standard deviations, that is from 0.2μ_i to 0.6μ_i (for all i).
Regarding the financial budget, to allow us to carry out a meaningful performance comparison, the budget should not be too small. If the algorithms do not have a sufficient budget for exploring, then all of their performances will be low. Also, as the number of incentives and the group sizes are generated uniformly, to be sure the budget is not too small, its value should be proportional to these quantities. Therefore, we use the round cost to control the minimum value of the budgets. Here, the round cost is the cost of applying all incentives where each incentive has U_1 sampled users.
According to our calculation, the budgets used for the first crowdsourcing project in [23] (experiment 1: image ordering) and the crowdsourcing project in [25] (the experiment with the word puzzle) are about 94 and 58 times the round cost, respectively. In these studies, as they use individual-based incentives, the round cost is the cost of one user in all treatments of the corresponding experiment. Moreover, since these two crowdsourcing projects are behavioural experiments, real crowdsourcing projects might use larger budgets. Thus, we choose the possible range of the generated financial budgets to be from 10 to 200 times the round cost.
This mechanism is applied to the simulations of all the settings except the three related to the maximum group sizes. We use a different mechanism for generating financial budgets in these three settings because we want to investigate the performance of the algorithms with different values of the maximum group size. If the same mechanism were applied to them, the trends could be affected by the financial budgets. Specifically, when the maximum value of the group sizes is large, this mechanism also generates a large financial budget. Thus, the budget for exploitation in HAIS, ε-first, and Stepped ε-first is large, which might affect the general performance of the algorithms. Therefore, in these three settings, we use the above-mentioned generating mechanism with one change: the round cost is replaced with the median value of the range of the group sizes described in Table 1, that is 25.5. By doing so, different values of the independent variable x (i.e., the maximum group size) do not affect the generated financial budgets. Hence, the performance of the algorithms is influenced by x only.
Regarding the time budget, as the result of 1 period is uninteresting (i.e., nothing can be learnt), we choose 2 periods as the minimum value of T. We also choose 30 as the maximum value of T. Depending on the characteristics of specific crowdsourcing projects and how long a period is, the most likely time budgets are believed to be in this range. For example, if a period is 1 week, then several (e.g., 8) weeks is a reasonable deadline. Or, if a period is 1 day, then 30 days for the time budget is feasible.

Values of the predefined parameters of the algorithms
We run the algorithms with different values of the predefined parameters and then choose appropriate values for the parameters. For example, with ε_1 of ε-first, we first run this algorithm with different values (such as 0.05, 0.1, 0.2, 0.3, and 0.4). Then we choose one value that helps ε-first perform well in different settings. A similar process is used for the other predefined parameters such as ε_2 of Stepped ε-first and L_h of HAIS. The process of choosing appropriate values for these predefined parameters can be automated by using Bayesian optimisation [55,56].
As changing these values slightly does not result in a significant difference (i.e., the trends of the algorithms' performance are broadly the same), in Section 6.3, we only present the results on the simulations with the values of the algorithms' predefined parameters as described in Table 2.
Regarding the predefined parameters of HAIS, as most of them are self-explanatory and some of them are already discussed in Section 4.3, we do not explain them here.

The model of group performance
Table 2 (excerpt) Predefined parameters of the algorithms. Stepped ε-first: ε_1 = 0.10, the budget limit for exploration. SOAAv: ξ = 0, which means the incentives to be applied in a period are the ones whose estimates are above the average of the estimates of the ones in the previous period.

In the simulations, we assume that the performance of a group (i.e., the total utility of all users in the group) is proportional to the group size. This means the more users there are in a group, the better the performance of the whole group. In the literature, there are very few papers investigating the performance of a group of users in crowdsourcing contests. Our assumption is based on an empirical study conducted by [49]. In their work, they investigate the data collected from 99designs, a crowdsourcing platform where users submit their designs and compete with others for a financial reward. They found that the quality of the designs in a contest is almost linear in the number of users who participated in the contest.
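One simple way to realise the proportionality assumption of the group performance model is to draw each user's utility independently and sum the draws, so that the expected group utility grows linearly with the group size. The sketch below assumes Gaussian individual utilities clipped at zero; the distribution, the function name, and the numbers are assumptions for illustration, not the authors' implementation.

```python
import random


def sample_group_utility(mean_utility, rel_std, group_size, rng):
    """Total utility of a group as the sum of its members' individual utilities."""
    std = rel_std * mean_utility
    # Each user's utility is drawn independently; negative draws are clipped to zero.
    return sum(max(0.0, rng.gauss(mean_utility, std)) for _ in range(group_size))


rng = random.Random(0)
for size in (1, 5, 10):
    print(size, round(sample_group_utility(90.0, 0.4, size, rng), 1))
```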

Results
In general, HAIS performs best in most cases (Figs. 5-11). In more detail, HAIS performs better with a larger financial budget (Fig. 5), with a looser deadline (Fig. 6), with fewer incentives (Fig. 7), and with smaller values of the standard deviation of the incentives' utilities (Fig. 8). Its performance is reasonably stable with different group sizes (Fig. 9), even when the group size of the best incentive is the largest, i.e., when other algorithms (especially Stepped ε-first) have an advantage over HAIS (Fig. 10). Additionally, as shown in Fig. 9, when all incentives are individual-based (i.e., their group sizes are all one), HAIS performs much better than the benchmarks. This emphasises the performance of HAIS in traditional settings where the group-based nature of the arms is omitted.
Moreover, as we can see, Stepped ε-first performs much better when the group size of the best incentive is the largest (Fig. 10) than when the group size of the worst incentive is the largest (Fig. 11). The difference is clearer when the maximum group size of the best/worst incentive becomes larger. This is because, in the setting of Fig. 10, Stepped ε-first is likely to have more sampled users in the best incentive and can therefore identify this incentive quickly. In both settings, HAIS remains almost at the same level of performance.
The reason that HAIS can do this effectively is that it has (1) a better exploration-exploitation strategy together with (2) an efficient way of using the time budget in the exploitation phase, and (3) an effective approach for spending more of the budget on highly effective incentives in the exploration phase. We will discuss each of these issues in the following subsections. Then, we will continue with effective ways to use HAIS in a specific crowdsourcing project.
However, as Exp3 does not perform well in general and is not related to the analysis, we first discuss its performance here and will not consider this algorithm in the remaining subsections. Specifically, Exp3 does not perform well in any setting. This is because choosing the incentives randomly based on their weights does not work well when the time budget is small. As reflected in Fig. 6, the performance of Exp3 improves as the time budget becomes larger. Yet, in most crowdsourcing projects, the time budgets are usually not large (e.g., several days or months instead of several years). Additionally, with respect to ε-first, as the purpose of running this algorithm is to examine the effectiveness of the stepped exploitation step, we only discuss this algorithm in Section 6.3.2 when explaining the importance and the usage of stepped exploitation.

Exploration-exploitation balance
Regarding the exploration-exploitation strategy, as both financial and time budgets are limited in the ISP, an algorithm that takes advantage of the budgets can enhance the overall performance significantly. That is, sufficient exploration should be conducted to identify highly effective incentives so that the algorithm has enough budget and time to exploit these incentives effectively.
In general, Stepped fKUBE's performance is low. This is because one round for initial exploration is not enough to have good estimates for the next step (stepped exploitation). Actually, as can be seen from Figs. 6 and 9, the performance of this algorithm improves significantly when the time budget or the group sizes become large. This is due to the more time available for the algorithm to identify the best incentive. Also, with larger group sizes, it has more sampled users, and thus the initial estimates become better.
In most cases, Stepped ε-first performs better than Stepped fKUBE (Figs. 5-11). This is because Stepped ε-first spends more of its budget (the amount being determined by ε_1) on exploring and thus has more exploration rounds; so it has better estimates of the incentives. However, the performance of Stepped ε-first depends on choosing an appropriate value of ε_1.
On the other hand, as HAIS uses Hoeffding's inequality, it is more flexible in determining an appropriate budget for exploration. Indeed, Fig. 12 shows that when the budget for the crowdsourcing project (B) is large, instead of using all of ε_1 B as in Stepped ε-first, HAIS tends to use less of the budget (than Stepped ε-first) to explore. Note that although less of the budget is used for exploring, the total cost for applying the best incentive in the exploration phase tends to be larger than that of Stepped ε-first. This will be discussed in detail in Section 6.3.3.
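To make the role of Hoeffding's inequality more concrete, the sketch below inverts a bound of the form exp(−2nγ²/β²), the form that appears in the appendix, to obtain the accuracy γ reached after n samples, and the number of samples needed for a target accuracy. Here β is assumed to be the range of a single observation; the function names and the example values are hypothetical and do not reproduce HAIS itself.

```python
import math


def hoeffding_radius(beta, n_samples, prob_bound):
    """Radius gamma such that exp(-2 * n * gamma^2 / beta^2) equals prob_bound."""
    if not 0.0 < prob_bound < 1.0:
        raise ValueError("prob_bound must lie strictly between 0 and 1")
    return beta * math.sqrt(math.log(1.0 / prob_bound) / (2.0 * n_samples))


def samples_needed(beta, gamma, prob_bound):
    """Smallest sample count for which the radius is at most gamma."""
    return math.ceil(beta ** 2 * math.log(1.0 / prob_bound) / (2.0 * gamma ** 2))


for n in (10, 100, 1000):
    print(n, round(hoeffding_radius(beta=1.0, n_samples=n, prob_bound=0.05), 4))
```

Since the required number of samples follows from the target accuracy rather than from a fixed fraction of B, the exploration budget implied by such a rule need not grow in proportion to the total budget.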

Taking advantage of the time budget
By comparing the performance of Stepped ε-first with the original ε-first, we can see that stepped exploitation helps take advantage of the time budget, and hence improves the overall performance of the algorithm significantly. As Figs. 5-11 show, Stepped ε-first performs significantly better than ε-first, especially when the budget is not large or when the time budget is large. Specifically, in Fig. 5, with low budgets (e.g., from £1,000 to £10,000), ε-first does not explore sufficiently; so, its performance is rather low. Meanwhile, with the same budgets (i.e., also not exploring enough in the exploration phase), Stepped ε-first performs much better, as it makes use of the time budget to conduct further exploration while exploiting the incentives. In Fig. 6, this difference in performance between the two algorithms is clearer when the time budget is large (e.g., more than 10 periods). As shown in this figure, since ε-first always uses two periods, its performance is almost the same with different values of the time budget.
Although using the same exploitation mechanism, HAIS makes use of stepped exploitation better than Stepped ε-first (Figs. 5-11). This is especially the case when the financial budget is large (Fig. 5). In particular, as Stepped ε-first has more exploration rounds when the financial budget becomes larger, after the exploration phase, it can identify the highly effective incentives better (i.e., the estimated best incentive is likely to be the real best incentive). Thus, the effect of stepped exploitation on Stepped ε-first becomes smaller. Note that, by doing this, Stepped ε-first also wastes the budget on applying ineffective incentives in the exploration phase. This is shown in Fig. 5.
In addition, as discussed in Section 4.6, HAIS is able to stop stepped exploiting sooner without significantly affecting the results. Hence, setting a loose deadline is better for its performance as it has enough time to conduct stepped exploitation effectively. To this end, Fig. 13 shows the average number of periods used by HAIS in the setting corresponding to Fig. 6. This figure shows that although the time budget is large, HAIS tends to use a lot less of it. This suggests that when applying the algorithm to a real crowdsourcing project, if the time is not very important, it is better to set a longer deadline. The algorithm will then automatically select an appropriate time to stop.

Effective elimination
By eliminating clearly ineffective incentives right after having initial estimates and before conducting more exploration, HAIS can distinguish highly effective incentives more quickly. The advantage of elimination is that it has more of the budget to continue exploring these incentives (to find the real best one) in the Hoeffding step. Because of this, the Hoeffding step can be considered as not only exploring but also partially exploiting, as it applies only highly effective incentives. The effectiveness of the elimination is shown in Figs. 14 and 15. In more detail, Fig. 14a shows that, compared with other algorithms, HAIS spends more of its budget on the best incentive (incentive 1) and less of its budget on the others. By looking more closely at how the cost is distributed over the incentives across the phases of HAIS (Fig. 14b), we can see that after the sampling step, HAIS identifies ineffective incentives effectively. This figure therefore clearly shows that in the Hoeffding step, HAIS spends more of its budget on highly effective incentives. In contrast, Stepped ε-first spends the same amount of budget to explore each incentive. The targeted spending helps HAIS not only partially exploit highly effective incentives while exploring, but also increase the chance of identifying the real best incentive in the exploitation phase (as the best incentives are likely to be applied more than the others). Indeed, by looking at how the cost is distributed across the periods, we can see that HAIS spends the most on the real best incentive in all exploitation periods, i.e., the periods in the exploitation phase, including the last period (Fig. 15a).
Fig. 15 Cost distribution across the periods incurred by each algorithm
Additionally, Fig. 15b shows that HAIS spends less than Stepped ε-first on ineffective incentives in all periods. Note that Stepped ε-first uses only one period to explore, while HAIS uses two periods. So, Stepped ε-first starts exploiting one period sooner than HAIS. Therefore, when comparing the total spent by HAIS until the end of a certain period i in the exploitation phase, we need to compare it with the total spent by Stepped ε-first until the end of period i − 1. For example, we need to compare the total spent from period 1 to period 3 of HAIS with that of periods 1 and 2 of Stepped ε-first.
Although it uses an elimination technique like HAIS, SOAAv does not perform well. Specifically, Fig. 15 shows that SOAAv under-explores the incentives, especially the highly effective ones, in the first few (e.g., 3) periods. This results in exploiting ineffective incentives in the remaining periods. More specifically, in the first periods SOAAv eliminates the incentives based on the estimates so far (of the incentives' densities). However, in these periods, the algorithm does not have enough sampled users to make good elimination decisions. Hence, the real best incentive may be eliminated with a non-negligible probability. Therefore, in the later periods (e.g., from period 4 to period 9), SOAAv tends to apply the ineffective incentives much more than HAIS and Stepped ε-first (Fig. 15). One exception is that SOAAv performs better than HAIS in the case when the difference in the effectiveness of the incentives is small. In detail, Fig. 8 shows that when the standard deviation of the utility of each incentive is less than about 20 per cent of the mean utility of the incentive, SOAAv has slightly higher overall utility than HAIS. The reason is that right after the first period, the estimate of the real best incentive is clearly better than those of the other incentives. Thus, it is likely that the estimated densities of the incentives other than the best one are smaller than the average. Hence, in the remaining periods, these incentives will be eliminated. However, in crowdsourcing projects the performance of users tends to vary considerably from user to user depending on their motivations. So, the standard deviations tend not to be as small as in this case. Therefore, we do not include this case in the simulations. Instead, we focus on more realistic cases where the differences are large enough.
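The risk of eliminating on early estimates can be seen with a small sketch of an above-average rule, following the description of ξ = 0 in Table 2 (keep the incentives whose estimates are above the average). This is a simplified illustration rather than the full SOAAv algorithm; the function name and the noisy estimates are hypothetical.

```python
def above_average(estimates):
    """Indices of incentives whose estimated density is strictly above the average."""
    average = sum(estimates) / len(estimates)
    kept = [i for i, d in enumerate(estimates) if d > average]
    # If all estimates are equal, nothing is strictly above average: keep everything.
    return kept if kept else list(range(len(estimates)))


true_densities = [90, 80, 75, 75, 70, 60]
print(above_average(true_densities))   # with accurate estimates

noisy = [70, 95, 75, 75, 70, 60]       # hypothetical early estimates
print(above_average(noisy))            # the real best incentive (index 0) is dropped
```

With accurate estimates the rule keeps the two strongest incentives, but a single unlucky early estimate is enough to remove the real best one.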

Practical usage of the HAIS algorithm
The above-mentioned results suggest several guidelines for using the HAIS algorithm in practice. First, the larger the budget, the better (Fig. 5). It is reasonable that when the budget is larger, HAIS can spend more on exploring the incentives so that it can identify the best incentives before exploiting.
Second, the fewer incentives, the better (Fig. 7). Specifically, when there are more incentives, HAIS has to spend more of the budget exploring ineffective incentives. But, as the requesters might be uncertain about the effectiveness of the incentives in specific crowdsourcing projects, they may not have good reasons to eliminate some chosen candidate incentives so as to improve the overall performance of HAIS. Therefore, Fig. 16 can be used to have a clearer view of how the current number of candidate incentives affects the overall performance. Indeed, the performance of HAIS increases significantly when the budget ratio (i.e., B/round cost) is from 1 to about 10. After that, it still improves, but slowly. This suggests that the budget should be at least 10 times the round cost. Based on this, with a given financial budget, we can easily determine an appropriate maximum number of incentives.
Fig. 16 Budget ratio is B/round cost. This is the corresponding result of the simulations shown in Fig. 5
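The rule of thumb that the budget should be at least 10 times the round cost can be turned into a quick calculation. The sketch below assumes, purely for illustration, that every incentive contributes the same amount to the round cost; the function name and the figures are hypothetical.

```python
def max_incentives(budget, cost_per_incentive_per_round, min_ratio=10):
    """Largest number of incentives keeping budget / round cost at or above min_ratio."""
    # Assumption: every incentive adds the same amount to the round cost.
    return int(budget // (min_ratio * cost_per_incentive_per_round))


print(max_incentives(3_000, 25))
```

When the incentives have different costs, the same check can be applied by adding candidate incentives one at a time until the ratio would fall below the chosen threshold.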
Third, the time budget should be large enough (e.g., from 15 to 20 periods), but does not need to be very large (e.g., 100 periods) so that HAIS has enough time to conduct stepped exploitation effectively (Fig. 6). Also, if the time budget is set to be larger than necessary, the algorithm will choose an appropriate time to stop.
Fourth, the runtime of the HAIS algorithm to select incentives in a period depends on the number of incentives and the time budget. In the above-mentioned settings, the runtime is less than a second on a desktop computer (2.2 GHz quad-core processor and 16GB internal memory). So, it is feasible for an autonomous agent to quickly identify an appropriate applying policy.

Conclusions and future work
We discussed the incentive selection problem and outlined an approach that helps requesters in crowdsourcing projects with a fixed budget maximise their utility. Then, we formalised the problem as a batched 2d-budgeted group-based MAB and introduced an algorithm (HAIS) to solve it effectively. Our algorithm is adaptive and performs efficiently in a wide range of different cases without the need to tune its predefined parameters. Although HAIS is specifically designed for incentives in the form of contests, it can also be used with other types of incentives where the group size is 1 (i.e., there are no contests, such as paying for performance or using bonuses). HAIS significantly outperforms the state-of-the-art approaches in simulations. Our results also suggest several guidelines for using this algorithm in practice. Regarding other applications of our work, the model proposed and the algorithm developed can be applied in other domains with a group-based nature such as in schools, companies, or organisations (i.e., finding the most effective groups of students or employees to work or study together).
Although HAIS is an important initial step towards solving the incentive selection problem, there are a number of areas of further work. From a practical perspective, we have systematically explored the key determinants of the behaviour in a series of controlled experiments. Such experiments are a necessary first step to understanding this complex design space. They help us discover the key influences on behaviour and performance. These insights can then be deployed in real-world environments and applications. This is a significant undertaking, but our results provide an excellent foundation for this work. From a more conceptual perspective, there are a number of areas to explore. First, our current model assumes that time steps are homogeneous and a new incentive can be started only when all previous ones have completed. However in some real world settings, the durations of the incentives (e.g., the time to run a contest) might be heterogeneous and variable. The difference in the durations might be large when some incentives are individual-based (e.g., paying for performance) and some others are contests with large group sizes (e.g., 20 users). So, within a given period, groups that finished early will have to wait until all other groups in the period are finished so that the algorithm can move to the next period. Thus, addressing this limitation would shorten waiting times and thereby the total time used by the algorithm. Additionally, this could improve the overall performance as the algorithm has more time to conduct stepped exploitation, especially when the time budget is limited.
A second issue is that the model assumes that the cost of applying an incentive is the same at all times. This may be limiting in more general settings. To motivate top users to continue performing tasks, for example, requesters in crowdsourcing projects might use contests whose prize values depend on the performance of the winners. For instance, instead of paying a fixed £3 to the best user, they could pay from £2 to £4 to the best user depending on the number of tasks completed by this user. This might encourage the best user to do tasks even when they have a steady top position in the leaderboard. Moreover, some incentives are inherently designed with variable payment such as paying for performance [23] or using bonuses [25]. Indeed, in paying for performance, the more tasks a user completes the more money they earn. And in using bonuses, the bonuses provided depend on the algorithms used and might be different at different times. Therefore, expanding the model to cover the case of variable costs of applying an incentive will provide requesters with more options in choosing incentives to be used in the ISP.
To cope with these limitations, in ongoing and future work, we will consider using other approaches such as MABs with delayed feedback or reinforcement learning. In more detail, to deal with the homogeneous time steps, MABs with delayed feedback [57] may be a good approach. They focus on MABs where the feedback (i.e., utility) of applying an incentive is not known immediately. By using this approach, in a time step, we do not have to wait for the feedback of all the incentives (pulled in this step). We can continue applying other incentives and consider the feedback of incentives being applied as delayed. Additionally, reinforcement learning [53] appears to be a promising approach as it can deal with not only the homogeneous and variable time steps but also the variable costs of the incentives. This is because reinforcement learning is designed to work with learning with delayed feedback and in a non-stationary environment (i.e., variable pulling costs).

1 B I i=1 c i /g i ), the regret in this step is:

After the sampling step, suppose we only have \(I'\) incentives left to move to the Hoeffding step (step 2). That is, \(I - I'\) incentives are eliminated with \(L_e\) level of confidence. We can calculate the probability of the best incentive (incentive 1) being eliminated after the sampling step. Suppose there exists incentive \(j\) such that \(d^{(1)}_{1,\mathrm{upper}} < d^{(1)}_{j,\mathrm{lower}}\). We then have:

\[
\begin{aligned}
& P\left(\mu_1 \le d^{(1)}_{1,\mathrm{upper}}\right) = P\left(d^{(1)}_{j,\mathrm{lower}} \le \mu_j\right) = \frac{1 + L_e}{2} \\
\Longrightarrow\; & P\left(\mu_1 \le d^{(1)}_{1,\mathrm{upper}} \text{ and } d^{(1)}_{j,\mathrm{lower}} \le \mu_j\right) = \left(\frac{1 + L_e}{2}\right)^2 \\
\Longrightarrow\; & P\left(d^{(1)}_{j,\mathrm{lower}} \le d^{(1)}_{1,\mathrm{upper}}\right) \ge \left(\frac{1 + L_e}{2}\right)^2 \\
\Longrightarrow\; & P\left(d^{(1)}_{1,\mathrm{upper}} < d^{(1)}_{j,\mathrm{lower}}\right) = 1 - P\left(d^{(1)}_{1,\mathrm{upper}} \ge d^{(1)}_{j,\mathrm{lower}}\right) \le 1 - \left(\frac{1 + L_e}{2}\right)^2 .
\end{aligned}
\]

Here, the second line treats the two events as independent, and the third line holds because incentive 1 is the best (\(\mu_j \le \mu_1\)), so that \(\mu_1 \le d^{(1)}_{1,\mathrm{upper}}\) and \(d^{(1)}_{j,\mathrm{lower}} \le \mu_j\) together imply \(d^{(1)}_{j,\mathrm{lower}} \le d^{(1)}_{1,\mathrm{upper}}\). Thus, with probability at most \(1 - \left(\frac{1 + L_e}{2}\right)^2\) (i.e., a small probability), the best incentive (incentive 1) is eliminated after step 1. Now we consider step 2, in which we use Hoeffding's inequality for the remaining \(I'\) incentives, which include the best incentive with high probability.
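For a sense of scale, the bound 1 − ((1 + L_e)/2)² on the probability of eliminating the best incentive after step 1 can be evaluated for a few confidence levels. The levels used below are illustrative assumptions and are not the values used in the paper's simulations.

```python
def elimination_probability_bound(confidence_level):
    """Upper bound 1 - ((1 + L_e) / 2)^2 on wrongly eliminating the best incentive."""
    one_sided = (1.0 + confidence_level) / 2.0
    return 1.0 - one_sided ** 2


for level in (0.90, 0.95, 0.99):
    print(level, round(elimination_probability_bound(level), 6))
```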
In HAIS, using a small \(\gamma\) and increasing the number of times the incentive is applied to reduce the probability bound (i.e., \(\exp(-2u_i(t)\gamma^2/\beta_i^2)\)) is the same as using a small probability bound \(\epsilon\) and increasing the number of samples to reduce \(\gamma_t\). Suppose that after step 1, with a small probability bound \(\epsilon\) (i.e., \(\epsilon = \exp(-2u_i(t)\gamma^2/\beta_i^2)\)), the difference between the mean and the estimate of each incentive \(i\) lies within \((-\gamma_i^1, \gamma_i^1)\). We will apply each incentive more times to make sure that, with the same probability, the upper bound of each incentive \(i\) will be less than a certain level of accuracy, \(\gamma^*\). Therefore, after the Hoeffding step, with high probability (i.e., \(1 - 2\epsilon\)), we have the estimate of the mean value of each incentive \(i\) as follows:

Thus, the regret in step 2 will be:

where \(n_i^{(2)}\) is the number of times incentive \(i\) is applied in step 2, which can be specified by:
After this step, the predicted best incentive \(j\) (i.e., \(j = \arg\max_{i \in [1, I']} \hat{\mu}_i\)) will have the following property with high probability (i.e., \((1 - \epsilon)^2\)):

\[
\mu_1 - \mu_j = (\mu_1 - \hat{\mu}_1) + (\hat{\mu}_1 - \hat{\mu}_j) - (\mu_j - \hat{\mu}_j) \le \gamma^* + 0 + \gamma^* = 2\gamma^*. \tag{27}
\]

In the stepped exploitation step (step 3), the algorithm reduces the gap \((\mu_j - \hat{\mu}_j)\) of the current predicted best incentive \(j\) by increasing the number of samples of incentive \(j\). However, if the best incentive does not appear in the current predicted best incentive set, then step 3 does not improve the estimate of the best incentive. Thus the best bound which our algorithm guarantees in steps 3 and 4 (pure exploitation) will be

\[
\mu_1 - \mu_j \le (\mu_1 - \hat{\mu}_1) + (\hat{\mu}_1 - \hat{\mu}_j) - (\mu_j - \hat{\mu}_j) \le \gamma^* + 0 + 0 = \gamma^*, \tag{28}
\]

when we get the exact estimate for the predicted best incentive \(j\) (e.g., \(\mu_j - \hat{\mu}_j = 0\)). Let \(b_{1,2}\) be the residual budget after steps 1 and 2. Then the regret in steps 3 and 4 will be bounded by:

Note that in the above regret, we ignore the fact that the estimate \(\hat{\mu}_j\) will slowly converge to \(\mu_j\) in step 3, as the difference in Inequality (25) will always be \(O(\gamma^*)\). From (25) and (26) and Inequality (29) we have:

Figure 17 shows an example of how Stepped ε-first runs in a simple case. The setting in this figure is the same as the one in Fig. 2. In the exploration phase (day 1) of this example, as it does not look at the group sizes (unlike HAIS), it applies the incentives evenly (4 times each). After this period, suppose that the estimate of incentive 1 is higher than those of the other incentives. So, based on these estimates, Stepped ε-first identifies that incentive 1 is the best one. Compared with ε-first, it is better as instead of applying incentive 1 (the estimated best incentive) 12 times with the residual budget of $48 as in ε-first, it distributes half (ε_2 = 0.50) of this budget (that is $24) equally across the next three periods ($8 each on days 2, 3, and 4).
Then, on day 2, it applies the estimated best incentive (incentive 1) and updates this incentive's estimate. In so doing, it identifies that incentive 1 is no longer the best. Hence, on day 3, it applies incentive 2 (the new best incentive). We also suppose that the estimates after periods 3 and 4 are consistent with incentive 2 being the best; thus it simply applies this incentive in periods 4 and 5. Compared with the example of HAIS in Fig. 2, as HAIS eliminates the worst incentive (incentive 3) after the sampling step, it can spend more of its budget on exploring incentives 1 and 2. In the exploration phase, HAIS applies the real best incentive (incentive 2) 6 times, whereas Stepped ε-first applies it only 4 times. Hence, HAIS better estimates incentive 2 and finds that this is the best incentive. In the exploitation phase, HAIS applies this incentive all the time. However, with Stepped ε-first, after the exploration phase, it applies incentive 1 twice (in period 2) before identifying and applying the real best incentive (incentive 2) with the residual budget in periods 3, 4, and 5. From this example, it can be seen that by exploring the incentives evenly without looking at their group sizes, Stepped ε-first over-explores the incentives with large group sizes and under-explores the other ones. Hence, it is easier to miss the best incentive with a small group size compared to HAIS. However, compared to ε-first, Stepped ε-first is likely to be better because it takes advantage of the residual time budget to exploit.
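The budget split in the Stepped ε-first example can be reproduced with a short sketch: a residual budget of $48, half of which is spread equally over the next three periods. The function name is hypothetical, and spending the remainder in the final period is an assumption made for this illustration.

```python
def stepped_schedule(residual_budget, eps2, periods_left):
    """Split the residual budget into stepped periods plus one final period."""
    if periods_left < 2:
        return [residual_budget]
    stepped_total = eps2 * residual_budget
    per_period = stepped_total / (periods_left - 1)
    # Assumption: whatever is left is spent in the final (pure exploitation) period.
    return [per_period] * (periods_left - 1) + [residual_budget - stepped_total]


# Days 2 to 5 of the example: $8 on each of days 2, 3, and 4, the rest on day 5.
print(stepped_schedule(48.0, 0.5, 4))
```

Spending only a small amount per stepped period leaves room to switch incentives when an updated estimate changes the ranking, as happens on day 3 of the example.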
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as