Learning and Drop Out in Contests: An Experimental Approach

We design an experiment to study investment behavior in different repeated contest settings, varying the uncertainty of outcomes and the number of participants in contests. We find decreasing over-expenditures and a higher rate of ‘drop out’ in contests with high uncertainty over outcomes (winner-take-all contests), while we detect quick convergence towards equilibrium predictions and near-full participation when this type of uncertainty vanishes (proportional-prize contests). These results are robust to changes in the number of contestants. Estimating learning parameters with the experience-weighted attraction (EWA) model suggests that subjects adopt different learning modes across contest structures, which helps to explain expenditure patterns that deviate from theoretical predictions. JEL Codes: C92, C91, C72, C73, D83


Introduction
Settings characterized by high uncertainty about outcomes are likely to impair fully rational decisions, which often makes learning from previous attempts unfruitful. As a consequence, effort may be allocated inefficiently in a trial-and-error search regime that can easily result in losses (Dosi et al., 2001). A prominent example of such a setting is the pharmaceutical industry (Pammolli et al., 2011), where races for drug discovery lead to enormous investments by companies, which are not always fruitful and may induce companies to abstain from investing. Similar industries, characterized by high investment in research and development (R&D), may vary in the structure of earnings: if competition is based on incremental innovation, successful agents capture a larger market share (Breitmoser et al., 2010). In other cases, the successful competitor gains the whole market, similar to a lottery (Sutton, 1998), e.g., by patenting new first-in-class drugs. The aim of this study is to identify experimentally how the payoff structure and the level of competition affect the learning and participation dynamics of agents.
Our candidate setting is the Tullock rent-seeking contest, in which subjects compete for a single prize whose assignment probability depends on each subject's share of total effort (Tullock, 1980). In rent-seeking contests, subjects persistently deviate from what standard game-theoretical models predict. A survey of the experimental contest literature by Dechenaux et al. (2015) highlights that contestants spend, on average, considerably more than the theoretical equilibrium. Most studies find an overall decrease in expenditures when subjects repeatedly play this game (e.g., Cason et al., 2012), which is usually attributed to learning without further specifying the process behind it. Another empirical regularity, which has not yet received much attention, is that many participants choose not to spend any resources to win the contest. By looking across a subset of experimental studies on winner-take-all contests (Abbink et al., 2010; Cason et al., 2012; Mago et al., 2016; Sheremeta, 2010; Sheremeta and Zhang, 2010; Price and Sheremeta, 2011; Sheremeta, 2011), we find that zero expenditures are indeed a frequent, sometimes modal, choice of participants. The fraction of zeros is higher in larger competing groups and persists even in later stages of the experiments. If both stylized facts are a consequence of learning, then this process deserves closer investigation, which is the main contribution of this paper.
To explore how uncertainty about outcomes and the number of opponents affect learning strategies and behavioral patterns, such as zero expenditures, in contest settings over a long time horizon, we set up a laboratory experiment in which we compare, over sixty periods, participants' expenditure choices in the standard winner-take-all (WTA) Tullock contest and in a non-probabilistic, equivalent proportional-prize (PP) contest.
The WTA contest allows only one winner of the prize, whose winning probability is proportional to her share of total group investment. Applications range from the seminal rent-seeking hypothesis by Tullock (1980) to political polls (Snyder, 1989), sport tournaments (Szymanski, 2003), patent races (Fudenberg et al., 1983; Harris and Vickers, 1985) and cryptocurrency mining (Dimitri, 2017). In the deterministic PP contest, contestants receive a fraction of the prize proportional to their share of total group investment. The PP contest provides a 'replication' of standard oligopoly settings by varying the payoff structure. The early work by Friedman (1958) is a first attempt to use the PP contest to model the allocation of advertising budgets across media. Proportional-prize assignments are also observed in electoral schemes (Schram and Sonnemans, 1996), lobbying (Krueger, 1974) and labor compensation (Kruse, 1992). Under the assumption of risk neutrality, both contest settings are equivalent in terms of equilibrium predictions. Varying the contest type and the group size (three and five contestants), we create a 2x2 experimental design, which allows us to test how styles of learning, and consequently contestant behavior, change in environments characterized by uncertainty over outcomes compared to those with a tighter link between effort and outcome.
We find that average levels of effort in PP contests are well described by standard game-theoretical predictions. Conversely, in the WTA contests we are unable to distinguish total group expenditures between the two group sizes. The decline of average expenditures in the five-player WTA contests coincides with a significant increase in what we label 'drop outs', i.e., zero expenditures that are justified neither by myopic best response nor by weighted fictitious play. Drop outs are instead significantly less frequent in PP contests and, if anything, decrease over time. The distinct expenditure patterns in the two settings suggest that differences in the contests' payoff structures affect subjects' learning process. As our main contribution to the literature on learning in games, we test this hypothesis by estimating the experience-weighted attraction model (EWA; Camerer and Ho, 1999). The results reveal that WTA contestants learn significantly more from their own past payoffs (experiential or reinforcement learning; Roth and Erev, 1995) than players in the PP contests. Moreover, our results support recent findings by Alós-Ferrer and Ritschel (2018) on subjects' frequent use of the reinforcement heuristic 'win-stay, lose-shift' rather than a more reasoned approach based on myopic best response. Therefore, the strong reliance on experiential learning in WTA settings can explain both the decreasing expenditures over time and the increasing propensity to choose zero expenditures. The more often a WTA participant loses, the more she is discouraged from positive expenditures, up to the point of non-participation. Further analyses confirm that expenditures decline significantly as accumulated prior losses increase. Since losses are experienced more frequently in larger competing groups, expenditures in larger groups are expected to decrease at a faster rate.
These results are of practical relevance for contest designers who wish to maximize repeated participation, and they mirror empirical regularities in industrial dynamics, so-called 'industry shakeouts', in which many firms drop out of an industry during its competitive expansion phase.
Previous experimental studies have explored subjects' behavior in contests with designs or methods that partially overlap with ours, varying the group size (e.g., Lim et al., 2014), the payout function (Chowdhury et al., 2014; Ghosh and Hummel, 2018) or the matching protocol (Baik et al., 2015), but none has so far explored the interaction between treatments varying the group size and the outcome uncertainty. More importantly, differences in behavior across contest structures have been proposed to stem from differences in learning (Fallucchi et al., 2013), a conjecture that has not yet been rigorously tested. Similarly, the high fraction of zero expenditures has often been attributed to myopic best replies without proper analysis. In this paper we investigate experimentally how expenditure dynamics, such as frequently observed zero expenditure choices, can be explained by distinct learning mechanisms across contest structures.
The paper is organized as follows. Section 2 introduces the contest forms and offers a brief review of the related experimental literature on contests; we analyze subjects' behavior in the two contest structures and their equivalence in terms of expected payoffs. Section 3 presents the experimental design and procedures. The experimental results are presented and discussed in Section 4. The first, descriptive, part of our results section highlights differences in group expenditures and in the fraction of zero expenditures. The second part presents the EWA model estimations and shows support for different learning modes across contest games. We conclude in Section 5 with a discussion of our findings and highlight how these results may well represent some empirical regularities in winner-take-all settings outside the experimental literature.

Theoretical Background and Experiments
The Tullock model of rent-seeking (Tullock, 1980) is extensively used to model a variety of contests (Konrad, 2009). In the simplified model, often referred to as the winner-take-all (WTA) or lottery contest, N agents compete for a prize of size V, where $x_i$ is the expenditure of agent i and $X = \sum_{k=1}^{N} x_k$ is the aggregate expenditure. The individual profit $\pi_i$ depends on all agents' expenditures, the prize assignment and a homogeneous initial endowment denoted by e:

$$ \pi_i^{WTA} = \begin{cases} e - x_i + V & \text{with probability } x_i / X, \\ e - x_i & \text{with probability } 1 - x_i / X. \end{cases} \tag{1} $$

Therefore, the probability that an agent receives the prize increases with her own expenditure but decreases with the expenditures of others.
In an alternative version of the contest, also known as the proportional-prize (PP) or share contest, the prize is not assigned to a single agent but divided among all N agents in proportion to their own expenditure $x_i$ relative to the aggregate expenditure $X$. Thus, each agent with positive expenditure receives a share of the prize. The payoff function in this case is:

$$ \pi_i^{PP} = e - x_i + V \frac{x_i}{X} \tag{2} $$

The two contests share the same expected payoffs and, under the assumption of risk neutrality, the same equilibrium prediction, $x_i^* = V(N-1)/N^2$. However, the realized payoff in the WTA contest differs from that in the PP contest because of the stochastic winner-take-all nature of the game.
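The claim that both contests share the same expected payoff can be checked numerically. The following is a minimal Python sketch (ours, not the software used in the experiment), assuming the experimental parameters e = V = 1000 and a hypothetical three-player expenditure profile; the function names are illustrative. The WTA payoff is simulated by drawing a single winner with probability proportional to expenditures, and its Monte Carlo mean is compared with the deterministic PP payoff.

```python
import random

ENDOWMENT = 1000  # e: per-period endowment in points (as in the experiment)
PRIZE = 1000      # V: prize in points (as in the experiment)


def pp_payoff(i, x):
    """Proportional-prize payoff: e - x_i + V * x_i / X (no prize if X = 0)."""
    total = sum(x)
    share = x[i] / total if total > 0 else 0.0
    return ENDOWMENT - x[i] + PRIZE * share


def wta_payoff(i, x, rng):
    """Winner-take-all payoff: player i wins V with probability x_i / X."""
    total = sum(x)
    if total == 0:
        return float(ENDOWMENT)  # nobody bought tokens: prize not assigned
    # Draw one winner with probability proportional to expenditures.
    winner = rng.choices(range(len(x)), weights=x, k=1)[0]
    return ENDOWMENT - x[i] + (PRIZE if winner == i else 0)


def mean_wta_payoff(i, x, draws=200_000, seed=1):
    """Monte Carlo estimate of the expected WTA payoff of player i."""
    rng = random.Random(seed)
    return sum(wta_payoff(i, x, rng) for _ in range(draws)) / draws


if __name__ == "__main__":
    bids = [300, 150, 50]  # hypothetical expenditure profile
    print(pp_payoff(0, bids))        # deterministic PP payoff
    print(mean_wta_payoff(0, bids))  # simulated mean, close to the PP value
```

The simulated mean differs from the PP payoff only by sampling noise, while individual WTA draws take just two values (e − x_i or e − x_i + V); this dispersion of realized payoffs is exactly the uncertainty the treatments vary.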
As it is often difficult to capture expenditure dynamics with field data, laboratory experiments have become increasingly popular in recent years to characterize behavior in different contest settings. Many experiments document pervasive over-dissipation in WTA contests paired with high heterogeneity in effort levels across contestants (e.g., Millner and Pratt, 1989; Sheremeta and Zhang, 2010; Mago et al., 2016). Based on a sample of thirty studies, Sheremeta (2013) reports a median overbidding rate of 72% relative to the equilibrium predictions. In contrast to WTA contests, PP contests display less variation in individual spending behavior and quicker convergence over time towards the predicted equilibrium level (Fallucchi et al., 2013; Chowdhury et al., 2014; Cason et al., 2010, 2018). Since analyses of overbidding in contests commonly focus on mean expenditures, the choice of zero expenditures has often been overlooked. We summarize the data of seven contest experiments, considering in total ten independent standard repeated WTA treatments (as specified in equation 1). Table 1 shows that zero expenditures are the modal choice in four-player WTA contests, making up 12% of all choices, and that they increase over time (see figure 1). In the two-player settings the share of zero expenditures is lower (3.9%) and stable over time. Yet most of these zero expenditures, especially in later periods, are not a myopic best response to previous opponent choices.

The meta-analysis incorporates data from ten treatments of seven publications using repeated standard WTA contest treatments. Two players: Abbink et al. (2010), Cason et al. (2012). Four players: Mago et al. (2016), Sheremeta (2010), Sheremeta and Zhang (2010), Sheremeta (2011), Sheremeta (2011).
We refer to these zero expenditures, which are not a myopic best response, as 'drop outs'. In four-player treatments, on average 50% of the zero expenditures are drop outs, a share that increases over time. […] (Shupp et al., 2013), and utility from winning (Schmitt et al., 2004). Therefore, zero expenditures are usually associated with the best response to over-dissipation. Given the evidence collected from prior studies, this cannot be the only explanation. An alternative explanation, which we explore in this paper, is that WTA contestants choose zero investments because they have difficulty adapting their expenditures to optimal levels given the stochastic nature of the outcome.
We are aware of a handful of studies that analyze learning in repeated games with stochastic outcomes. They differ from our experimental setting and learning identification strategy in many respects, yet they support the use of simple learning heuristics by decision makers. Gunnthorsdottir and Rapoport (2006) find that reinforcement learning (Roth and Erev, 1995) explains aggregate efforts in a two-stage group game with an inter-group lottery in the first stage. Reinforcement learning combined with directional learning (Selten and Stoecker, 1986) describes individual behavior well in a Tullock contest with group size uncertainty (Boosey et al., 2017). In addition, Masiliunas (2017) finds learning spillovers between PP and WTA contests in a within-subjects experiment.
Even though learning behavior in PP contests has so far received only minor attention in the literature, its expected payoff structure provides a useful benchmark for behavior in the standard contest and allows us to treat it as a special case of the more commonly studied Cournot oligopoly. 4 Evidence on learning from oligopoly experiments suggests that players employ a mix of sophisticated and imitative learning. For example, Bigoni and Fort (2013), applying a modified EWA model to a Cournot game under endogenous information disclosure, find that participants use a mixture of reinforcement, imitation and belief learning, with the latter accounting for the major share. 5 Lastly, learning models have been used to explain behavior in repeated auction experiments. From the bidders' perspective, auctions appear as stochastic as lotteries, since the value of the prize is usually drawn for each bidder from a random distribution, so bids submitted by rivals appear uncertain. In addition, overbidding is commonly observed in first-price auctions (Filiz-Ozbay and Ozbay, 2007). Experiential and observational learning are found to reduce overbidding in first-price common value auctions (Garvin and Kagel, 1994), while directional learning can explain repeated individual bids (Neugebauer and Selten, 2006).
By comparing behavior in both contest structures in a laboratory setting, we can identify how the randomness of the outcome inhibits learning and therefore affects the expenditure dynamics that we observe in stochastic environments. Our analysis extends to competition with different group sizes. The reason for this choice is twofold: first, previous evidence has shown that varying the number of active firms in an oligopoly industry has important implications for the level of competitiveness (e.g., Huck et al., 2004); second, we check whether, as suggested by our meta-analysis, a higher level of competition exacerbates zero expenditures, linked to lower earnings in proportional-prize contests or more frequent expected losses in winner-take-all contests.

4 The payoff structure of the PP contest resembles a Cournot oligopoly with an iso-elastic inverse demand function plus a constant. See Engel (2007) for a meta-analysis of over 500 experimental studies of oligopoly settings.
5 Other studies on learning in oligopolies find evidence for imitative learning (Vega-Redondo, 1997; Huck et al., 1999; Oechssler et al., 2016; Friedman et al., 2015), myopic best-response dynamics (Bigoni, 2010) and reinforcement learning (Jiao and Nax, 2017).

Experimental Design and Procedures
The experiment was conducted at the University of Nottingham using the software z-Tree (Fischbacher, 2007); 140 students from a wide range of disciplines were recruited through the online recruiting system ORSEE (Greiner, 2015). No participant took part in more than one session or had taken part in any previous contest experiments.
At the beginning of each session, participants were randomly matched into groups that remained the same for the whole experiment. We opted for this fixed group matching as it allows us to test whether participants myopically best respond to the choices of their opponents. 6 Moreover, this is the standard adopted in experiments that test for learning in other settings (e.g., Huck et al., 1999;Bigoni and Fort, 2013) and therefore allows us to compare the results of our PP contests with previous evidence of learning in oligopoly settings.
Participants did not know the identities of the other subjects in the room with whom they were grouped. They were given instructions for the experiment (reproduced in Appendix C) which were read aloud by the experimenter. Any questions were answered by the experimenter in private, and no communication between participants was allowed. No information passed across groups during the entire session.
In all sessions the decision-making part of the experiment consisted of sixty periods. 7 In each period subjects were endowed with 1000 points and competed to win a prize of 1000 points. Subjects simultaneously chose how many contest tokens to purchase, at a price of one point per contest token, and any points not used to purchase tokens were added to their total balance. At the end of the period each subject possibly received contest earnings, which were added to their total balance. If none of the subjects bought any tokens, the prize was not assigned. We adopt a 2x2 design in which treatments differ in group size, with three (3) or five (5) contestants, 8 and in contest structure, proportional-prize (P) or winner-take-all (W). 9 We conducted two sessions per treatment, with 15 subjects in each three-player session and 20 subjects in each five-player session, resulting in ten independent observations (groups) in each three-player treatment and eight in each five-player treatment. A summary of the treatments is reported in table 2.
After each period, subjects were reminded of their own choice and informed of the total expenditures of the other members of their group and of their own earnings. We opted for this partial feedback disclosure to rule out imitative behavior among contestants, although we could not rule out the other simple behavioral rule of imitating the average expenditures of opponents in the previous round. 10 Subjects accumulated points across the sixty periods and at the end of each session were paid in private and in cash. Earnings averaged £9.30 for a session lasting about 60 minutes. At the end of the experiment, we conducted a socio-demographic questionnaire in which we also elicited risk attitude using a survey measure validated in a representative subject pool (see Dohmen et al., 2011). 11

The two contest structures share the same expected payoff and, under the assumption of risk neutrality, the same Nash equilibria. Introducing risk aversion could potentially alter theoretical predictions; however, the direction and extent of the effect of risk aversion on contest expenditures remain ambiguous under general conditions (Skaperdas and Gan, 1995; Konrad and Schlesinger, 1997). The experimental literature also seems to lack consensus regarding the effect of risk attitude on contest expenditures (see, for example, Shupp et al., 2013; Mago et al., 2013). To be able to compare both contest structures, we thus maintain the assumption of risk-neutral contestants. The theoretical prediction of symmetric group expenditures under risk neutrality, $X^* = N \cdot V(N-1)/N^2 = V(N-1)/N$, corresponds to 666.7 points for three-player contests and to 800 points for five-player contests. Hence, predicted equilibrium expenditures at the individual level are approximately 222 and 160 points, respectively.

6 Since the contest literature commonly dismisses zero expenditures as the best response to high expenditures by opponents, the fixed matching protocol is suited to testing this line of thought.
7 The number of periods in most reviewed experimental studies does not exceed sixty. Exceptions are the studies by Friedman et al. (2015) and Oechssler et al. (2016) of continuous-time games lasting 1200 periods.
8 Baik et al. (2015) show that the fixed matching protocol leads to collusive behavior when groups consist of two players, but not for larger groups.
9 3P, 5P, 3W and 5W are the treatment notations used throughout the rest of the paper.
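The equilibrium figures above can be reproduced with a short Python sketch, assuming risk neutrality, V = 1000 and the experimental choice set of 0 to 1000 tokens. It evaluates x* = V(N − 1)/N² exactly and, as a sanity check, confirms by grid search that x* is (up to integer rounding) a best response when all opponents play it. The helper names are hypothetical and not part of the original analysis.

```python
from fractions import Fraction

PRIZE = 1000       # V: prize in points
MAX_TOKENS = 1000  # endowment of 1000 points at one point per token


def equilibrium_individual(n, prize=PRIZE):
    """Symmetric risk-neutral Nash expenditure x* = V (N - 1) / N^2, exact."""
    return Fraction(prize * (n - 1), n * n)


def net_expected_gain(x, others_total, prize=PRIZE):
    """Expected prize share minus cost (the constant endowment is omitted)."""
    total = x + others_total
    return (prize * x / total if total > 0 else 0.0) - x


def best_response(others_total, prize=PRIZE):
    """Integer best response over the choice set {0, ..., MAX_TOKENS}."""
    return max(range(MAX_TOKENS + 1),
               key=lambda x: net_expected_gain(x, others_total, prize))


for n in (3, 5):
    x_star = equilibrium_individual(n)
    others = float((n - 1) * x_star)  # all opponents play x*
    print(f"N={n}: x*={float(x_star):.1f}, X*={float(n * x_star):.1f}, "
          f"integer BR to x*={best_response(others)}")
```

Using `Fraction` keeps x* exact (2000/9 for three players), so the rounding to 222 points only enters through the integer choice set.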

Results
We lay out the results in two subsections. In section 4.1 we illustrate the spending and participation behavior of contestants in all treatments. In section 4.2 we illustrate how EWA estimation results differ across treatments and analyze expenditures using Tobit mixed-effects regressions. Reported p-values (p) for within-treatment comparisons between the two halves of the experiment are based on Wilcoxon matched-pairs signed-rank tests, while for between-treatment comparisons we report p-values of two-sided Wilcoxon rank-sum tests, treating each group as a single, independent observation.

10 We check the fraction of players whose choice imitates the average expenditures of opponents in previous rounds in Appendix B, table 9. The fraction of imitation is significantly lower in WTA treatments and does not increase over time. A possible explanation for this difference is that imitating the average in PP contests provides an intuitive strategy that reduces inequity concerns. In WTA treatments, imitating the average does not guarantee such an outcome and may imply forgoing the highest likelihood of winning in the group.
11 Subjects answered the following question on a Likert scale from 1 to 7: 'How willing are you to take risks, in general? Unwilling to take risks (1), Fully prepared to take risks (7)'. We do not find significant differences in risk scores across treatments. Appendix A addresses the relevance of risk attitudes for expenditures in more detail.

Group Expenditures and Participation
Result 1. (a) Average total expenditures in proportional-prize contests increase significantly with group size. (b) Average total expenditures in winner-take-all contests, contrary to predictions, do not increase significantly with group size.
Figure 2 shows the average group expenditure patterns of all treatments relative to their predicted theoretical equilibria. In all cases, initial expenditures lie substantially above the Nash equilibrium predictions and are higher in larger groups. In the PP treatments, mean expenditures decline quickly to a level close to the equilibrium and exhibit no noticeable time trend thereafter, in line with previous experimental evidence. By contrast, the results of the WTA contests are surprising: over-expenditures relative to the predicted equilibrium levels persist throughout the experiment, but average expenditures do not differ across group sizes. Moreover, over the longer horizon, total expenditures are lower in larger groups, sometimes even below the level predicted by the Nash equilibrium. Table 3 reports the average group expenditures for the first half, the second half and all periods, together with p-values of within- and between-treatment comparisons. In the PP contests, average total expenditures are significantly higher in 5P than in 3P for all intervals considered (all p ≤ 0.01). In contrast, between-treatment comparisons of WTA contests at group level confirm the pattern observed in figure 2, with similar levels of expenditures for all intervals considered (all p ≥ 0.25). This last finding contradicts theoretical results that predict higher group expenditures for larger groups.
Previous studies that explored behavior in contests with different group sizes have supported the theoretical claim, yet considered only one-shot decisions (Anderson and Stafford, 2003) or ten […]
Result 2 (a) The fraction of zero expenditures is significantly higher in the winner-take-all contests than in the proportional-prize contests and increases significantly over time for larger winner-take-all groups.
Our results call into question the hypothesis that expenditures in the WTA contests converge towards group-size-dependent equilibria. We thus look for an alternative explanation for the decrease in expenditures in another common finding from contest experiments: zero expenditures. To get a first glimpse of the prevalence and dynamics of zero expenditures in contests, we compute the total fraction of zero expenditures across treatments. The total share of zeros increases with group size (3P vs. 5P: p = 0.01, 3W vs. 5W: p = 0.01) and is more pronounced in the WTA contests (3P vs. 3W: p < 0.01, 5P vs. 5W: p < 0.01), reaching up to 40% in the late game of 5W. Moreover, as shown by the black lines in figure 3, the fraction of zero expenditures is stable across time in 3W (periods 1-30 vs. 31-60: p = 0.92) and increases in 5W (periods 1-30 vs. 31-60: p = 0.09). Result 1 can thus be explained by the increasing fraction of zero expenditures in 5W, which leads to a faster decrease in average group effort than in 3W.
A first explanation for the pronounced fraction of zero expenditures in the WTA contests could be that players expect their opponents to overbid. Since we cannot observe players' expectations about future opponent expenditures, we assume that expectations are formed based on past opponent behavior. Thus, we assess whether zero expenditures are a best response (BR) to the history of opponents' decisions using two forms of 'weighted fictitious play'. A choice j of player i in period t + 1 is justified under weighted fictitious play if j maximizes the following expression:

$$ \sum_{\tau=1}^{t} \phi^{\,t-\tau}\, \pi_i\big(s_i^j, s_{-i}(\tau)\big) $$

The parameter φ acts as a discount factor. If φ = 0 (with the convention $\phi^0 = 1$), the expression reduces to the myopic best-response case $\pi_i(s_i^j, s_{-i}(t))$, which denotes the hypothetical payoff of player i choosing expenditure level j given the choices $s_{-i}$ of its opponents at time t (reported as the dashed lines in figure 3). At the other extreme, when φ = 1, all hypothetical past payoffs from strategy j are weighted equally. In this case the best choice is the one that would have yielded the highest average payoff across all rounds played, also known as 'fictitious play' (reported as the gray lines in figure 3). Figures 3a-d show the fraction of zero expenditures over time for each of the four treatments. Most zero choices can be justified neither by myopic best responses nor by fictitious play (average fraction of zeros not justified by myopic best responses: 66% in 3W, 62% in 5W; by fictitious play: 87% in 3W, 81% in 5W). Yet myopic best responses justify more zero choices than fictitious play, consistent with the previous finding by Rockenbach and Waligora (2016) that WTA contestants hold myopic beliefs. We hence focus on the zero expenditures that are not explained by a myopic best response and define them as 'drop outs'. In the case of a drop out, a player spends nothing even though bidding a positive amount would be payoff-maximizing under a myopic best response.
The average fraction of drop outs per round can be read from figure 3 as the difference between the fraction of all zero expenditures and the fraction of zero expenditures under a myopic best response.
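To make the drop-out definition concrete, here is a minimal Python sketch (ours, not the authors' code), assuming risk neutrality and the integer choice set 0 to 1000. Because subjects only observed the total expenditure of their group members, the hypothetical payoff of any choice depends only on that total, so the history is a list of opponents' totals. The sketch evaluates the weighted fictitious-play criterion for a zero choice and flags a drop out when zero is not a myopic best response (φ = 0); the function names and the tie tolerance are illustrative assumptions.

```python
ENDOWMENT = 1000
PRIZE = 1000
CHOICES = range(ENDOWMENT + 1)  # integer token purchases 0..1000


def hypothetical_payoff(x, others_total):
    """Risk-neutral payoff of spending x against opponents' total expenditure."""
    total = x + others_total
    if total == 0:
        return float(ENDOWMENT)  # prize not assigned when nobody bids
    return ENDOWMENT - x + PRIZE * x / total


def wfp_score(x, history, phi):
    """Discounted sum of hypothetical payoffs; history[-1] is the latest period."""
    t = len(history)
    # phi ** 0 == 1.0 in Python, so phi = 0 keeps only the latest period.
    return sum(phi ** (t - tau) * hypothetical_payoff(x, y)
               for tau, y in enumerate(history, start=1))


def zero_is_justified(history, phi):
    """True if spending zero maximizes the weighted fictitious-play score."""
    scores = [wfp_score(x, history, phi) for x in CHOICES]
    return scores[0] >= max(scores) - 1e-9  # ties count as justified


def is_drop_out(choice, history):
    """A zero choice that is not a myopic best response (phi = 0)."""
    return choice == 0 and not zero_is_justified(history, phi=0.0)


# Hypothetical opponent totals over three periods.
history = [300, 400, 1200]
print(zero_is_justified(history, phi=0.0))  # myopic BR criterion
print(zero_is_justified(history, phi=1.0))  # fictitious play criterion
print(is_drop_out(0, history))
```

In this example zero is a myopic best response to the last period (opponents spent more than the prize), but not under fictitious play, which averages in the earlier, lower totals; this illustrates why fictitious play justifies fewer zero choices than myopic best responses.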
Result 2 (b) The share of drop outs is higher in winner-take-all contests and increases in the five-player treatment.
Drop outs in PP contests are more frequent in larger groups (3P vs. 5P: p = 0.04), yet their fraction is relatively low compared to the WTA treatments and stable over time (average fraction of drop outs over all choices in 3P: 1%, 5P: 8%, 3W: 11%, 5W: 21%). In the WTA contests, drop-out fractions are higher and differ not only across group sizes but also relative to the PP contests (3P vs. 3W: p < 0.01, 5P vs. 5W: p = 0.01, 3W vs. 5W: p = 0.02). The share of drop outs is highest in 5W and increases significantly over time (periods 1-30 vs. 31-60: p = 0.04), while we find no such increase in 3W (periods 1-30 vs. 31-60: p = 0.22).
To characterize the persistence of zero bids at the individual level, figure 4 shows the percentage of periods in which contestants bid zero after their initial zero effort choice. For each treatment, players with at least one zero bid are ordered by the frequency of their subsequent zero effort choices in the remaining periods. The frequency of zero efforts varies across contestants, implying that we cannot equate period-specific zero expenditure choices with contestants abstaining from participation for the remainder of the experiment. Yet, for some contestants, not bidding becomes an important strategy, especially in the five-player WTA treatment, where fourteen players continue to bid zero in more than half of the remaining periods.
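The individual persistence measure behind figure 4 can be sketched in a few lines of Python. This is our reading of the description above, not the authors' code: we assume the share is computed over the periods strictly after a player's first zero bid, and players whose first zero falls in the final period (no remaining periods) are excluded. The data in the usage example are hypothetical.

```python
def zero_persistence(bids):
    """Share of periods after a player's first zero bid in which she bids zero again.

    Returns None if the player never bids zero or if her first zero bid
    falls in the last period (no remaining periods to evaluate).
    """
    if 0 not in bids:
        return None
    first = bids.index(0)
    remaining = bids[first + 1:]
    if not remaining:
        return None
    return sum(1 for b in remaining if b == 0) / len(remaining)


def rank_players(bids_by_player):
    """Players with at least one zero bid, ordered by decreasing persistence."""
    shares = {p: zero_persistence(b) for p, b in bids_by_player.items()}
    ranked = [(p, s) for p, s in shares.items() if s is not None]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical bid sequences for three players.
data = {"a": [100, 0, 0, 50, 0], "b": [200, 150, 0, 300, 300], "c": [100, 100, 100]}
ranking = rank_players(data)
print(ranking)
print(sum(1 for _, s in ranking if s > 0.5))  # players bidding zero in > half of later periods
```

Counting players with a share above 0.5 corresponds to the "more than half of the remaining periods" criterion used in the text.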
From the previous analysis we infer that the common decreasing pattern in average expenditures, which we observe in all treatments, may be driven by different behavior depending on the contest structure. Although in proportional-prize contests far from all contestants reach the equilibrium level, the decrease in average expenditures hints at a process of learning to play optimal strategies. 16 Conversely, the decrease in expenditures in the winner-take-all contests can largely be attributed to an overall 'drop out' effect that is not explained by forms of fictitious play. We hypothesize that differences in the contests' payoff structures have non-negligible effects on the learning process. The results from the EWA estimation […]

16 The heterogeneity of play is in line with previous findings in oligopoly experiments, e.g., Rassenti et al. (2000).

EWA Model Estimation and Interpretation
In the previous section we have shown that the structure and group size of the contest affect total expenditures. Overall, the patterns behind the decrease in expenditures over time suggest that subjects display different behaviors across contests. We now examine whether differences across treatments are driven by different learning paths by estimating the EWA model (Camerer and Ho, 1999) for each treatment. In our estimations, we group the 1001 possible expenditure choices (0 to 1000 points) into K = 11 equally spaced bins and round all choices to the closest bin to facilitate comparability. Every contestant i forms a set of 'attractions' $A_i^j(t)$, which are recursively reinforced or weakened after every round. Attractions are updated as follows:

$$ A_i^j(t) = \frac{\phi\, N(t-1)\, A_i^j(t-1) + \delta \left[1 - I\!\left(s_i^j, s_i(t)\right)\right] E\!\left[\pi_i\!\left(s_i^j, s_{-i}(t)\right)\right] + I\!\left(s_i^j, s_i(t)\right) \pi_i\!\left(s_i^j, s_{-i}(t)\right)}{N(t)} \tag{4} $$

where $s_i(t)$ is the strategy actually chosen by player i in period t, and the indicator $I(s_i^j, s_i(t))$ equals one if j is that strategy, $s_i^j(t)$, and zero for all other possible strategies $s_i^{-j}(t)$ in the same period. Defining $s_{-i}(t)$ as the strategy vector chosen by all other players, the payoff of player i choosing j in t is given by $\pi_i(s_i^j(t), s_{-i}(t))$. Similarly, $E[\pi_i(s_i^{-j}(t), s_{-i}(t))]$ denotes the hypothetical payoff that player i could have expected from any other strategy, given the strategies of all other players. Since the prize assignment in the PP contest is deterministic, this expression simplifies to $\pi_i(s_i^{-j}(t), s_{-i}(t))$, which equals the expected hypothetical payoff of the WTA contest under risk neutrality. N(t) is a weight on past experience: the faster N(t) increases in t, the less players focus on immediate current payoffs. The weights are updated with the following rule:

$$ N(t) = \phi\,(1-\kappa)\,N(t-1) + 1 \tag{5} $$

The parameter κ determines the growth rate of attractions, which reflects how quickly players lock into a strategy. The current attractions of the K possible strategies determine the probability that player i chooses strategy j in the next period t + 1. A logistic (logit) transformation links previous attractions to the choice probabilities:

$$ P_i^j(t+1) = \frac{\exp\!\left(\lambda A_i^j(t)\right)}{\sum_{k=1}^{K} \exp\!\left(\lambda A_i^k(t)\right)} \tag{6} $$
Thus, the higher a contestant's past attraction for a specific strategy, the higher the probability that this strategy will be pursued.
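The update cycle described above (binning, attraction update, weight update and logit choice rule) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' estimation code: the helper names (`to_bin`, `pp_payoff`, `ewa_update`, `choice_probs`) are hypothetical, bins are assumed to sit at 0, 100, ..., 1000 with ties rounded down, and forgone payoffs use the deterministic PP payoff, which the text notes coincides with the risk-neutral expected WTA payoff. For a WTA contestant, `realized` would be the lottery outcome rather than an expected payoff.

```python
import math

ENDOWMENT = 1000
PRIZE = 1000
BINS = [100 * k for k in range(11)]  # assumed K = 11 equally spaced bins: 0, 100, ..., 1000


def to_bin(x):
    """Index of the bin closest to expenditure x (ties go to the lower bin)."""
    return min(range(len(BINS)), key=lambda k: abs(BINS[k] - x))


def pp_payoff(x, others_sum):
    """PP payoff of spending x when the opponents spend others_sum in total.

    Also equals the risk-neutral expected payoff of the same choice in the WTA contest.
    """
    total = x + others_sum
    share = x / total if total > 0 else 0.0
    return ENDOWMENT - x + PRIZE * share


def ewa_update(attractions, n_prev, chosen, realized, others_sum, phi, delta, kappa):
    """One EWA step for a single player; returns (new attractions, N(t))."""
    n_new = phi * (1 - kappa) * n_prev + 1  # weight update N(t)
    updated = []
    for k, a in enumerate(attractions):
        if k == chosen:
            reinforcement = realized  # received payoff, full weight
        else:
            reinforcement = delta * pp_payoff(BINS[k], others_sum)  # forgone payoff, weight delta
        updated.append((phi * n_prev * a + reinforcement) / n_new)  # attraction update
    return updated, n_new


def choice_probs(attractions, lam):
    """Logit choice rule; subtracting the maximum attraction avoids overflow."""
    top = max(attractions)
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: a three-player PP contestant spends 300, the two opponents 600 in total.
chosen = to_bin(300)
A, n = ewa_update([0.0] * 11, 0.0, chosen, pp_payoff(300, 600), 600,
                  phi=0.8, delta=0.5, kappa=0.8)
print([round(a, 1) for a in A])
print([round(q, 3) for q in choice_probs(A, lam=0.01)])
```

Setting `delta=0.0` leaves only the chosen bin reinforced (reinforcement learning), while `delta=1.0` reinforces every bin with its payoff (belief learning).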
Attractions for each strategy are updated by weighting the previous experience, the current forgone payoffs, and the current received payoff (given by the summands in equation 4). Current forgone payoffs are the payoffs that could have been expected if the contestant had chosen differently while the opponents' strategies are held fixed. Forming attractions by evaluating forgone payoffs is equivalent to belief learning (a version of weighted fictitious play, Brown, 1951), whereas focusing exclusively on realized payoffs (δ = 0) reduces the model to reinforcement learning (Roth and Erev, 1995). Thus, the EWA model nests two canonical learning models via the parameter delta (δ). 18 Table 5 shows, for each treatment, the simulated EWA expenditure distributions for δ = 0, 0.5 and 1 together with the true expenditure frequencies. As δ increases towards belief learning, the simulated choices show less variation and roughly resemble the true expenditure frequencies in the PP contests. The true distribution of expenditures in the WTA contest is more dispersed, which is why we assume that these contestants rely less on belief learning than PP contestants.

Table 4. EWA parameters and their theoretical domains
δ ∈ [0, 1]: Weight of hypothetical payoffs. The higher δ, the more (less) weight is assigned to own hypothetical (realized) payoffs. δ = 0 (δ = 1) corresponds to pure reinforcement (belief) learning.
N_0 ∈ [0, 1/(1 − φ(1 − κ))]: Weight of pre-game attractions. Indicates how many periods of experience are required to offset pre-game attractions. The boundaries ensure that the weights N(t) are increasing in t.
κ ∈ [0, 1]: Growth property of attractions. If κ = 0, current attractions are a weighted average of past attractions and payoffs. If κ = 1, attractions accumulate and can grow larger than present payoffs.
φ ∈ [0, 1]: Decay rate of past observations. The smaller φ, the faster past observations are discounted and the more weight is assigned to the present.
λ ∈ [0, ∞): Sensitivity measure of attractions. The higher λ, the more present attractions matter in determining the choice probabilities of future actions. λ = 0 implies that choice probabilities are not influenced by attractions.
Result 3. Proportional-prize contests allow for a mixture of reinforcement and adaptive learning.
In winner-take-all contests, learning is mostly driven by previous own payoffs.
For each treatment the EWA model is estimated via Maximum Likelihood, both over the first half and over the complete sample. 19 Estimated parameters and their theoretical domains are summarized in table 4. Rather than estimating initial attractions freely, we follow the approach used in Ho et al. (2008) and choose the initial attractions to maximize the likelihood of observing first-period choice frequencies. 20

19 In the EWA model we abstain from estimating the second half of the game separately due to possible information loss (the estimation of the second half would require initial attractions in period 31 that reflect the complete knowledge formed in the first half of the game).

20 The traditional EWA model considers only choices shaped by past attractions, while later versions add sophistication to allow for choices based on expectations of opponents' behavior (e.g. Camerer et al., 2002). A sophisticated player forms a best response by forecasting the actions of all other players. Since including additional free EWA parameters increases the danger of overfitting, we instead check for sophistication via the Quantal Response Equilibrium (QRE) model of McKelvey and Palfrey (1995). In the QRE model, sophistication is captured via the precision parameter λ_Q: the higher λ_Q, the more sophisticated the overall choices. We find significantly lower λ_Q for WTA treatments. The estimation results of the QRE model can be found in Appendix D.
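The initial-attraction step can be illustrated by inverting the logit choice rule. The sketch below is our reading of that procedure, not the authors' code: for a given λ, setting A_k(0) = ln(f_k)/λ makes the implied first-period probabilities equal the observed shares f_k, which maximizes the first-period multinomial likelihood when every bin was chosen at least once. Bins that nobody chose would require A_k(0) → −∞, so the hypothetical `pseudo_count` argument (our assumption) keeps every share positive; the function names are also hypothetical.

```python
import math


def logit(attractions, lam):
    """Logit choice probabilities for a vector of attractions."""
    top = max(attractions)
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


def initial_attractions(first_period_counts, lam, pseudo_count=0.5):
    """Initial attractions whose logit probabilities equal (smoothed) first-period shares.

    With pseudo_count = 0 and all counts positive, the implied probabilities equal
    the observed shares exactly. Attractions are only identified up to an additive
    constant, which the logit rule ignores.
    """
    smoothed = [c + pseudo_count for c in first_period_counts]
    total = sum(smoothed)
    return [math.log(c / total) / lam for c in smoothed]


counts = [4, 0, 7, 5, 2, 3, 0, 0, 0, 0, 1]  # hypothetical first-period choices per bin
A0 = initial_attractions(counts, lam=0.02)
print([round(p, 3) for p in logit(A0, lam=0.02)])
```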
Notes to table 5: The first three columns show the relative frequency of expenditures for the EWA simulations; the fourth column shows the true expenditure distribution. For each setting 600 players were simulated and the experimental specifications were kept. Each treatment is simulated for δ = 0 (reinforcement learning), δ = 0.5 (mix between reinforcement and belief learning), and δ = 1 (belief learning). All other EWA parameters were kept constant at moderate levels for each simulation (λ = 1, φ = 0.8, N_0 = 0, κ = 0.8, initial attractions = 0).
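A simulation in the spirit of these notes can be sketched as follows. It is an approximation under assumptions of ours, not the authors' simulation code: payoffs enter the attractions in thousands of points (`SCALE`, since the payoff units of the simulation are not stated), groups play 60 periods as in the experimental instructions, 600 simulated players correspond to `n_groups = 600 // n_players`, bins sit at 0, 100, ..., 1000, and all function names are hypothetical. Its output is not expected to reproduce table 5.

```python
import math
import random

K = 11
BINS = [100 * k for k in range(K)]  # assumed bins 0, 100, ..., 1000
ENDOWMENT = PRIZE = 1000
SCALE = 1 / 1000  # assumption: payoffs enter attractions in thousands of points


def pp_payoff(x, others_sum):
    """PP payoff, also the risk-neutral expected WTA payoff."""
    total = x + others_sum
    return ENDOWMENT - x + (PRIZE * x / total if total > 0 else 0.0)


def logit(attractions, lam):
    top = max(attractions)
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(delta, n_players=3, n_groups=200, periods=60, contest="PP",
             lam=1.0, phi=0.8, kappa=0.8, n0=0.0, seed=0):
    """Relative frequency of simulated choices per bin over all players and periods."""
    rng = random.Random(seed)
    counts = [0] * K
    for _ in range(n_groups):
        attractions = [[0.0] * K for _ in range(n_players)]  # initial attractions = 0
        n_prev = n0
        for _ in range(periods):
            choices = [rng.choices(range(K), weights=logit(a, lam))[0] for a in attractions]
            spend = [BINS[c] for c in choices]
            total = sum(spend)
            winner = None  # only used in the WTA contest
            if contest == "WTA" and total > 0:
                winner = rng.choices(range(n_players), weights=spend)[0]
            n_new = phi * (1 - kappa) * n_prev + 1
            for i in range(n_players):
                others = total - spend[i]
                if contest == "PP":
                    realized = pp_payoff(spend[i], others)
                else:
                    realized = ENDOWMENT - spend[i] + (PRIZE if i == winner else 0)
                attractions[i] = [
                    (phi * n_prev * a
                     + SCALE * (realized if k == choices[i]
                                else delta * pp_payoff(BINS[k], others)))
                    / n_new
                    for k, a in enumerate(attractions[i])
                ]
                counts[choices[i]] += 1
            n_prev = n_new
    n_choices = sum(counts)
    return [c / n_choices for c in counts]


for d in (0.0, 0.5, 1.0):
    print(d, [round(f, 3) for f in simulate(d, n_groups=20)])
```

For the five-player treatments, a call such as `simulate(d, n_players=5, n_groups=120, contest="WTA")` keeps the total at 600 simulated players.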
We report in table 6 the estimated parameters with their clustered standard errors and confidence intervals in parentheses. 21 The main parameter of interest in our estimation analysis is delta (δ), which indicates the degree of belief learning used in the game. In the PP contests the deltas are significantly greater than zero, between 0.59 and 0.67, suggesting that players adopt a mixture of reinforcement learning (considering own realized payoffs) and belief learning (considering all own hypothetical payoffs). By contrast, in WTA contests players rely mostly on reinforcement (experiential) learning, as the deltas are significantly closer to zero. This is especially true for the early stage of the game (0.02 for 3W and 0.13 for 5W). As players become more familiar with the game, they shift slightly towards belief learning in all treatments. Nevertheless, the difference between PP and WTA contests remains substantial. One can argue that belief learning is more complex, since it requires the player to evaluate, for each possible strategy, the expected payoff given the opponents' strategies. The evaluation of hypothetical scenarios could be more difficult in WTA contests due to the discrepancy between expected and realized payoffs. Indeed, it has been shown theoretically that individuals lock in at inefficient expenditure levels in low-delta scenarios (Pangallo et al., 2017).
Other parameter estimates are similar across the various estimations. We find N_0 below one in all estimations, which indicates that pre-game attractions are offset completely by first-period attractions. 22 Kappa (κ), which measures the growth property of attractions, is significantly different from zero in the first half of the game, indicating that the importance of past attractions grows over time. The decay rates of past attractions, given by phi (φ), are significantly different from zero but similar across and within treatments, on average 75% in the PP contests and 81% in the WTA contests. Although the difference is small, it suggests that players may rely more on past experience in WTA contests. This is consistent with the idea that present payoffs in these treatments reveal less useful information. Finally, lambda (λ) is significantly different from zero in all specifications, indicating that contestants' choices are influenced by attractions formed in past rounds.
The EWA parameter estimation uncovers a noticeable difference in learning between the PP and WTA contests. The strong reliance of subjects in WTA contests on their own realized payoffs implies that current decisions depend on the success of previous ones. Victories strongly reinforce the probability of playing the corresponding expenditure level, while losses make the corresponding expenditure level less attractive in the following periods. If a subject experiences frequent losses, positive expenditure levels will, over time, become less appealing to the advantage of zero expenditures. Following this reasoning, the fraction of zero expenditures should be higher in contests with a higher share of non-winners. This is consistent with our results in section 4.1, where the fraction of drop outs is significantly higher in WTA treatments than in PP treatments and increases significantly over time in larger groups. If WTA contestants indeed rely on their own previous experiences, then we should observe a drop in expenditures after a series of losses. We therefore expect a negative relationship between the series of losses prior to time t and the expenditure level at time t. We assess this relationship using a set of Tobit mixed effects models. 23 The model assumes that expenditure levels are left-censored at 0, with random effects at the individual and group level. In all models we regress the expenditures at t on previous own expenditures, previous opponents' expenditures (linear and squared) and a time-trend variable. 24

We check the relationship between prior losses and current expenditures via the variable loss streak, defined as the accumulated (negative) payoff from consecutive losses prior to time t relative to the contestant's endowment. After every incurred loss, the variable decreases by the foregone profits that would have been received by choosing not to spend the endowment. Consequently, the loss streak remains unchanged for zero expenditures. If the contest was won in the previous period, the contestant's loss streak is reset to zero. Models (1), for 3W, and (4), for 5W, in table 7 contain only control variables. In both WTA treatments the influence of prior expenditures on current choices is positive and significant, while the effect of period advancement is negative and significant. In models (2) and (5) we find a positive and significant effect of the loss streak; since the variable becomes more negative as losses accumulate, this indicates that the accumulation of losses indeed leads to a decrease in expenditures. This effect is further analyzed in (3) and (6), where we additionally control for the contestants' gender. As an exploratory result, we find that women spend more than men on average. 25 From (3) we find that splitting the loss streak variable by gender does not lead to significant effects on expenditures. Yet when the group size increases (6), the accumulation of prior losses significantly decreases expenditures for both genders, more sharply for women than for men. This last result has been similarly observed in experimental tournaments (Buser, 2016).

21 Standard errors are clustered at the individual level. Group-level clustering is not necessary due to the random assignment of individuals to groups (see Abadie et al., 2017). A set of less conservative estimations without clustering adjustments at the individual level is reported in Appendix B, table 10.

22 An influence of pre-game attractions is not expected from the experimental design, since none of the participants was previously familiar with the contest setting.
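The construction of the loss-streak regressor can be made explicit in code. The following sketch is our reading of the definition above, with a hypothetical function name: after a loss, the streak falls by the tokens spent (the profit forgone relative to not spending) divided by the endowment of 1000; zero spending leaves it unchanged; a win resets it to zero; and the value used for period t reflects only periods before t.

```python
def loss_streaks(expenditures, won, endowment=1000):
    """Loss-streak regressor for one WTA contestant (hypothetical helper).

    expenditures[t]: tokens bought in period t; won[t]: True if the prize was won in t.
    Returns a list whose entry t only uses outcomes of periods before t.
    """
    streaks = []
    current = 0.0
    for spent, win in zip(expenditures, won):
        streaks.append(current)  # value entering the regression for this period
        if win:
            current = 0.0  # a win resets the streak
        else:
            # a loss costs the tokens bought, i.e. the profit forgone relative to
            # not spending; with zero spending the streak stays unchanged
            current -= spent / endowment
    return streaks


print(loss_streaks([200, 300, 0, 100, 400], [False, False, False, True, False]))
# -> [0.0, -0.2, -0.5, -0.5, 0.0]
```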
The regression results support the claim that decreasing expenditures in the WTA contest are driven by previous lottery outcomes. As losses accumulate, contestants tend to lower their expenditures and may, over time, drop out of the contest. 26 Since individual losses accumulate longer when facing more opponents, the decrease in expenditures is more pronounced in 5W. Although loss aversion may be thought of as the mechanism that leads to lower expenditures (as shown by Kong, 2008; Shupp et al., 2013; Chowdhury et al., 2018), we should be careful to distinguish the role of loss aversion from that of repeatedly experiencing losses. As pointed out in many studies summarized by Kermer et al. (2006), individuals overestimate the impact of prospective losses compared to losses they actually realize, and learn from experience that losses have a smaller emotional impact than estimated ex ante. Therefore, while loss aversion certainly has an impact on expenditures, it is not the only effect behind the decline over time. The EWA estimations and the results in table 7 additionally capture the effect of repeatedly experiencing losses.

24 The use of lagged expenditures as an explanatory variable requires stationary panels, which we test for using the Levin-Lin-Chu unit root test for panel data (Levin et al., 2002). We reject the hypothesis that WTA expenditures follow a unit root process (p < 0.01).

25 In line with findings from prior contest experiments such as Price and Sheremeta (2015) and Brookins and Ryvkin (2014).

26 To analyze the effect of loss accumulation on the likelihood of zero expenditures, we report a random effects logit regression in Appendix B, table 11. We find that a prior loss streak significantly increases the probability of zero expenditures in 5W. For 3W the effect goes in the same direction but is not significant, which might be due to the lower loss accumulation in 3W.

Notes to table 7: Multilevel Tobit mixed effects model; standard errors in parentheses. P-values: * 0.10, ** 0.05, *** 0.01; Obs. is the number of observations.
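How censoring at zero enters the Tobit specification can be seen from the likelihood of a basic pooled Tobit model. This sketch deliberately omits the individual- and group-level random effects of the estimated models (those would require integrating the likelihood over the random intercepts, e.g. by quadrature); the function names and the example data are hypothetical. Zero expenditures contribute the probability that the latent expenditure is at most zero, while positive expenditures contribute the normal density of the latent value.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)


def norm_logcdf(z):
    """Log of the standard normal CDF, computed via the complementary error function."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2)))


def tobit_loglik(y, X, beta, sigma):
    """Log-likelihood of a pooled Tobit model left-censored at 0.

    y[i] >= 0 is the observed expenditure, X[i] the regressors (include a 1 for the intercept).
    """
    ll = 0.0
    for yi, xi in zip(y, X):
        mu = sum(b * v for b, v in zip(beta, xi))  # latent mean x'beta
        if yi <= 0:
            ll += norm_logcdf(-mu / sigma)  # P(latent <= 0)
        else:
            ll += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return ll


# Hypothetical data: expenditures in thousands of points, an intercept and one regressor.
y = [0.0, 0.25, 0.4, 0.0, 0.1]
X = [[1.0, 0.3], [1.0, 0.2], [1.0, 0.5], [1.0, 0.0], [1.0, 0.1]]
print(tobit_loglik(y, X, beta=[0.05, 0.4], sigma=0.2))
```

Maximizing this function over `beta` and `sigma` would give pooled Tobit estimates; it is meant only to illustrate the censoring assumption, not to replicate table 7.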

Final Discussion
Our contribution to the literature is twofold. First, we offer a clean comparison of how subjects behave across different contest structures. Similarly to previous studies of the PP contest and oligopolies, we find that expenditures in this setting converge well to standard predictions. Unlike in the PP contests, we observe that changing the group size from three to five players does not affect total investments in the WTA contests. Second, we assess the role of learning as one of the possible explanations for behavioral differences across contest structures.
The behavioral discrepancy between contest types might be connected to the probabilistic prize assignment in the WTA treatment, which influences how contestants form their choices. We hypothesize that PP and WTA contestants use distinct learning strategies, which may also explain another expenditure peculiarity that tends to be overlooked: the modal choice in WTA contests is often zero.
We find that varying the group size from three to five players does not affect total investments in WTA contests, in contrast to theoretical predictions. The decrease of investments in large-group WTA contests is influenced by an increasing fraction of zero expenditures. A substantial share of these zero expenditures cannot be justified by myopic best responses (or other forms of fictitious play); we define them as 'drop out'. In PP contests, by contrast, the drop-out rate is lower and average expenditures converge to theoretical predictions, which suggests that spending behavior across contests is shaped by distinct learning processes. A parameter estimation of the EWA model in all treatments indicates that WTA contestants decide mostly based on the information gathered from their own realized payoffs. Since success in the WTA contest is stochastic, subjects who base their investment decisions entirely on the outcomes of their previous decisions are less able to adapt their strategies in a payoff-optimizing fashion. Participants in the PP contests, on the contrary, rely on a mixture of own realized payoffs and foregone payoffs, which may be facilitated by the deterministic nature of payoffs in the PP contest. The distinct learning patterns estimated in the two contest environments do not change significantly over time and are robust to changes in the number of players.
Repeated losses, which subjects face in the WTA contests, weaken the reinforcement of positive expenditure levels over time; consequently, zero expenditures become more appealing, regardless of whether they are a myopic best response. Our regression results support this reasoning by showing that the accumulation of prior losses leads to a significant decrease in expenditures, which is more pronounced in larger groups. The higher drop-out rate in the five-player WTA treatment is presumably driven by the faster accumulation of individual losses. As a consequence, an increase in group size does not necessarily increase total rent-seeking.
In a society full of embedded winner-take-all contests, our results have practical relevance. First, in rent-seeking situations such as lobbying, it may be beneficial that an increase in the number of WTA contestants does not significantly change total rent-seeking effort. Conversely, when a high sum of efforts is desired, as in a philanthropic fund-raising lottery, increasing the pool of participants might not lead to the desired effect if the individual loss probability increases simultaneously. Second, decisions not to invest can be aggregated at the macro level to the so-called 'industry shakeout', i.e. a significant reduction in the number of active firms during the expansion of new industries (see Gort and Klepper, 1982; Klepper, 1996, 1997). One traditional explanation for firm exit dynamics postulates that market participants use Bayesian updating to learn their true ability to operate in the market (Jovanovic, 1982; Jovanovic and Nyarko, 1995). Thus, firms decide to exit by assessing their past performance. Our finding that participants invest less after losses and potentially drop out of the contest due to the non-reinforcement of positive payoffs follows a similar line. At the same time, we stress the difficulty players face in forming rewarding learning strategies in highly uncertain domains, such as pharmaceutical R&D.
The presented work calls for a better understanding of how WTA contestants learn in highly uncertain environments with large group sizes, such as pharmaceutical R&D, and how their decision-making abilities can be improved. In addition, the observed 'drop out' effect and its relationship to group size, and possibly to other contest characteristics, deserve increased attention to better bridge the gap between experimental findings and theoretical explanations of contestants' behavior.

Huck, S., Normann, H.-T., and Oechssler, J. (2004). Two are few and four are many: number effects in experimental oligopolies. Journal of Economic Behavior & Organization, 53(4): 435-446.

Jiao, P. and Nax, H. H. (2017). Heuristics adoption and abandonment: Experimental evidence from Cournot contests. Technical report.

Appendix A. Risk Assessment
The equilibrium predictions are derived in both contest types under the assumption of risk neutrality. However, risk aversion might alter the expected effort levels and hence the conclusions drawn from a direct comparison between contest types. Theoretical evidence on the effect of risk aversion on efforts is ambiguous (Konrad and Schlesinger, 1997), since results vary with the assumptions on risk aversion, contest success functions and homogeneity of rent-seekers (Treich, 2010). Early experimental work by Millner and Pratt (1991) on WTA contests finds lower mean dissipation rates for risk-averse contest groups, not significantly different from risk-neutral equilibrium predictions. Other authors find risk aversion to significantly reduce efforts in WTA contests, but averages still remain above the risk-neutral equilibrium predictions (e.g. Anderson and Freeborn (2010); Mago et al. (2013); Sheremeta (2011)).
A direct comparison of WTA and PP contests does not find risk aversion to significantly drive down efforts (Shupp et al., 2013). In contrast, Cason et al. (2018) find a negative effect of risk aversion on effort in various Tullock contests (with additive noise). Therefore, we believe that more empirical evidence is needed on whether risk aversion matters.
To control for risk aversion in our experiment, we ask subjects to evaluate how willing they are to take risks in general, using a Likert scale from 1 (unwilling to take risks) to 7 (fully prepared to take risks). 27 The average 'Risk score', reported in table 8, is higher in PP contests than in WTA contests, although not significantly different across treatments. Hence, differences in efforts between treatments of equal group size do not seem to depend on different risk scores across treatments.

27 Dohmen et al. (2011) validate this measure in a representative subject pool to measure individual risk inclination.

EWA Maximum Likelihood estimation with restricted parameters reported. Standard errors of the restricted metric in parentheses; 95% confidence intervals of restricted estimates in brackets. P-values (H0 = 0): * 0.10, ** 0.05, *** 0.01; Obs. is the number of observations, −LL is the negative log-pseudolikelihood, AIC and BIC are the Akaike and Bayesian information criteria, respectively.

Random effects logit regression; standard errors clustered at player level in parentheses. P-values: * 0.10, ** 0.05, *** 0.01; Obs. is the number of observations. All effort-related regressors are normalized (divided by 1000).

Appendix C. Instructions
Welcome! You are about to participate in an experiment in the economics of decision making. Please do not talk to any of the other participants until the experiment is over. If you have a question at any time, please raise your hand and an experimenter will come to your desk to answer it. The experiment will consist of 60 periods. In each period you will have the chance to earn points. At the end of the experiment, each participant's accumulated point earnings from all periods will be converted into cash at the exchange rate of 0.015 pence per point. Each participant will be paid in cash and in private. At the beginning of the experiment you will be matched with two [four] other people, randomly selected from the participants in this room, to form a group of three [five]. The composition of the group will stay the same throughout the experiment, i.e. you will form a group with the same two [four] other participants during the whole experiment. Your earnings will depend on the decisions made within your group, as described below. Your earnings will not be affected by decisions made in other groups. All decisions are made anonymously and you will not learn the identity of the other participants in your group.

Decision task in each period
Each period has the same structure. In each period the three [five] participants in each group will be competing for a prize of 1000 points. At the beginning of the period each participant will be given an endowment of 1000 points. Each participant has to decide how many of these points they want to use to buy contest tokens. Each contest token costs 1 point, so each participant can purchase up to 1000 of these tokens. Any part of the endowment that is not spent on contest tokens is kept by the participant. Each participant must enter his or her decision via the computer. An example screenshot is shown below.
Once everybody has chosen how many contest tokens to purchase, the computer will calculate each participant's share of the prize of 1000 points. Your share of the prize will depend on how many contest tokens you have purchased and the total number of contest tokens purchased in your group.

[In Proportional-Prize Contest Description]
If nobody in your group purchases any contest tokens, none of you will receive a share of the prize. Otherwise, the computer will calculate each participant's share of the prize so that your share of the prize will be equal to the number of contest tokens that you have purchased divided by the total number of contest tokens purchased in your group. That is, if you buy a number of X contest tokens and if the other two participants in your group buy Y and Z contest tokens each, then your share of the prize will be X/(X+Y+Z). [That is, if you buy a number of V contest tokens and if the other four participants in your group buy W, X, Y and Z contest tokens each, then your share of the prize will be V/(V+W+X+Y+Z).] Your contest earnings will be your share times 1000 points (rounded to the nearest point).
[In Winner-Take-All Contest Description] If nobody in your group purchases any contest tokens, none of you will win the prize. Otherwise, the computer will determine which participant wins the prize in a way that will ensure that the probability that you will win the prize is equal to the number of contest tokens that you have purchased divided by the total number of contest tokens purchased in your group. That is, if you buy a number of V contest tokens and if the other two participants in your group buy W and X contest tokens each, then the probability that you win the prize will be V/(V+W+X). [That is, if you buy a number of V contest tokens and if the other four participants in your group buy W, X, Y, Z contest tokens each, then the probability that you win the prize will be V/(V+W+X+Y+Z).] Your contest earnings will be either 0 (if you do not win the prize), or 1000 (if you win the prize).
Your point earnings for the period will be calculated as follows:
point earnings = 1000 − contest tokens purchased + contest earnings
After all participants have made a decision, a result screen will appear. An example screenshot is shown below. This is like the screen you will see during the experiment, except that the blacked-out fields will be filled in according to the decisions made in that round. Each participant will be informed of the points remaining from their endowment after making their purchase, the number of contest tokens they have purchased, the sum of tokens purchased by the other participants in their group, their contest earnings and their point earnings for the period. In addition, the results screen will inform each participant of his or her accumulated points from all periods so far.
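For replication purposes, the earnings rule above can be written as a small function. This is our sketch, not the software used in the sessions: `period_earnings` is a hypothetical name, halves are rounded up when the PP share is rounded to the nearest point (the instructions do not specify tie handling), and the WTA winner is drawn with probabilities proportional to tokens, as described in the instructions.

```python
import random

ENDOWMENT = 1000
PRIZE = 1000


def period_earnings(tokens, contest, rng=None):
    """Point earnings of every group member for one period.

    tokens: contest tokens bought by each member (0 to 1000).
    contest: "PP" (proportional prize) or "WTA" (winner-take-all).
    """
    rng = rng or random.Random()
    total = sum(tokens)
    if total == 0:
        contest_earnings = [0] * len(tokens)  # nobody bought tokens: no prize
    elif contest == "PP":
        # share of the prize, rounded to the nearest point (halves rounded up)
        contest_earnings = [int(PRIZE * t / total + 0.5) for t in tokens]
    elif contest == "WTA":
        # winning probability equals own tokens divided by all tokens in the group
        winner = rng.choices(range(len(tokens)), weights=tokens)[0]
        contest_earnings = [PRIZE if i == winner else 0 for i in range(len(tokens))]
    else:
        raise ValueError("contest must be 'PP' or 'WTA'")
    return [ENDOWMENT - t + c for t, c in zip(tokens, contest_earnings)]


print(period_earnings([100, 100, 200], "PP"))  # -> [1150, 1150, 1300]
```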

Beginning of the experiment
If you have any questions please raise your hand and an experimenter will come to your desk to answer it.
We are now ready to begin the decision-making part of the experiment. Please look at your computer screen and begin making your decisions.

Appendix D. QRE Model
In result section 4.2 we report that choices made by WTA contestants rely more on own past experience than those of PP contestants. This appendix adds to the behavioral differences across treatments by exploring the degree of contestants' 'sophistication' using the homogeneous Quantal Response Equilibrium (QRE) model of McKelvey and Palfrey (1995). Sophisticated players base their choices on expectations about the future actions of opponents (Camerer et al., 2002). The estimation results complement the EWA findings: sophistication increases over time, yet PP contestants act in a more sophisticated way than WTA contestants.
Let $p(x_k)$ denote the probability of choosing the k-th bin $x_k$ for an arbitrary player, and let $E_p[\pi(x_k)]$ be the expected payoff from choosing bin $x_k$ conditional on all N − 1 opponents playing the mixed strategy p over the K bins. 28 The K probabilities are the solutions of the logit QRE specification

$$p(x_k) = \frac{\exp\left(\lambda_Q\, E_p[\pi(x_k)]\right)}{\sum_{l=1}^{K} \exp\left(\lambda_Q\, E_p[\pi(x_l)]\right)}, \qquad k = 1, \dots, 11. \tag{7}$$

$\lambda_Q \in [0, \infty)$ is called the precision parameter. While λ in the EWA model measures the impact of past attractions, $\lambda_Q$ measures the impact of expectations on choice probabilities. Table 12 describes all reported QRE parameters.
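A symmetric logit QRE of the kind described above can be computed numerically. The sketch below rests on assumptions of ours: bins at 0, 100, ..., 1000, the PP payoff (which equals the risk-neutral expected WTA payoff), and damped fixed-point iteration as the solver; the function names are hypothetical. Estimating λ_Q would additionally require maximizing the multinomial log-likelihood of the observed bin counts over λ_Q, which is not shown.

```python
import math

K = 11
BINS = [100 * k for k in range(K)]  # assumed bins 0, 100, ..., 1000
ENDOWMENT = PRIZE = 1000


def opponent_sum_dist(p, n_opponents):
    """Distribution of the opponents' total spending (index s means 100 * s tokens)."""
    dist = [1.0]
    for _ in range(n_opponents):
        new = [0.0] * (len(dist) + K - 1)
        for s, ps in enumerate(dist):
            for k, pk in enumerate(p):
                new[s + k] += ps * pk  # convolve with one more opponent
        dist = new
    return dist


def expected_payoffs(p, n_players):
    """E_p[pi(x_k)] for every bin when all N - 1 opponents play the mixed strategy p."""
    dist = opponent_sum_dist(p, n_players - 1)
    values = []
    for x in BINS:
        ev = 0.0
        for s, ps in enumerate(dist):
            total = x + 100 * s
            share = x / total if total > 0 else 0.0
            ev += ps * (ENDOWMENT - x + PRIZE * share)
        values.append(ev)
    return values


def logit_qre(lam_q, n_players, damping=0.1, iters=2000, tol=1e-12):
    """Symmetric logit QRE via damped fixed-point iteration (last iterate if not converged)."""
    p = [1.0 / K] * K
    for _ in range(iters):
        u = expected_payoffs(p, n_players)
        top = max(u)
        w = [math.exp(lam_q * (v - top)) for v in u]
        total = sum(w)
        new_p = [(1 - damping) * a + damping * b / total for a, b in zip(p, w)]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


print([round(q, 3) for q in logit_qre(0.01, n_players=3)])
```

With `lam_q = 0` the iteration returns the uniform distribution over the 11 bins, matching the λ_Q = 0 case in table 12.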
The estimation results in table 13 show that the WTA treatments exhibit a lower value of λ_Q, indicating that players' choices depend less on payoff expectations given the anticipation of opponents' mixed strategies and are therefore, on average, less sophisticated.
We use likelihood ratio tests to verify heterogeneity in λ_Q within and between treatments. To test whether the impact of expected payoffs on choice changes over time, we compare the log-likelihood of a constant λ_Q across all periods, $LL_{H1+H2}$, with the log-likelihood of separate λ_Q values for the two halves of the session, $LL_{H1} + LL_{H2}$:

$$D = 2\left[\left(LL_{H1} + LL_{H2}\right) - LL_{H1+H2}\right]. \tag{8}$$
Similarly, for between-treatment comparisons (3W-5W, 3P-5P, 3P-3W and 5P-5W) we take, for each λ_Q, the sum of the log-likelihoods of both treatments and find the new optimal λ_Q and its corresponding log-likelihood $LL_{T1+T2}$. We then compare $LL_{T1+T2}$ with the sum of the log-likelihoods from estimating separate λ_Q parameters for each treatment, $LL_{T1} + LL_{T2}$:

$$D = 2\left[\left(LL_{T1} + LL_{T2}\right) - LL_{T1+T2}\right]. \tag{9}$$

Under the null hypothesis that λ_Q is the same (across halves or across treatments), both test statistics are asymptotically χ²-distributed with one degree of freedom.

28 To be coherent with the EWA formulation, we use 11 bins of equal distance.

Table 12. QRE parameters
λ_Q: Precision parameter. The higher λ_Q, the more best responses rely on forecasts of opponents' behavior. For λ_Q = 0 the solution is p(x_k) = 1/K, i.e. all choices are uniformly distributed over the strategy space and do not depend on the relative expected payoff. Conversely, as λ_Q → ∞ the QRE prediction converges to the Nash equilibrium.
−LL: λ_Q is chosen such that it maximizes the log-likelihood of the theoretical QRE choice probabilities given the observed choice frequencies; −LL is the corresponding negative log-likelihood.
−LL_Uni: Expected (negative) log-likelihood in case of a uniform distribution of choices.
−LL_Max: Estimated maximum (negative) log-likelihood in case of an exact match between the model predictions and the empirical distribution.
When comparing the results for different group sizes of the same contest, one might expect that an increase in group size makes the game more complex, since the possible set of opponents' expenditures grows, thus leading to a lower λ_Q in larger groups. 29 We find support for this assumption over all periods of the PP contest (D = 18.6, p = 0.00), yet in the first half of the PP contests the results are not significant (D = 1.0, p = 0.31). In the WTA contest we find a significantly higher λ_Q for the five-player treatment. Using the same approach for a between-contest-type comparison, we find that λ_Q is significantly higher for the PP contest in all evaluated treatments and subsamples (first half, three-player: D = 185.5; all periods, three-player: D = 509.1; first half, five-player: D = 85.4; all periods, five-player: D = 178.1). This suggests that expected payoffs matter less for WTA contestants when deciding on their next investment, especially during early periods. Similar to Chowdhury et al. (2014), we find a higher Q-statistic for the PP contest in all treatments, suggesting that the estimated QRE model fits PP contest expenditures better. Expenditures in the WTA contests contain a higher share of zero expenditures, which may explain the lower goodness of fit.
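With one restriction (a common λ_Q), each reported D statistic can be mapped to a p-value from the χ² distribution with one degree of freedom. The sketch below is ours (hypothetical function names) and uses only the standard library: since a χ²(1) variable is the square of a standard normal Z, P(D > d) = P(|Z| > √d) = erfc(√(d/2)). For instance, D = 1.0 gives p ≈ 0.32, close to the reported p = 0.31 (D itself is rounded), and D = 18.6 gives p far below 0.01.

```python
import math


def lr_statistic(ll_separate, ll_pooled):
    """D = 2 * (LL with separate lambda_Q values - LL with a common lambda_Q).

    Both arguments are log-likelihoods (LL), i.e. minus the reported -LL values.
    """
    return 2.0 * (ll_separate - ll_pooled)


def chi2_1_pvalue(d):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(d / 2.0))


for d in (1.0, 18.6, 85.4):
    print(d, chi2_1_pvalue(d))
```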
Summarizing the main results of the QRE model, we find that players across treatments become more sophisticated over time, meaning that they increasingly rely on forecasts of opponents' behavior when determining their choices. Additionally, a higher number of players leads to an increase in λ_Q in WTA treatments, while the effect in the PP contest is ambiguous. The choice patterns observed in WTA treatments display less sophistication than in the PP contest, which might be due to the probabilistic prize assignment that makes it challenging to form sophisticated expectations.