Learning and dropout in contests: an experimental approach

We design an experiment to study investment behavior in different repeated contest settings, varying the uncertainty of outcomes and the number of participants. We find decreasing over-expenditures and a higher rate of 'dropout' in contests with high uncertainty over outcomes (winner-take-all contests), whereas we observe quick convergence toward equilibrium predictions and near-full participation when this type of uncertainty vanishes (proportional-prize contests). These results are robust to changes in the number of contestants. Estimating learning parameters with the experience-weighted attraction (EWA) model suggests that subjects adopt different learning modes across contest structures, which helps to explain expenditure patterns that deviate from theoretical predictions.


Introduction
Settings characterized by high uncertainty over outcomes are likely to impair fully rational decisions, which often makes learning from previous attempts unproductive. As a consequence, agents may end up allocating effort inefficiently in a trial-and-error search regime, which can easily result in losses (Dosi et al. 2001). A prominent example of such a setting is the pharmaceutical industry (Pammolli et al. 2011), where races for drug discovery lead companies to make enormous investments that are not always fruitful and that may induce them to abstain from investing. 1 Similar industries characterized by high investment in research and development (R&D) may differ in the structure of earnings: if competition is based on incremental innovation, successful agents capture a larger market share (Breitmoser et al. 2010). In other cases, the successful competitor gains the whole market, much as in a lottery (Sutton 1998), e.g., by patenting new first-in-class drugs. The aim of this study is to identify experimentally how the payoff structure and the level of competition affect agents' learning and participation dynamics.
Our candidate setting is the Tullock rent-seeking contest, in which subjects compete for a single prize whose assignment probability depends on the relative share of subjects' efforts (Tullock 1980). In rent-seeking contests, subjects persistently deviate from the predictions of standard game-theoretical models. A survey of the experimental contest literature by Dechenaux et al. (2015) highlights that contestants spend, on average, considerably more than the theoretical equilibrium. 2 Most studies find an overall decrease in expenditures when subjects play this game repeatedly (e.g., Cason et al. 2012), which is usually attributed to learning without further specifying the underlying process. Another empirical regularity that has not yet received much attention is that many participants choose not to spend any resources to win the contest. Looking across a subset of experimental studies on winner-take-all contests (Abbink et al. 2010; Cason et al. 2012; Mago et al. 2016; Sheremeta 2010; Sheremeta and Zhang 2010; Price and Sheremeta 2011; Sheremeta 2011), we find that zero expenditure is indeed a frequent, sometimes modal, choice. The fraction of zeros is higher in larger groups and persists even in later stages of the experiments. If both stylized facts are the consequence of learning, this process deserves closer investigation, which is the main contribution of this paper.
To explore how uncertainty over outcomes and the number of opponents affect learning strategies and behavioral patterns, such as zero expenditures, in contests over a long time horizon, we set up a laboratory experiment in which we compare, over 60 periods, participants' expenditure choices in the standard winner-take-all (WTA) Tullock contest with those in a non-probabilistic, equivalent proportional-prize (PP) contest.
The WTA contest allows only one winner of the prize, whose winning probability equals the share of own investment in total group investment. Applications range from the seminal rent-seeking hypothesis by Tullock (1980) to political polls (Snyder 1989), sport tournaments (Szymanski 2003), patent races (Fudenberg et al. 1983; Harris and Vickers 1985) and cryptocurrency mining (Dimitri 2017). In the deterministic PP contest, contestants receive a fraction of the prize proportional to their share of total group investment. By changing only the payoff structure, the PP contest provides a 'replication' of standard oligopoly settings. The early work by Friedman (1958) is a first attempt to use the PP contest to model the allocation of advertising budgets across media. Proportional-prize assignments are also observed in electoral schemes (Schram and Sonnemans 1996), lobbying (Krueger 1974) and labor compensation (Kruse 1992). Under the assumption of risk neutrality, both contest settings are equivalent in terms of equilibrium predictions.
1 For example, in January 2018 the multinational company Pfizer announced its withdrawal from the R&D-intensive Alzheimer's market, making hundreds of neuroscience discovery jobs redundant. It was a setback for the whole sector, which has persistently struggled to launch successful cures for a disease affecting approximately 44 million patients worldwide (Crow 2018). Other setbacks in the pharmaceutical industry due to the inefficient allocation of efforts are discussed by Hopkins et al. (2007) and Jones and Wilsdon (2018).
2 Different motivations have been proposed to explain this phenomenon, such as the joy of winning, probability distortion and impulsive behavior (see Sheremeta 2018).
By varying the contest type and the group size (three or five contestants), we create a 2 × 2 experimental design, which allows us to test how styles of learning, and consequently contestant behavior, change in environments characterized by uncertainty over outcomes compared with environments with a tighter link between effort and outcome.
We find that the average levels of effort in PP contests are well described by standard game-theoretical predictions. Conversely, in the WTA contests we cannot distinguish total group expenditures between the two group sizes. The decline of average expenditures in the five-player WTA contests coincides with a significant increase in what we label 'dropouts', i.e., zero expenditures that are justified neither by the myopic best response nor by weighted fictitious play. Dropouts are significantly less frequent in PP contests and, if anything, decrease over time. The distinct expenditure patterns found in the two settings suggest that differences in the contests' payoff structures affect subjects' learning process. As our main contribution to the literature on learning in games, we test this hypothesis by estimating the experience-weighted attraction model (EWA; Camerer and Ho 1999). The results reveal that WTA contestants learn significantly more from their own past payoffs (experiential or reinforcement learning; Roth and Erev 1995) than players in the PP contests. Moreover, our results support recent findings by Alós-Ferrer and Ritschel (2018) on subjects' frequent use of the reinforcement heuristic 'win-stay, lose-shift' rather than a more reasoned approach based on myopic best response. Therefore, the strong reliance on experiential learning in WTA settings can explain both the decreasing expenditures over time and the increasing propensity to choose zero expenditures: the more often a WTA participant loses, the more she is discouraged from positive expenditures, up to the point of nonparticipation. Further analyses confirm that expenditures decline significantly as prior accumulated losses increase. Since losses are experienced more frequently in larger groups, expenditures there are expected to decrease at a faster rate.
These results carry practical relevance for contest designers, who wish to maximize repeated participation, and are comparable to empirical regularities found in industrial dynamics, so-called 'industry shakeouts', in which many firms decide to drop out of an industry during its competitive expansion phase.
Previous experimental studies have explored subjects' behavior in contests with designs or methods that partially overlap with ours, varying the group size (e.g., Lim et al. 2014), the payout function (Chowdhury et al. 2014; Ghosh and Hummel 2018) and the matching protocol (Baik et al. 2015), but none has so far explored the interaction between treatments varying the group size and the outcome uncertainty. More importantly, differences in behavior across contest structures have been proposed to stem from differences in learning (Fallucchi et al. 2013), a claim that has not yet been rigorously tested. Similarly, the high fraction of zero expenditures has often been attributed to myopic best replies without proper analysis. In this paper, we investigate experimentally how expenditure dynamics, such as frequently observed zero expenditure choices, can be explained by distinct learning mechanisms across contest structures.
The paper is organized as follows: Sect. 2 introduces the contest forms and offers a brief review of the related experimental literature on contests. We analyze subjects' behavior in the two contest structures and their equivalence in expected payoff terms. Section 3 presents the experimental design and procedures. The experimental results are presented and discussed in Sect. 4. The first, descriptive, part of our result section highlights differences in group expenditures and in the fraction of zero expenditures. The second part presents the EWA model estimations and shows support for different learning modes across contest games. We conclude in Sect. 5 with a discussion of our findings and highlight how these results may well represent some empirical regularities in winner-take-all settings outside the experimental literature.

Theoretical background and experiments
The Tullock model of rent seeking (Tullock 1980) is extensively used to model a variety of contests (Konrad 2009). In the simplified model, often referred to as the winner-take-all (WTA) or lottery contest, N agents compete for a prize of size V, where $x_i$ is the expenditure of agent i and $X = \sum_{k=1}^{N} x_k$ is the aggregate expenditure. The individual profits $\pi_i$ depend on all agents' expenditures, the prize assignment and a homogeneous initial endowment denoted by e:

$$\pi_i = \begin{cases} e - x_i + V & \text{with probability } x_i / X, \\ e - x_i & \text{with probability } 1 - x_i / X. \end{cases} \qquad (1)$$

Therefore, the probability of an agent receiving the prize increases with own expenditures but decreases with the expenditures of others.
In an alternative version of the contest, also known as the proportional-prize (PP) or share contest, the prize is not assigned to one agent only but divided across all N agents in proportion to their own expenditures $x_i$ relative to the aggregate expenditure X. Thus, each agent with positive expenditure receives a share of the prize. The payoff function in this case is:

$$\pi_i = e - x_i + \frac{x_i}{X}\, V \qquad (2)$$

The two contests share the same expected payoffs and, under the assumption of risk neutrality, the same equilibrium predictions, with $x_i^* = V(N-1)/N^2$. However, the realized payoff in the WTA contest differs from that in the PP contest because of the stochastic winner-take-all nature of the game.
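The equivalence of expected payoffs can be checked directly. The following Python sketch is a minimal illustration, assuming the experimental parameters e = 1000 and V = 1000 and the rule that the prize is not assigned when nobody buys tokens; all function names are hypothetical and not taken from the authors' implementation. It computes the deterministic PP payoff, the realized WTA payoff after a lottery draw, and the expected WTA payoff, using exact fractions to avoid rounding.

```python
import random
from fractions import Fraction

E = 1000  # endowment per period (assumed from the experimental design)
V = 1000  # prize (assumed from the experimental design)


def share(i, x):
    """Player i's share x_i / X of total expenditure; zero if nobody spends (prize not assigned)."""
    total = sum(x)
    return Fraction(x[i], total) if total > 0 else Fraction(0)


def pp_payoff(i, x):
    """Proportional-prize payoff e - x_i + (x_i / X) * V, which is deterministic."""
    return E - x[i] + share(i, x) * V


def wta_realized_payoff(i, x, winner):
    """Winner-take-all payoff after the draw: everyone pays x_i, only the winner receives V."""
    return E - x[i] + (V if winner == i else 0)


def wta_expected_payoff(i, x):
    """Expected WTA payoff: win with probability x_i / X, lose otherwise."""
    p = share(i, x)
    return p * (E - x[i] + V) + (1 - p) * (E - x[i])


def draw_winner(x, rng=random):
    """Lottery draw: player k wins with probability x_k / X; None if nobody spends."""
    if sum(x) == 0:
        return None
    return rng.choices(range(len(x)), weights=x, k=1)[0]
```

For example, with expenditures x = [300, 200, 100], player 0 earns 1200 points in the PP contest and 1200 points in expectation in the WTA contest, but the realized WTA payoff is either 1700 (win) or 700 (loss). This spread of realized payoffs around an identical mean is the source of outcome uncertainty that the two treatments vary.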
As it is often difficult to capture expenditure dynamics with field data, laboratory experiments have become increasingly popular in recent years for characterizing behavior in different contest settings. 3 Many experiments document pervasive over-dissipation in WTA contests paired with high heterogeneity in effort levels across contestants (e.g., Millner and Pratt 1989; Sheremeta and Zhang 2010; Mago et al. 2016). Based on a sample of 30 studies, Sheremeta (2013) reports a median overbidding rate of 72% relative to equilibrium predictions. In contrast to WTA contests, PP contests display less variation in individual spending behavior and quicker convergence over time toward the predicted equilibrium level (Fallucchi et al. 2013; Chowdhury et al. 2014; Cason et al. 2010, 2020). Since it is common to focus on mean expenditures when analyzing overbidding in contests, the choice of zero expenditure has often been overlooked. We summarize the data of seven contest experiments, considering a total of ten independent standard repeated WTA treatments (as specified in Eq. 1). Table 1 shows that zero expenditures are the modal choice in four-player WTA contests, making up 12% of all choices, and that they increase over time (see Fig. 1). In the two-player settings, the share of zero expenditures is lower (3.9%) and stable over time. Yet most of these zeros, especially in later periods, are not a myopic best response to the opponents' previous choices. We refer to these zero expenditures as 'dropouts'. In four-player treatments, on average 50% of the zero expenditures are dropouts, a share that increases over time.
The literature offers multiple explanations for overbidding in WTA contests, such as bounded rationality (Lim et al. 2014), heterogeneous preferences (Shupp et al. 2013) and utility from winning (Schmitt et al. 2004). Accordingly, zero expenditures are usually interpreted as the best response to over-dissipation. Given the evidence collected from prior studies, this cannot be the only explanation. An alternative explanation that we explore in this paper is that WTA contestants choose zero investments because they have difficulty adapting toward optimal expenditure levels, given the stochastic nature of the outcome.

Table 1 Zero expenditures in repeated standard WTA contests
                                                                           Two players   Four players
Share of zero expenditures that are not myopic best response (dropout)        3.5%           6.0%
Share of dropout over share of zero expenditures                              89.7%          50.0%
The meta-analysis incorporates data from ten treatments of seven publications using repeated standard WTA contest treatments. Two players: Abbink et al. (2010), Cason et al. (2012). Four players: Mago et al. (2016), Sheremeta (2010), Sheremeta and Zhang (2010), Sheremeta (2011), Sheremeta (2011).
We are aware of a handful of studies that analyze learning in repeated games with stochastic outcomes. They differ from our experimental setting and learning identification strategy in many respects, yet they support the use of simple learning heuristics by decision makers. Gunnthorsdottir and Rapoport (2006) find that reinforcement learning (Roth and Erev 1995) explains aggregate efforts in a two-stage group game with an inter-group lottery in the first stage. Reinforcement learning combined with directional learning (Selten and Stoecker 1986) describes individual behavior well in a Tullock contest with group size uncertainty (Boosey et al. 2017). In addition, Masiliūnas (2019) finds learning spillovers between PP and WTA contests in a within-subjects experiment.
Even though learning behavior in PP contests has so far received only minor attention in the literature, their expected payoff structure provides a useful benchmark for behavior in the standard contest and allows us to treat them as a special case of the more commonly studied Cournot oligopoly. 4 Evidence on learning from oligopoly experiments suggests that players employ a mix of sophisticated and imitative learning. For example, Bigoni and Fort (2013), applying a modified EWA model to a Cournot game with endogenous information disclosure, find that participants use a mixture of reinforcement, imitation and belief learning, with the latter accounting for the major share. 5 Lastly, learning models have been used to explain behavior in repeated auction experiments. From the bidders' perspective, auctions look as stochastic as lotteries, since the value of the prize is usually drawn for each bidder from a random distribution, and thus bids submitted by rivals appear uncertain. In addition, overbidding is commonly observed in first-price auctions (Filiz-Ozbay and Ozbay 2007). Experiential and observational learning are found to reduce overbidding in first-price common value auctions (Garvin and Kagel 1994), while directional learning can explain repeated individual bids (Neugebauer and Selten 2006).
By comparing behavior in both contest structures in the laboratory, we can identify how the randomness of the outcome inhibits learning and therefore affects the expenditure dynamics that we observe in stochastic environments. Our analysis extends to competition with different group sizes. The reason for this choice is twofold: first, previous evidence has shown that varying the number of active firms in an oligopoly has important implications for the level of competitiveness (e.g., Huck et al. 2004); second, we check whether more intense competition exacerbates zero expenditures, as our meta-analysis suggests, through lower earnings in proportional-prize contests or more frequent expected losses in winner-take-all contests.

Experimental design and procedures
The experiment was conducted at the University of Nottingham using the software z-Tree (Fischbacher 2007). A total of 140 students from a wide range of disciplines were recruited through the online recruitment system ORSEE (Greiner 2015). No participant took part in more than one session or had taken part in any previous contest experiment.
At the beginning of each session, participants were randomly matched into groups that remained the same for the whole experiment. We opted for this fixed group matching, as it allows us to test whether participants myopically best respond to the choices of their opponents. 6 Moreover, this is the standard adopted in experiments that test for learning in other settings (e.g., Huck et al. 1999;Bigoni and Fort 2013) and therefore allows us to compare the results of our PP contests with previous evidence of learning in oligopoly settings.
Participants did not know the identities of the other subjects in the room with whom they were grouped. They were given instructions for the experiment (reproduced in ''Appendix C'') which were read aloud by the experimenter. Any questions were answered by the experimenter in private, and no communication between participants was allowed. No information passed across groups during the entire session.
In all sessions, the decision-making part of the experiment consisted of 60 periods. 7 In each period, subjects were endowed with 1000 points and competed to win a prize of 1000 points. Subjects simultaneously chose how many contest tokens to purchase, at a price of one point per token, and any points not used to purchase tokens were added to their total balance. At the end of the period, each subject could receive contest earnings, which were added to their total balance. If none of the subjects bought any tokens, the prize was not assigned. We adopt a 2 × 2 design in which treatments differ in the group size, with three (3) or five (5) contestants, 8 and in the contest structure, proportional prize (P) or winner-take-all (W). 9 We conducted two sessions per treatment, with either 15 or 20 subjects each, resulting in ten independent observations in treatments with three-player groups and eight independent observations in treatments with five-player groups. A summary of the treatments is reported in Table 2.
After each period, subjects were reminded of their own choice and informed of the total expenditures of the other members of their group and of their own earnings. We opted for this 'partial' feedback disclosure to rule out imitative behavior among contestants, although we could not rule out another simple behavioral rule, namely imitating the opponents' average expenditure in the previous round. 10 Subjects accumulated points across the 60 periods and at the end of each session were paid privately in cash. Earnings averaged £9.30 for a session lasting about 60 min. At the end of the experiment, we conducted a sociodemographic questionnaire in which we also elicited risk attitude using a survey measure validated in a representative subject pool (see Dohmen et al. 2011). 11 The two contest structures share the same expected payoff and, under the assumption of risk neutrality, the same Nash equilibria. Introducing risk aversion could potentially alter theoretical predictions; however, the direction and extent of the effect of risk aversion on contest expenditures remain ambiguous under general conditions (Skaperdas and Gan 1995; Konrad and Schlesinger 1997). The experimental literature likewise shows no consensus regarding the effect of risk attitude on contest expenditures (see, for example, Shupp et al. 2013; Mago et al. 2013). To be able to compare both contest structures, we therefore maintain the assumption of risk-neutral contestants. The theoretical prediction of symmetric group expenditures under risk neutrality, given by $X^* = NV(N-1)/N^2 = V(N-1)/N$, corresponds to 666.67 points for three-player contests and to 800 points for five-player contests. Hence, predicted equilibrium expenditures at the individual level are 222 and 160, respectively.
Baik et al. (2015) show that the fixed matching protocol leads to collusive behavior in groups of two, but not in larger groups.
9 3P, 5P, 3W and 5W are the treatment notations used throughout the rest of the paper.
10 We check the fraction of players whose choice imitates the average opponents' expenditures of previous rounds in ''Appendix B'' Table 9. The fraction of imitation is significantly lower in WTA treatments and does not increase over time. A possible explanation for this difference is that imitating the average in PP contests provides an intuitive strategy that reduces inequity concerns. In WTA treatments, imitating the average does not guarantee such an outcome and may imply not having the highest likelihood of winning in the group.
11 Subjects answered the following question on a Likert scale from 1 to 7: 'How willing are you to take risks, in general? Unwilling to take risks (1) Fully prepared to take risks (7)'. We do not find significant differences in risk scores across treatments. ''Appendix A'' addresses the relevance of risk attitudes for expenditures in more detail.
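The predicted equilibrium levels can be reproduced numerically. The Python sketch below is a minimal check under the risk-neutrality assumption with e = V = 1000: it computes the symmetric equilibrium from the closed-form expression and confirms, by grid search over the integer expenditures 0 to 1000, that the (rounded) equilibrium expenditure is a best response when all opponents play it. The function names are illustrative only.

```python
from fractions import Fraction

E = 1000  # endowment
V = 1000  # prize


def equilibrium(n):
    """Symmetric risk-neutral equilibrium: individual V(N-1)/N^2 and group V(N-1)/N."""
    x_star = Fraction(V * (n - 1), n * n)
    return x_star, n * x_star


def expected_payoff(x, others_total):
    """Expected payoff of spending x when opponents spend others_total (same in WTA and PP)."""
    total = x + others_total
    if total == 0:
        return Fraction(E)  # nobody spends: prize not assigned
    return E - x + Fraction(x, total) * V


def best_response(others_total):
    """Payoff-maximizing integer expenditure in 0..E given opponents' total expenditure."""
    return max(range(E + 1), key=lambda x: expected_payoff(x, others_total))


for n in (3, 5):
    x_star, group = equilibrium(n)
    br = best_response((n - 1) * x_star)
    print(f"N={n}: x*={float(x_star):.1f}, X*={float(group):.1f}, integer BR={br}")
```

Running it prints x* = 222.2 and X* = 666.7 with an integer best response of 222 for N = 3, and x* = 160.0 and X* = 800.0 with a best response of 160 for N = 5. Exact fractions are used so that ties and near-ties on the grid are not affected by floating-point error.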

Results
We lay out the results in two subsections. In Sect. 4.1, we illustrate the spending and participation behavior of contestants of all treatments. In Sect. 4.2, we illustrate how EWA estimation results differ across treatments and analyze expenditures using Tobit mixed effect regressions. Reported p values (p) for within-treatment comparisons between the two halves of the experiment are based on Wilcoxon matched-pairs signed-rank tests, while for between-treatment comparisons we report p values of two-sided Wilcoxon rank-sum tests, treating each group as a single, independent observation.
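Because group-level samples are small (ten or eight groups per treatment), a rank-sum p value can be computed exactly by enumeration. The Python sketch below is an illustrative two-sided exact Wilcoxon rank-sum test with average ranks for ties; it is not the software the authors used, and the function names are hypothetical. Each input list holds one summary value per group (for instance, each group's mean expenditure), in line with treating each group as a single independent observation.

```python
from itertools import combinations


def average_ranks(values):
    """Ranks 1..n, with tied values receiving the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> average 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def exact_rank_sum_test(a, b):
    """Two-sided exact rank-sum test: enumerate every split of the pooled ranks.

    Returns the observed rank sum of sample a and the share of splits whose rank sum
    deviates from its expectation at least as much as the observed one.
    """
    pooled = list(a) + list(b)
    ranks = average_ranks(pooled)
    n, m = len(a), len(pooled)
    w_obs = sum(ranks[:n])
    w_mean = n * (m + 1) / 2  # expected rank sum under the null hypothesis
    obs_dev = abs(w_obs - w_mean)
    extreme = total = 0
    for idx in combinations(range(m), n):
        total += 1
        if abs(sum(ranks[k] for k in idx) - w_mean) >= obs_dev - 1e-9:
            extreme += 1
    return w_obs, extreme / total
```

For two perfectly separated samples of three groups, exact_rank_sum_test([1, 2, 3], [4, 5, 6]) returns a rank sum of 6 and p = 0.1, since 2 of the 20 possible splits are as extreme. With 10 and 8 groups there are 43,758 splits, which is still cheap to enumerate. The matched-pairs signed-rank test used for within-treatment comparisons can be enumerated analogously over the 2^n sign assignments.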

Group expenditures and participation
Result 1 (a) Average total expenditures in proportional-prize contests increase significantly with group size. (b) Contrary to predictions, average total expenditures in winner-take-all contests do not increase significantly with group size.
Figure 2 shows the average group expenditure patterns of all treatments relative to their predicted theoretical equilibria. In all cases, initial expenditures lie substantially above the Nash equilibrium predictions and are higher in larger groups. In the PP treatments, mean expenditures decline quickly to a level close to the equilibrium and exhibit no noticeable time trend thereafter. This result is in line with previous experimental evidence. By contrast, we find the results of the WTA contests surprising: over-expenditures relative to the predicted equilibrium levels persist throughout the experiment, but average expenditures do not differ across group sizes. Moreover, over the longer horizon, total expenditures are lower in larger groups, sometimes even below the level predicted by the Nash equilibrium. Table 3 reports the average group expenditures for the first half, the second half and all periods, together with p values of within- and between-treatment comparisons. In the PP contests, average total expenditures are significantly higher in 5P than in 3P for all intervals considered (all p ≤ 0.01).
Fig. 2 Average group expenditure for all treatments over time (y-axis: average group expenditures; x-axis: periods in five-period blocks, 1–5 to 56–60). Panel (a) PP Contest: 3P and 5P, with the Nash equilibria for 3P and 5P.
Contrary to the PP contests, between-treatment comparisons of WTA contests at group level confirm the pattern observed in Fig. 2, with an overall similar level of expenditures for all intervals considered (all p > 0:25). 12 This last finding contradicts theoretical results that predict higher group expenditures for larger groups. Previous studies that explored behavior in contests with different group size have supported the theoretical claim, yet considered only one-shot decisions (Anderson and Stafford 2003) or ten repetitions (Lim et al. 2014). 13 Over a longer time horizon, average group expenditures seem to show different dynamics.
Result 2 (a) The fraction of zero expenditures is significantly higher in the winner-take-all contests than in the proportional-prize contests and increases significantly over time in larger winner-take-all groups.
Our results question the hypothesis that expenditures in the WTA contests converge toward group size-dependent equilibria. We thus look for another explanation of the decrease in expenditures, rooted in a further common finding from contest experiments: zero expenditures. To get a first glimpse of the prevalence and dynamics of zero expenditures in contests, we compute the total fraction of zero expenditures across treatments. The total share of zeros increases with group size (3P vs. 5P: p = 0.01; 3W vs. 5W: p = 0.01) and is more pronounced in the WTA contests (3P vs. 3W: p < 0.01; 5P vs. 5W: p < 0.01), reaching up to 40% in the late game of 5W. Moreover, as shown by the black lines in Fig. 3, the fraction of zero expenditures is stable across time in 3W (periods 1–30 vs. 31–60: p = 0.92) and increases in 5W (periods 1–30 vs. 31–60: p = 0.09). Result 1 can thus be explained by the increasing fraction of zero expenditures in 5W, which leads to a faster decrease in average group efforts than in 3W.
A first explanation for the pronounced fraction of zero expenditures in the WTA contests could be that players expect their opponents to overbid. 14 Since we cannot observe players' expectations about future opponent expenditures, we assume that expectations are formed on the basis of past opponent behavior. Thus, we assess whether zero expenditures are a best response (BR) to the history of opponents' decisions, using two forms of 'weighted fictitious play'. A choice j of player i in period t + 1 is justified under weighted fictitious play if j maximizes the following expression:

$$\sum_{\tau=1}^{t} \phi^{\,t-\tau}\, \pi_i\big(s_i^j, s_{-i}(\tau)\big) \qquad (3)$$

The parameter φ acts as a discount factor. If φ = 0 (with $0^0 = 1$), the expression reduces to the myopic best-response case $\pi_i\big(s_i^j, s_{-i}(t)\big)$, which denotes the hypothetical payoff of player i choosing expenditure level j given the choices $s_{-i}$ of its opponents at time t (dashed lines in Fig. 3). At the other extreme, when φ = 1, the hypothetical past payoffs from strategy j are weighted equally across all periods. In this case, the best choice is the one that would have yielded the highest average payoff across all rounds played, also known as 'fictitious play' (gray lines in Fig. 3). Figure 3a–d shows the fraction of zero expenditures over time for each of the four treatments. Most zero choices can be justified neither by myopic best responses nor by fictitious play (average fraction of zeros not justified by myopic best responses: 66% in 3W, 62% in 5W; by fictitious play: 87% in 3W, 81% in 5W). 15 Yet myopic best responses account for more choices than fictitious play, consistent with the previous finding by Rockenbach and Waligora (2016) that WTA contestants hold myopic beliefs. We hence focus on the zero expenditures that are not explained by a myopic best response and define them as 'dropouts'. In the case of a dropout, a player decides to spend nothing even though, based on a myopic best response, bidding a positive amount would maximize payoff.
The average fraction of dropout per round can be assessed from Fig. 3 as the difference between the fraction of total zero expenditures and the fraction of zero expenditures under a myopic best response.
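The classification of zero choices can be sketched as follows. This Python example is a minimal illustration, assuming that hypothetical payoffs are the risk-neutral expected payoffs (identical for WTA and PP), that choices range over the integers 0 to 1000, and that only the opponents' total expenditure matters, which is the feedback subjects actually received. The helper names are hypothetical. A zero is labeled a dropout when 0 is not among the maximizers of the weighted fictitious play criterion described above with φ = 0.

```python
from fractions import Fraction

E = 1000  # endowment (from the experimental design)
V = 1000  # prize (from the experimental design)
CHOICES = range(E + 1)  # all integer expenditures 0..1000


def payoff(x, others_total):
    """Hypothetical (expected) payoff of spending x when opponents spend others_total in total."""
    total = x + others_total
    if total == 0:
        return Fraction(E)  # prize not assigned
    return E - x + Fraction(x, total) * V


def wfp_best_responses(history, phi):
    """Choices j maximizing sum over tau of phi^(t - tau) * payoff(j, opponents' total in tau).

    history lists the opponents' total expenditure in periods 1..t, oldest first.
    phi = 0 keeps only the last period (myopic best response, since 0**0 == 1);
    phi = 1 weights all periods equally (fictitious play).
    """
    t = len(history)
    phi = Fraction(phi)
    weights = [phi ** (t - tau) for tau in range(1, t + 1)]
    scores = [sum(w * payoff(j, y) for w, y in zip(weights, history)) for j in CHOICES]
    best = max(scores)
    return [j for j, s in zip(CHOICES, scores) if s == best]


def is_dropout(history, phi=0):
    """A zero choice counts as a dropout if 0 is not a best response given the history."""
    return 0 not in wfp_best_responses(history, phi)
```

For instance, after opponents spent 500 in total, the myopic best response is a positive amount (about 207), so a zero would be classified as a dropout; after opponents spent 1200, spending nothing is the unique myopic best response. Passing phi=1 gives the fictitious-play benchmark shown by the gray lines in Fig. 3.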
Result 2 (b) The share of dropouts is higher in winner-take-all contests and increases in the five-player treatment.
To characterize the persistence of zero bids at the individual level, we show in Fig. 4 the percentage of periods in which contestants choose to bid zero after their first zero effort choice. For each treatment, players who display at least one zero bid are ordered by the frequency of their subsequent zero effort choices in the remaining periods. The frequency of zero efforts varies across contestants, which implies that we cannot equate period-specific zero expenditure choices with contestants abstaining from participation for the rest of the experiment. Yet for some contestants, not bidding becomes an important strategy, especially in the five-player WTA treatment, where 14 players continue to bid zero in more than half of the remaining periods.
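The individual-level measure behind Fig. 4 can be sketched in Python as follows. This is an illustrative reading of the description above, assuming that each player's choices are stored as a list ordered by period; the function names are hypothetical, and the 0.5 threshold mirrors the text's 'more than half of the remaining periods'.

```python
def subsequent_zero_share(choices):
    """Share of the periods after a player's first zero bid in which the player bids zero again.

    Returns None if the player never bids zero or first bids zero in the final period.
    """
    if 0 not in choices:
        return None
    first = choices.index(0)
    remaining = choices[first + 1:]
    if not remaining:
        return None
    return sum(1 for c in remaining if c == 0) / len(remaining)


def persistent_zero_bidders(players, threshold=0.5):
    """Shares sorted in descending order, plus the count of players above the threshold."""
    shares = [s for s in (subsequent_zero_share(c) for c in players) if s is not None]
    shares.sort(reverse=True)
    return shares, sum(1 for s in shares if s > threshold)
```

Sorting the shares in descending order reproduces the ordering of players within each treatment panel, and the returned count corresponds to the number of players who keep bidding zero in more than half of their remaining periods.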
From the previous analysis, we deduce that the common decreasing pattern in average expenditures, which we observe in all treatments, may be driven by different behavior depending on the contest structure. Although in proportional-prize contests far from all contestants manage to reach the equilibrium level, the decrease in average expenditures hints at a process of learning to play optimal strategies. 16 Conversely, the decrease of expenditures in the winner-take-all contests can largely be attributed to an overall dropout effect that is not explained by forms of fictitious play. We hypothesize that the differences in the contests' payoff structures have non-negligible effects on the learning process. The EWA estimation results in the next section explore this hypothesis further.

EWA model estimation and interpretation
In the previous section, we have shown that the structure and group size of the contest affect total expenditures. Overall, the distinct patterns behind the decrease in expenditures over time suggest that subjects behave differently across contests. We provide further insights to verify whether differences across treatments are driven by different learning paths. For each treatment, we estimate the EWA model (Camerer and Ho 1999). In our estimations, we group the 1001 possible expenditure choices (0 to 1000 tokens) into K = 11 equally spaced bins and round all choices to the closest bin to facilitate comparability. 17 Every contestant i forms a set of 'attractions' $A_i^j(t)$, one for each strategy j, which are recursively reinforced or weakened after every round. Attractions are updated as follows:

$$A_i^j(t) = \frac{\phi\, N(t-1)\, A_i^j(t-1) + \delta\,\big[1 - I_i^j(t)\big]\,\mathbb{E}\big[\pi_i\big(s_i^j, s_{-i}(t)\big)\big] + I_i^j(t)\,\pi_i\big(s_i^j, s_{-i}(t)\big)}{N(t)} \qquad (4)$$

where $I_i^j(t)$ equals 1 if j is the strategy actually chosen by player i in period t and 0 otherwise, and φ discounts previous attractions. Defining $s_{-i}(t)$ as the strategy vector chosen by all other players, the payoff of player i from choosing j in t is given by $\pi_i\big(s_i^j, s_{-i}(t)\big)$. Similarly, $\mathbb{E}\big[\pi_i\big(s_i^j, s_{-i}(t)\big)\big]$ denotes the hypothetical payoff that player i could have expected from any strategy j not chosen, given the strategies of all other players. Since the prize assignment in the PP contest is deterministic, this expression simplifies to $\pi_i\big(s_i^j, s_{-i}(t)\big)$ and is equivalent to the expected hypothetical payoff of the WTA contest under risk neutrality. N(t) is a weight on past experience: the faster N(t) increases in t, the less players focus on immediate current payoffs at time t. The weights update with the following rule:

$$N(t) = \phi\,(1-\kappa)\,N(t-1) + 1 \qquad (5)$$

The parameter κ determines the growth rate of attractions, which reflects how quickly players lock into a strategy.
Fig. 4 Frequency of individual zero effort choices across treatments
16 The heterogeneity of plays is in line with previous findings in oligopoly experiments, e.g., Rassenti et al. (2000).
The current attractions of the array of possible strategies $J$ determine the probability that player $i$ chooses strategy $j$ in the next period $t+1$. A logistic transformation links previous attractions to the choice probabilities (Eq. 6). Thus, the higher a contestant's past attraction for a specific strategy, the higher the probability that this strategy will be pursued.
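A minimal sketch of the logistic choice rule, assuming that Eq. 6 takes the standard EWA logit form $P_i^j(t+1) = \exp(\lambda A_i^j(t)) / \sum_{k \in J} \exp(\lambda A_i^k(t))$. The function name is hypothetical. Subtracting the largest scaled attraction before exponentiating leaves the probabilities unchanged but avoids numerical overflow.

```python
import math


def logit_choice_probabilities(attractions, lam):
    """Map attractions A_j(t) to choice probabilities P_j(t+1) with a logit rule.

    P_j = exp(lam * A_j) / sum_k exp(lam * A_k); lam = 0 gives uniform choice.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    scaled = [lam * a for a in attractions]
    top = max(scaled)  # subtracting the maximum avoids overflow in exp()
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Example: 11 expenditure bins whose attraction rises with the bin index
probs = logit_choice_probabilities([0.1 * k for k in range(11)], lam=2.0)
print([round(p, 3) for p in probs])
```

With $\lambda = 0$ every bin is chosen with probability 1/11, i.e., choices are not influenced by attractions, while a larger $\lambda$ concentrates choices on the bins with the highest attractions.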
Attractions for each strategy are updated by weighting the previous experience, the current forgone payoffs, and the current received payoff (given by the summands in Eq. 4). Current forgone payoffs are the payoffs that could have been expected had the contestant chosen differently, holding the opponents' strategies fixed. Forming attractions by evaluating forgone payoffs is equivalent to belief learning (a version of weighted fictitious play, Brown 1951), whereas focusing exclusively on realized payoffs ($\delta = 0$) reduces the model to reinforcement learning (Roth and Erev 1995). Thus, the EWA model nests two canonical learning models via the parameter delta ($\delta$). 18 For each treatment, Table 5 shows the simulated EWA expenditure distributions for varying deltas (0, 0.5, 1) together with the true expenditure frequencies. As delta increases toward belief learning, the simulated choices show less variation and roughly resemble the true expenditure frequencies in the PP contests. The true distribution of expenditures in the WTA contest is more dispersed, which is why we assume that these contestants rely less on belief learning than PP contestants.
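The Table 5 simulations can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: parameter values follow the Table 5 caption ($\lambda = 1$, $\phi = 0.8$, $N_0 = 0$, $\kappa = 0.8$, zero initial attractions, 600 simulated players) and the 60 periods of the experiment, while the attraction update uses the standard $\kappa$-parameterized EWA form. Payoffs are divided by a hypothetical `payoff_scale` of 1000 because the payoff units used in the estimation are not reported here. In the WTA variant, the chosen bin is reinforced with the realized lottery payoff and unchosen bins with their risk-neutral expected payoff, which coincides with the PP payoff.

```python
import math
import random

BINS = [100 * k for k in range(11)]  # 11 equally spaced expenditure bins: 0, 100, ..., 1000
ENDOWMENT = 1000
PRIZE = 1000


def expected_payoff(own, others_total):
    """PP payoff, equal to the risk-neutral expected WTA payoff; no prize if nobody spends."""
    total = own + others_total
    share = own / total if total > 0 else 0.0
    return ENDOWMENT - own + PRIZE * share


def logit(attractions, lam):
    top = max(lam * a for a in attractions)
    weights = [math.exp(lam * a - top) for a in attractions]
    norm = sum(weights)
    return [w / norm for w in weights]


def simulate_group(delta, rng, n_players=3, periods=60, winner_take_all=False,
                   lam=1.0, phi=0.8, kappa=0.8, n0=0.0, payoff_scale=1000.0):
    """Simulate one group of EWA learners and count how often each bin is chosen."""
    k = len(BINS)
    attractions = [[0.0] * k for _ in range(n_players)]  # initial attractions = 0
    n_prev = n0
    counts = [0] * k
    for _ in range(periods):
        choices = [rng.choices(range(k), weights=logit(attractions[i], lam))[0]
                   for i in range(n_players)]
        tokens = [BINS[c] for c in choices]
        winner = None
        if winner_take_all and sum(tokens) > 0:
            winner = rng.choices(range(n_players), weights=tokens)[0]
        n_now = phi * (1 - kappa) * n_prev + 1.0  # experience weight N(t)
        for i in range(n_players):
            others = sum(tokens) - tokens[i]
            for j in range(k):
                if j == choices[i]:
                    weight = 1.0
                    if winner_take_all:  # realized lottery payoff of the chosen bin
                        payoff = ENDOWMENT - tokens[i] + (PRIZE if winner == i else 0)
                    else:
                        payoff = expected_payoff(BINS[j], others)
                else:
                    weight = delta  # hypothetical (forgone) payoffs are weighted by delta
                    payoff = expected_payoff(BINS[j], others)
                attractions[i][j] = (phi * n_prev * attractions[i][j]
                                     + weight * payoff / payoff_scale) / n_now
            counts[choices[i]] += 1
        n_prev = n_now
    return counts


def simulated_distribution(delta, n_groups=200, winner_take_all=False, seed=0, **kwargs):
    """Relative bin frequencies pooled over n_groups groups (200 groups of 3 = 600 players)."""
    rng = random.Random(seed)
    totals = [0] * len(BINS)
    for _ in range(n_groups):
        group_counts = simulate_group(delta, rng, winner_take_all=winner_take_all, **kwargs)
        totals = [a + b for a, b in zip(totals, group_counts)]
    n = sum(totals)
    return [c / n for c in totals]
```

For example, `for d in (0.0, 0.5, 1.0): print(d, [round(x, 3) for x in simulated_distribution(d)])` produces one simulated distribution per delta for a three-player PP contest; passing `winner_take_all=True` and `n_players=5` (with `n_groups=120` for 600 players) gives the 5W counterpart.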
Result 3 Proportional-prize contests allow for a mixture of reinforcement and adaptive learning. In winner-take-all contests, learning is mostly driven by previous own payoffs.
For each treatment, the EWA model is estimated over the first half and over the complete sample via maximum likelihood. 19 Estimated parameters and their theoretical domains are summarized in Table 4. Following the approach used in Ho et al. (2008), we refrain from freely estimating initial attractions and instead choose the initial attractions to maximize the likelihood of observing the first-period choice frequencies. 20 We report in Table 6 the estimated parameters with their clustered standard errors and confidence intervals in parentheses. 21 The main parameter of interest in our estimation analysis is delta, which indicates the degree of belief learning used in the game. In the PP contests, the deltas are significantly greater than zero, between 0.59 and 0.67, suggesting that players adopt a mixture of reinforcement learning (considering own realized payoffs) and belief learning (considering all own hypothetical payoffs). By contrast, in WTA contests players rely mostly on reinforcement (experiential) learning, as the deltas are significantly closer to zero. This is especially true for the early stage of the game (0.02 for 3W and 0.13 for 5W). As players become more familiar with the game, they shift slightly toward belief learning in all treatments. Nevertheless, the difference between PP and WTA contests remains substantial. One can argue that belief learning is more complex, since it requires the player to evaluate, for each possible strategy, the expected payoffs given the opponents' set of strategies. The evaluation of hypothetical scenarios could be more difficult in WTA contests due to the discrepancy between expected and realized payoffs. In fact, it has been shown theoretically that individuals lock in at inefficient levels of expenditures in low-delta scenarios (Pangallo et al. 2017).
18 We use a parameterized version of EWA, since it allows us to explicitly estimate $\delta$ for each treatment, whereas a single-factor EWA is suitable to model average contest choices as in Parco et al. (2005).
19 In the EWA model, we abstain from separately estimating the second half of the game due to possible information loss (the estimation of the second half would require the initial attractions of period 31 to reflect the complete knowledge formed in the first half of the game).
20 The traditional EWA model considers only choices shaped by past attractions, while later versions add sophistication to allow for choices based on expectations of opponents' behavior (e.g., Camerer et al. 2002). A sophisticated player forms a best response by forecasting the actions of all other players. Since including additional free EWA parameters increases the danger of overfitting the model, we instead check for sophistication via the quantal response equilibrium (QRE) model of McKelvey and Palfrey (1995). In the QRE model, sophistication is captured via the precision parameter $\lambda_Q$. The higher $\lambda_Q$, the more sophisticated are the overall choices. We find a significantly lower $\lambda_Q$ for the WTA treatments. The estimation results of the QRE model can be found in ''Appendix D''.
21 Standard errors are clustered at the individual level. Group-level clustering is not necessary due to the random assignment of individuals to groups (see Abadie et al. 2017). A set of less conservative estimations without clustering adjustments at the individual level is reported in ''Appendix B'' Table 10.
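The initialization of attractions following Ho et al. (2008) can be illustrated with a short sketch. Assuming the logit choice rule takes the standard form $P_j \propto \exp(\lambda A_j)$, setting $\lambda A_j(0) = \ln f_j$, where $f_j$ is the empirical first-period frequency of bin $j$, makes the predicted first-period probabilities equal the observed frequencies, which maximizes the first-period likelihood. The helper names and the small `floor` for empty bins (needed to avoid $\ln 0$) are hypothetical choices made for this illustration.

```python
import math


def logit(attractions, lam):
    top = max(lam * a for a in attractions)
    weights = [math.exp(lam * a - top) for a in attractions]
    norm = sum(weights)
    return [w / norm for w in weights]


def initial_attractions(first_period_counts, lam, floor=1e-6):
    """Choose A_j(0) so that the logit rule reproduces first-period choice frequencies.

    Empty bins receive a small floor frequency so that ln(0) is avoided (an assumption).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    n = sum(first_period_counts)
    freqs = [max(c / n, floor) for c in first_period_counts]
    norm = sum(freqs)
    return [math.log(f / norm) / lam for f in freqs]


def log_likelihood(counts, probs):
    """Multinomial log likelihood of observed bin counts (constant term omitted)."""
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


counts = [6, 0, 3, 0, 0, 9, 0, 0, 0, 0, 2]  # hypothetical first-period choices of 20 players
a0 = initial_attractions(counts, lam=1.5)
print([round(p, 3) for p in logit(a0, 1.5)])
```

Because any interior distribution can be represented by some set of logit attractions, this initialization uses no free parameters beyond $\lambda$.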
Other parameter estimates are similar across the various estimations. We find $N_0$ below one in all estimations, which indicates that pre-game attractions are completely offset by first-period attractions. 22 Kappa ($\kappa$), which measures the growth property of attractions, is significantly different from zero in the first half of the game, indicating that the importance of past attractions grows over time. The decay rates of past attractions, indicated by phi ($\phi$), are significantly different from zero but similar across and within treatments, on average between 75% in the PP contest and 81% in the WTA contest. Although the difference is small, it suggests that players may rely more on past experience in WTA contests. This result is consistent with the idea that present payoffs in these treatments reveal less useful information. Finally, lambda ($\lambda$) is significantly different from zero in all specifications, indicating that contestants' choices are influenced by attractions formed in past rounds.
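How $\phi$, $\kappa$ and $N_0$ shape the weight on past attractions can be checked numerically. The sketch below assumes the commonly used $\kappa$-parameterization $N(t) = \phi(1-\kappa)N(t-1) + 1$ (an assumption, since Eq. 5 is not reproduced in this text); the function names are hypothetical. The coefficient $\phi N(t-1)/N(t)$ multiplies the previous attraction in the update, so a small $N_0$ means that the first update places little weight on pre-game attractions.

```python
def experience_weights(phi, kappa, n0, periods):
    """N(t) = phi * (1 - kappa) * N(t - 1) + 1 for t = 1..periods, starting at N(0) = n0."""
    weights = [n0]
    for _ in range(periods):
        weights.append(phi * (1 - kappa) * weights[-1] + 1.0)
    return weights


def weight_on_previous_attraction(phi, kappa, n0, periods):
    """Coefficient phi * N(t - 1) / N(t) multiplying A_j(t - 1) in the update of A_j(t)."""
    n = experience_weights(phi, kappa, n0, periods)
    return [phi * n[t - 1] / n[t] for t in range(1, periods + 1)]


# kappa = 0: N(t) converges to 1 / (1 - phi), so attractions are weighted averages.
# kappa = 1: N(t) = 1 for t >= 1, so attractions accumulate payoffs.
averaging = experience_weights(phi=0.8, kappa=0.0, n0=0.0, periods=60)
cumulative = experience_weights(phi=0.8, kappa=1.0, n0=0.0, periods=60)
print(round(averaging[-1], 4), cumulative[-1])
```

In both cases the weight on the previous attraction settles at $\phi$ after the first few periods; what $\kappa$ changes is whether payoffs are averaged into attractions or added on top of them.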
The EWA parameter estimation uncovers a noticeable difference in learning between the PP and WTA contests. The strong reliance of subjects in WTA contests on their own realized payoffs implies that current decisions depend on the success of previous ones. Victories strongly reinforce the probability of playing the corresponding level of expenditures, while losses make the corresponding expenditure level less attractive in the following periods. If a subject experiences frequent losses, positive expenditure levels will, over time, become less appealing relative to zero expenditures. Following this thought, the fraction of zero expenditures should be higher in contests with a higher share of non-winners. This is consistent with our results in Sect. 4.1, where the fraction of dropouts is significantly higher in WTA treatments than in PP treatments and increases significantly over time in larger groups. If WTA contestants indeed rely on their own previous experiences, then we should observe a drop in expenditures after a series of losses.
Table 4 EWA parameters and their theoretical domains
$N(0)$: Weight of pre-game attractions. Indicates how many periods of experience are required to offset pre-game attractions. The boundaries ensure that the weights $N(t)$ are increasing in $t$.
$\kappa \in [0, 1]$: Growth property of attractions. If $\kappa = 0$, current attractions are a weighted average of past attractions and past payoffs. If $\kappa = 1$, attractions accumulate and can grow larger than present payoffs.
$\phi \in [0, 1]$: Decay rate of past observations. The smaller $\phi$, the faster past observations are discounted and the more weight is assigned to the present.
$\lambda \in [0, \infty)$: Sensitivity measure of attractions. The higher $\lambda$, the more present attractions matter in determining the choice probabilities of future actions. $\lambda = 0$ implies that choice probabilities are not influenced by attractions.
Result 4 (a) Winner-take-all contestants significantly decrease expenditures after the accumulation of losses. (b) The effect is more pronounced for bigger groups.
Table 5 Simulated EWA expenditure distribution and observed expenditure distribution. The first three columns show the relative frequency of expenditures for the EWA simulations; the fourth column shows the true expenditure distribution. For each setting, 600 players were simulated and the experimental specifications were kept. Each treatment is simulated for $\delta = 0$ (reinforcement learning), $\delta = 0.5$ (mix between reinforcement and belief learning), and $\delta = 1$ (belief learning). All other EWA parameters were kept constant at moderate levels for each simulation ($\lambda = 1$, $\phi = 0.8$, $N_0 = 0$, $\kappa = 0.8$, initial attractions = 0).
We expect a negative relationship between the series of losses prior to time $t$ and the expenditure level at time $t$. We assess this relationship using a set of Tobit mixed effects models. 23 The model assumes that expenditure levels are left-censored at 0, with random effects at the individual and group level. In all models, we regress the expenditures at $t$ on previous own expenditures, previous opponents' expenditures (linear and squared) and a variable capturing the time trend. 24 We check the relationship between prior losses and current expenditures via the variable loss streak, defined as the accumulated (negative) payoff from consecutive losses prior to time $t$ relative to the contestant's endowment. After every incurred loss, the variable decreases by the forgone profits that would have been received by choosing not to spend the endowment. Consequently, a loss streak remains unchanged for zero expenditures. If the contest was won in the previous period, the contestant's loss streak is reset to zero. Models (1), for 3W, and (4), for 5W, in Table 7 contain only control variables. In both lotteries, the influence of prior expenditures on current choices is positive and significant, while the effect of period advancement is negative and significant.
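A minimal sketch of how the loss streak regressor could be built from one contestant's history, following the definition above. The function name and its inputs are hypothetical; the streak is expressed as a fraction of the 1000-point endowment, the value reported for period $t$ uses only periods before $t$, and a zero-expenditure period without a win leaves the streak unchanged.

```python
ENDOWMENT = 1000  # points per period


def loss_streaks(expenditures, won):
    """Loss streak entering each period t, built only from periods before t.

    expenditures: tokens bought in each period (0..1000)
    won:          whether the contestant won the prize in that period
    """
    streaks = []
    streak = 0.0
    for spent, victory in zip(expenditures, won):
        streaks.append(streak)  # regressor value available for this period's choice
        if victory:
            streak = 0.0  # a win resets the streak
        else:
            # a loss lowers the streak by the profit forgone relative to keeping the
            # whole endowment, i.e. by the tokens spent; spending 0 leaves it unchanged
            streak -= spent / ENDOWMENT
    return streaks


history = loss_streaks([200, 300, 0, 100, 400], [False, False, False, True, False])
print(history)
```

Because the streak takes non-positive values, a positive regression coefficient on it means that longer runs of costly losses go together with lower current expenditures.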
In models (2) and (5), we find a positive and significant effect of the loss streak. Since the loss streak takes non-positive values, this indicates that the accumulation of losses indeed leads to a decrease in expenditures. This effect is further analyzed in (3) and (6), where we additionally control for the contestants' gender. As an exploratory result, we find that women spend on average more than men. 25 From (3), we find that splitting the loss streak variable by gender does not lead to significant effects on expenditures. Yet with the larger group size (6), the accumulation of prior losses significantly decreases the expenditures of both genders, for women more sharply than for men. This last result has been similarly observed in experimental tournaments (Buser 2016). The regression results support the claim that decreasing expenditures in the WTA contest are driven by previous lottery outcomes. With the accumulation of losses, contestants tend to lower their expenditures and may, over time, drop out of the contest. 26 Since individual losses accumulate longer when facing more opponents, the decrease of expenditures is more pronounced in 5W. Although loss aversion may be thought of as the mechanism that leads to lower expenditures (as shown by Kong 2008; Shupp et al. 2013; Chowdhury et al. 2018), we should be careful to distinguish the role of loss aversion from that of repeatedly experiencing losses. As pointed out in many studies summarized by Kermer et al. (2006), individuals overestimate the impact of losses in prospect compared to the losses they realize, and learn from experience that losses have a smaller emotional impact than estimated ex ante. Therefore, while loss aversion certainly has an impact on expenditures, it is not the only factor behind the decline over time.
Table 7 Multilevel Tobit mixed effects model; standard errors in parentheses. p values: * p ≤ 0.10, ** p ≤ 0.05, *** p ≤ 0.01. Obs. is the number of observations.
The EWA estimations and the results in Table 7 therefore also capture the effect of experiencing losses, in addition to loss aversion.

Final discussion
Our contribution to the literature is two-pronged. First, we offer a clean comparison of how subjects behave across different contest structures. Similar to previous studies of the PP contest and of oligopolies, we find that expenditures in this setting converge well to standard predictions. Unlike in the PP contests, we observe that increasing the group size from three to five players does not affect total investments in the WTA contests. Second, we assess the role of learning as one possible explanation for behavioral differences across contest structures.
The behavioral discrepancy between contest types might be connected to the probabilistic prize assignment in the WTA treatment that influences how contestants form their choices. We hypothesize that PP and WTA contestants use distinct learning strategies that may also explain another expenditure peculiarity that tends to be overlooked: the modal choice in WTA contests is oftentimes zero.
We find that varying the group size from three to five players does not affect total investments in WTA contests, contrary to theoretical predictions. The decrease of investments in large-group WTA contests is influenced by an increasing fraction of zero expenditures. A substantial share of these zero expenditures is not justified by myopic best responses (or other forms of fictitious play), and we define it as 'dropout'. The dropout rate is lower in PP contests, where average expenditures converge to theoretical predictions, which suggests that spending behavior across contests is shaped by distinct learning processes. A parameter estimation of the EWA model in all treatments indicates that WTA contestants decide mostly based on the information gathered from their own realized payoffs. Since success in the WTA contest is stochastic, subjects who base their investment decisions entirely on the outcomes of their previous decisions are less able to adapt their strategies in a payoff-optimizing fashion. By contrast, participants in the PP contests rely on a mixture of own realized payoffs and forgone payoffs. This may be facilitated by the deterministic nature of the payoffs in the PP contest. The distinct learning patterns estimated in the two contest environments do not significantly change over time and are robust to changes in the number of players.
25 In line with findings from prior contest experiments such as Price and Sheremeta (2015) and Brookins and Ryvkin (2014).
26 To analyze the effect of loss accumulation on the likelihood of zero expenditures, we report a random effects logit regression in ''Appendix B'' Table 11. We find that a prior loss streak significantly increases the probability of zero expenditures in 5W. For 3W, the effect goes in the same direction but is not significant, which might be due to the lower loss accumulation in 3W.
Over time, the repeated losses that subjects face in the WTA contests weaken the reinforcement of positive expenditure levels; consequently, zero expenditures become more appealing, regardless of whether they constitute a myopic best response. Our regression results support this view by showing that the accumulation of prior losses leads to a significant decrease in expenditures, which is more pronounced in bigger groups. The higher dropout rate in the five-player WTA treatment is presumably driven by the faster accumulation of individual losses. As a consequence, an increase in the group size does not necessarily increase total rent seeking.
In a society full of embedded winner-take-all contests, our results have practical relevance. First, in rent-seeking situations such as lobbying, it may be beneficial that an increase in the number of WTA contestants does not significantly change total rent-seeking effort. Conversely, when a high sum of efforts is desired, such as in a philanthropic fund-raising lottery, increasing the pool of participants might not lead to the desired effect if the individual loss probability increases simultaneously. Second, decisions not to invest can be aggregated on a macro level to the so-called 'industry shakeout', i.e., a significant reduction in the number of active firms during the expansion of new industries (see Gort and Klepper 1982; Klepper 1996, 1997). One traditional explanation for firm exit dynamics postulates that market participants use Bayesian updating to learn their true ability to operate in the market (Jovanovic 1982; Jovanovic and Nyarko 1995). Thus, firms decide whether to exit by assessing their past performance. Our finding that participants invest less after losses and potentially drop out of the contest due to the non-reinforcement of positive payoffs follows a similar line. At the same time, we stress the difficulty players face in forming rewarding learning strategies in highly uncertain domains, such as pharmaceutical R&D.
The presented work calls for a better understanding of how WTA contestants learn in highly uncertain environments with large group sizes, such as pharmaceutical R&D, and of how their decision-making abilities can be improved. In addition, the observed 'dropout' effect and its relationship to group size and possibly other contest characteristics deserve increased attention, in order to better bridge the gap between experimental findings and theoretical explanations of contestants' behavior.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Funding Open access funding provided by Scuola IMT Alti Studi Lucca within the CRUI-CARE Agreement. Fallucchi acknowledges the support of the Network for Integrated Behavioural Science (Economic and Social Research Council Grant ES/K002201/1). We thank Roman Sheremeta for generously providing the data of previous research. We thank Ennio Bilancini, Marco Pangallo as well as participants at the Contest Conference of the University of East Anglia, the joint seminar of KU Leuven/IMT Lucca, and the 2018 ESA World Meeting for their useful comments. Finally, we thank three anonymous reviewers and the editor for helpful comments. All errors are the responsibility of the authors.

Appendix
Appendix A: Risk assessment
The equilibrium predictions in both contest types are derived under the assumption of risk neutrality. However, risk aversion might alter the expected effort levels and hence the conclusions drawn from a direct comparison between contest types.
Theoretical evidence of the effect of risk aversion on efforts is ambiguous (Konrad and Schlesinger 1997), since results vary with the assumptions on risk aversion, contest success functions and homogeneity of rent seekers (Treich 2010).
Early experimental work by Millner and Pratt (1991) on WTA contests finds lower mean dissipation rates for risk-averse contest groups, not significantly different from risk-neutral equilibrium predictions. Other authors find that risk aversion significantly reduces efforts in WTA contests, but average efforts still remain above the risk-neutral equilibrium predictions (e.g., Anderson and Freeborn 2010; Mago et al. 2013; Sheremeta 2011).
A direct comparison of WTA and PP contests does not find risk aversion to significantly drive down efforts (Shupp et al. 2013). In contrast, Cason et al. (2018) find a negative effect of risk aversion on effort in various Tullock contests (with additive noise). Therefore, we believe that more empirical evidence is needed on whether risk aversion matters.
To control for risk aversion in our experiment, we ask subjects to evaluate how willing they are, in general, to take risks, using a Likert scale from 1 (unwilling to take risks) to 7 (fully prepared to take risks). 27 The average 'Risk score', reported in Table 8, is higher in PP contests than in WTA contests, although not significantly different across treatments. Hence, differences in efforts between treatments of equal group size do not seem to depend on different risk scores across treatments.
Welcome! You are about to participate in an experiment in the economics of decision making. Please do not talk to any of the other participants until the experiment is over. If you have a question at any time, please raise your hand and an experimenter will come to your desk to answer it.
The experiment will consist of 60 periods. In each period you will have the chance to earn points. At the end of the experiment, each participant's accumulated point earnings from all periods will be converted into cash at the exchange rate of 0.015 pence per point. Each participant will be paid in cash and in private.
At the beginning of the experiment, you will be matched with two [four] other people, randomly selected from the participants in this room, to form a group of three [five]. The composition of the group will stay the same throughout the experiment, i.e., you will form a group with the same two [four] other participants during the whole experiment. Your earnings will depend on the decisions made within your group, as described below. Your earnings will not be affected by decisions made in other groups. All decisions are made anonymously and you will not learn the identity of the other participants in your group.
Table 11 Random effects logit regression; standard errors clustered at player level in parentheses. p values: * p ≤ 0.10, ** p ≤ 0.05, *** p ≤ 0.01. Obs. is the number of observations. All effort-related regressors are normalized (divided by 1000).

Decision task in each period
Each period has the same structure. In each period, the three [five] participants in each group will be competing for a prize of 1000 points.
At the beginning of the period, each participant will be given an endowment of 1000 points. Each participant has to decide how many of these points they want to use to buy ''contest tokens''. Each contest token costs 1 point, so each participant can purchase up to 1000 of these tokens. Any part of the endowment that is not spent on contest tokens is kept by the participant. Each participant must enter his or her decision via the computer. An example screenshot is shown below (Fig. 5).
Once everybody has chosen how many contest tokens to purchase, the computer will calculate each participant's share of the prize of 1000 points. Your share of the prize will depend on how many contest tokens you have purchased and the total number of contest tokens purchased in your group.
[In proportional-prize contest description] If nobody in your group purchases any contest tokens, none of you will receive a share of the prize. Otherwise, the computer will calculate each participant's share of the prize so that your share of the prize will be equal to the number of contest tokens that you have purchased divided by the total number of contest tokens purchased in your group. That is, if you buy a number of X contest tokens and if the other two participants in your group buy Y and Z contest tokens each, then your share of the prize will be $X/(X + Y + Z)$. [That is, if you buy a number of V contest tokens and if the other four participants in your group buy W, X, Y and Z contest tokens each, then your share of the prize will be $V/(V + W + X + Y + Z)$.] Your contest earnings will be your share times 1000 points (rounded to the nearest point).
[In winner-take-all contest description] If nobody in your group purchases any contest tokens, none of you will win the prize. Otherwise, the computer will determine which participant wins the prize in a way that will ensure that the probability that you will win the prize is equal to the number of contest tokens that you have purchased divided by the total number of contest tokens purchased in your group. That is, if you buy a number of V contest tokens and if the other two participants in your group buy W and X contest tokens each, then the probability that you win the prize will be $V/(V + W + X)$. [That is, if you buy a number of V contest tokens and if the other four participants in your group buy W, X, Y, Z contest tokens each, then the probability that you win the prize will be $V/(V + W + X + Y + Z)$.] Your contest earnings will be either 0 (if you do not win the prize), or 1000 (if you win the prize).
Your point earnings for the period will be calculated as follows: Point earnings = 1000 − contest tokens purchased + contest earnings. After all participants have made a decision, a result screen will appear. An example screenshot is shown below. This is like the screen you will see during the experiment except that the blacked out fields will be filled in according to the decisions made in that round (Fig. 6).
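The earnings rules in these instructions can be written as a short sketch. The function names are hypothetical, and two details are assumptions not stated in the instructions: rounding to the nearest point is implemented as half-up rounding, and the WTA winner is drawn with Python's `random.Random.choices` weighted by tokens, which gives each participant a winning probability equal to their tokens divided by the group total.

```python
import math
import random

ENDOWMENT = 1000  # points per period
PRIZE = 1000      # points


def pp_earnings(tokens):
    """Proportional-prize earnings: 1000 - tokens + share of the prize (rounded)."""
    total = sum(tokens)
    earnings = []
    for t in tokens:
        share = t / total if total > 0 else 0.0  # nobody buys -> nobody gets a share
        # half-up rounding to the nearest point (tie handling is an assumption)
        earnings.append(ENDOWMENT - t + math.floor(PRIZE * share + 0.5))
    return earnings


def wta_earnings(tokens, rng):
    """Winner-take-all earnings: one winner drawn with probability tokens / total."""
    total = sum(tokens)
    prizes = [0] * len(tokens)
    if total > 0:  # nobody buys -> nobody wins
        winner = rng.choices(range(len(tokens)), weights=tokens)[0]
        prizes[winner] = PRIZE
    return [ENDOWMENT - t + p for t, p in zip(tokens, prizes)]


def wta_expected_earnings(tokens):
    """Risk-neutral expected WTA earnings; equal to the unrounded PP earnings."""
    total = sum(tokens)
    return [ENDOWMENT - t + (PRIZE * t / total if total > 0 else 0.0) for t in tokens]


print(pp_earnings([200, 300, 500]))  # -> [1000, 1000, 1000]
print(wta_earnings([200, 300, 500], random.Random(42)))
```

The last function makes explicit why, under risk neutrality, the expected payoff of a WTA contestant coincides with the deterministic PP payoff, even though realized WTA earnings are either $1000 - x$ or $2000 - x$.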
Each participant will be informed of the points remaining from their endowment after making their purchase, the number of contest tokens they have purchased, the sum of tokens purchased by the other participants in their group, their contest earnings and their point earnings for the period. In addition, the results screen will inform each participant of his or her accumulated points from all periods so far.
Beginning of the experiment
If you have any questions, please raise your hand and an experimenter will come to your desk to answer it.
We are now ready to begin the decision-making part of the experiment. Please look at your computer screen and begin making your decisions.

Appendix D: QRE model
In Sect. 4.2, we report that WTA contestants' choices rely more on their own past experience than those of PP contestants. This appendix adds to the analysis of behavioral differences across treatments by exploring the degree of contestants' 'sophistication' using the homogeneous quantal response equilibrium (QRE) model of McKelvey and Palfrey (1995). Sophisticated players base their choices on their expectations about opponents' future actions (Camerer et al. 2002). The estimation results complement the EWA findings: sophistication increases over time, yet PP contestants behave in a more sophisticated way than WTA contestants.
Let $p(x_k)$ denote the probability that an arbitrary player chooses the $k$th bin $x_k$. Let $E_p[\pi(x_k)]$ be the expected payoff from choosing bin $x_k$ conditional on all $N - 1$ opponents playing the mixed strategy $p$ over the $K$ bins. 28 The $K$ probabilities are the solutions of the logit QRE specification in Eq. 7, where $\lambda_Q \in [0, \infty)$ is called the precision parameter. While $\lambda$ in the EWA model measures the impact of past attractions, $\lambda_Q$ measures the impact of expectations on choice probabilities. Table 12 describes all reported QRE parameters. The estimation results in Table 13 show that the WTA treatments exhibit a lower value of $\lambda_Q$, indicating that players' choices depend less on payoff expectations given the anticipation of opponents' mixed strategies; WTA choices are therefore, on average, less sophisticated.
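The logit QRE can be computed numerically with a damped fixed-point iteration. This is a minimal sketch, not the estimation code used here, and it rests on several assumptions: the standard logit form $p(x_k) \propto \exp(\lambda_Q E_p[\pi(x_k)])$ for Eq. 7, the 11 bins in steps of 100 points, PP payoffs (which under risk neutrality coincide with expected WTA payoffs, so the same sketch covers both contest types), and payoffs divided by a hypothetical `payoff_scale` of 1000, since the payoff units behind the reported $\lambda_Q$ values are not stated in this appendix. It uses NumPy; the function names are illustrative.

```python
import numpy as np

BINS = np.arange(11) * 100.0  # 11 equally spaced bins: 0, 100, ..., 1000 points
ENDOWMENT = 1000.0
PRIZE = 1000.0


def opponents_total_distribution(p, n_opponents):
    """Distribution of the opponents' total spending when each plays mixed strategy p.

    Index s of the returned array corresponds to a total of 100 * s points.
    """
    dist = np.array([1.0])
    for _ in range(n_opponents):
        dist = np.convolve(dist, p)  # sum of independent bin draws
    return dist


def expected_payoffs(p, n_players):
    """E_p[pi(x_k)] for every bin x_k, with all N - 1 opponents playing p."""
    dist = opponents_total_distribution(p, n_players - 1)
    others = np.arange(len(dist)) * 100.0
    payoffs = np.empty(len(BINS))
    for k, x in enumerate(BINS):
        total = x + others
        # share = x / total, and no prize when nobody spends anything
        share = np.divide(x, total, out=np.zeros_like(total), where=total > 0)
        payoffs[k] = np.dot(dist, ENDOWMENT - x + PRIZE * share)
    return payoffs


def logit_qre(lam_q, n_players, payoff_scale=1000.0, damping=0.5,
              tol=1e-12, max_iter=10_000):
    """Solve p = logit(lam_q * E_p[pi] / payoff_scale) by damped fixed-point iteration."""
    p = np.full(len(BINS), 1.0 / len(BINS))  # start from the uniform (lam_q = 0) solution
    for _ in range(max_iter):
        u = lam_q * expected_payoffs(p, n_players) / payoff_scale
        target = np.exp(u - u.max())
        target /= target.sum()
        if np.max(np.abs(target - p)) < tol:
            return target
        p = damping * target + (1.0 - damping) * p
    raise RuntimeError("QRE fixed-point iteration did not converge")


print(logit_qre(2.0, n_players=3).round(3))
```

An estimation would wrap `logit_qre` in a one-dimensional search over $\lambda_Q$ that maximizes $\sum_k n_k \ln p(x_k; \lambda_Q)$ for the observed bin counts $n_k$; for very large $\lambda_Q$ a smaller `damping` may be needed for convergence.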
We use likelihood ratio tests to test for heterogeneity in $\lambda_Q$ within and between treatments. To test whether the impact of expected payoffs on choices changes over time, we compare the log likelihood of a constant $\lambda_Q$ across all periods with the log likelihood obtained when $\lambda_Q$ may differ between the two halves of the session (Eq. 8).
Similarly, for between-treatment comparisons (3W-5W, 3P-5P, 3P-3W and 5P-5W), we sum the log likelihoods of both treatments for each candidate $\lambda_Q$ and find the new optimal common $\lambda_Q$ and its corresponding log likelihood (Eq. 9). We then compare the resulting $LL_{T1+T2}$ with the sum of the log likelihoods from estimating separate $\lambda_Q$ parameters for the two treatments ($LL_{T1} + LL_{T2}$). Under the null hypothesis that $\lambda_Q$ is the same across treatments (or, in the within-treatment test, across the two halves), both test statistics are asymptotically chi-squared distributed with 1 degree of freedom. The summary of all likelihood ratio tests can be found in Table 14.
28 To be coherent with the EWA formulation, we use 11 bins of equal distance.
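The likelihood ratio tests described above reduce to a few lines. In this sketch (function names hypothetical), the restricted model is the one with a common $\lambda_Q$ (pooled over both halves, or over both treatments, giving $LL_{T1+T2}$) and the unrestricted model allows separate values (e.g., $LL_{T1} + LL_{T2}$). The chi-squared upper tail with 1 degree of freedom is computed exactly via $P(\chi^2_1 > D) = \operatorname{erfc}(\sqrt{D/2})$, so no statistics library is needed.

```python
import math


def lr_statistic(ll_restricted, ll_unrestricted):
    """D = 2 * (LL_unrestricted - LL_restricted), using log likelihoods (not negated)."""
    return 2.0 * (ll_unrestricted - ll_restricted)


def chi2_df1_pvalue(d):
    """Upper-tail p-value of a chi-squared(1) statistic: P(Z^2 > d) = erfc(sqrt(d / 2))."""
    if d <= 0:
        return 1.0
    return math.erfc(math.sqrt(d / 2.0))


# Hypothetical log likelihoods: pooled lambda_Q versus separate lambda_Q per treatment
d = lr_statistic(ll_restricted=-1510.0, ll_unrestricted=-1500.0)
print(d, chi2_df1_pvalue(d))
```

Since separate estimation can never fit worse than the pooled restriction, $D \ge 0$ up to numerical error; the guard for $D \le 0$ simply returns a p-value of 1.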
Estimations show a significant increase in the precision parameters over time: the estimated $\lambda_Q$ in 3P increases from 0.33 in the first half to 0.76 in the second half (D = 109.8, p = 0.00). Comparable increases are observed for all other treatments (5P: from 0.31 to 0.51, D = 49.3, p = 0.00; 3W: from 0.09 to 0.14, D = 18.8, p = 0.00; 5W: from 0.16 to 0.24, D = 35.0, p = 0.00).
On comparing the results for different group sizes of the same contest, one might expect that an increase in group size makes the game more complex, since the set of possible opponents' expenditures grows, thus leading to a lower lambda in larger groups. 29 We find support for this conjecture over all periods of the PP contest (D = 18.6, p = 0.00), yet in the first half of the PP contests the results are not significant (D = 1.0, p = 0.31). In the WTA contest, we find a significantly higher $\lambda_Q$ for the five-player treatment (1st half: D = 42.3, p = 0.00; all periods: D = 87.0, p = 0.00).
Table 12 QRE parameters
$\lambda_Q$: Precision parameter. The higher $\lambda_Q$, the more best responses rely on forecasts of opponents' behavior. For $\lambda_Q = 0$, the solution is $p(x_k) = 1/K$, i.e., all choices are uniformly distributed over the strategy space and do not depend on the relative expected payoff. Conversely, as $\lambda_Q \to \infty$, the QRE prediction converges to the Nash equilibrium.
$-LL$: $\lambda_Q$ is chosen such that it maximizes the log likelihood of the theoretical choice probabilities of the QRE model given the observed choice frequencies. The corresponding log-likelihood value is given by $-LL$.
$-LL_{Uni}$: Expected log likelihood in case of a uniform distribution of choices.
$-LL_{Max}$: Estimated maximum log likelihood in case of an exact match between the model predictions and the empirical distribution.
$Q$: Q-statistic, $Q = (LL - LL_{Uni})/(LL_{Max} - LL_{Uni})$, measures the fit of the QRE model according to Lim et al. (2014). $Q = 0$ if the best fit is given by the uniform distribution, and $Q = 1$ if the model perfectly predicts the empirical distribution.
Table 13 $\lambda_Q$ is the precision parameter of the QRE model that maximizes the log likelihood (LL) of observing the relative frequency of actual choices. See Table 12 for further parameter explanations.
Using the same approach for a between-contest-type comparison, we find that $\lambda_Q$ is significantly higher for the PP contest in all evaluated treatments and subsamples (1st half three-player: D = 185.5, all three-player: D = 509.1, 1st half  Chowdhury et al. (2014), we find a higher Q-statistic for the PP contest in all treatments, suggesting that the estimated QRE model better fits the PP contest expenditures. Expenditures in the WTA contests contain a higher share of zero expenditures, which may explain the lower goodness of fit.
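The Q-statistic of Lim et al. (2014), as defined in Table 12, can be computed directly from bin counts and model probabilities. This sketch assumes that $LL_{Uni}$ is the log likelihood of the uniform model over the $K$ bins and that $LL_{Max}$ is the log likelihood obtained when the predicted probabilities equal the empirical frequencies; the function names are hypothetical.

```python
import math


def log_likelihood(counts, probs):
    """Multinomial log likelihood of observed bin counts (constant term omitted)."""
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


def q_statistic(counts, model_probs):
    """Q = (LL - LL_Uni) / (LL_Max - LL_Uni): 0 for a uniform fit, 1 for a perfect fit."""
    n = sum(counts)
    k = len(counts)
    ll_model = log_likelihood(counts, model_probs)
    ll_uniform = n * math.log(1.0 / k)
    ll_max = log_likelihood(counts, [c / n for c in counts])  # empirical frequencies
    if ll_max == ll_uniform:
        raise ValueError("Q is undefined when the empirical distribution is uniform")
    return (ll_model - ll_uniform) / (ll_max - ll_uniform)


counts = [10, 0, 5, 5]  # hypothetical counts over four bins
print(q_statistic(counts, [0.4, 0.1, 0.25, 0.25]))
```

Because the empirical frequencies maximize the multinomial likelihood, a model can reach $Q = 1$ only by reproducing them exactly; a large spike at zero expenditures that the QRE cannot match therefore pulls $Q$ down.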
Summarizing the main results of the QRE model, we find that players across treatments become more sophisticated over time, meaning that they increasingly rely on forecasts of opponents' behavior when determining their choices. Additionally, a higher number of players leads to an increase in $\lambda_Q$ in the WTA treatments, while the effect in the PP contest is ambiguous. The choice patterns observed in the WTA treatments display less sophistication than in the PP contest, which might be due to the probabilistic prize assignment that makes it challenging to form sophisticated expectations.