1 Introduction

Over the last three decades, the literature on behavioural economics has proposed a number of models to explain various anomalies that can hardly be organised by the standard equilibrium approach. In the context of games, these models consider alternative preferences, traits and/or rationales whose relative explanatory powers have been assessed with laboratory experiments (see e.g., McKelvey & Palfrey, 1995; Selten & Chmura, 2008; Costa-Gomes et al., 2009; Crawford, 2013). While such a horseracing approach documents the models’ relative goodness-of-fit performances and helps determine a ‘best model’, it leaves unanswered the question of whether the estimated models are indeed consistent with the restrictions they impose on individuals’ behaviour. This article presents a novel approach and a specification test to address this question in the context of symmetric binary-choice participation games such as market-entry games, volunteer’s dilemmas, discrete step-level public good games and voter participation games. It contributes to the existing literature on this issue in two ways.

First, it provides a useful theory-based selection criterion for models whose explanatory power can hardly be assessed other than by their goodness-of-fit. This is the case with the Quantal Response Equilibrium model (QRE, McKelvey & Palfrey, 1995), a stochastic version of the Nash equilibrium that assumes players best-respond to their own and to the others’ payoff disturbances and whose predictions hinge upon the distributional properties of these errors (see Goeree et al., 2016, and the references therein). This model has proven remarkably successful in fitting the data of numerous experiments, but its reliance on players’ unobservable payoff disturbances has raised concerns about its falsifiability (see Haile et al., 2008). Goeree et al. (2005) addressed such concerns by determining restrictions on these disturbances to bracket QRE’s falsifiability (see Goeree et al., 2016, for further discussion on this topic). Golman (2011) deals with this problem in the context of heterogeneous agents and provides conditions under which the behaviour of the representative agent of a pool of individuals may be rationalised by QRE. These conditions determine whether the aggregation of agents’ payoff disturbances fulfils the i.i.d. assumption on which QRE builds, and they yield useful predictions for asymmetric binary-choice games by restricting the set of QRE-consistent choice frequencies. On the other hand, Melo et al. (2019) check whether players’ behaviour in multiple games is consistent with the QRE hypothesis. Their procedure exploits a set of restrictions on agents’ choices in different games and on these games’ payoffs. It is also nonparametric in the sense that it does not require the distribution of payoff disturbances in a particular game to be specified.

Unlike these investigations which pertain to QRE settings, ours exploits the behavioural restrictions imposed by a symmetric model on individuals’ participation rates only and therefore allows the comparison of different models, including QRE.

Second, it permits an assessment of a model’s consistency with the assumption of ‘cluster-heterogeneity’, whereby individuals with common characteristics (e.g., their participation rates) are clustered together and share a common model-parameter to be estimated. It thus alleviates the problem of modelling heterogeneity, which typically raises the question of which relaxations of the “common knowledge” assumption(s) about agents’ beliefs about others can be used while still allowing one to ‘close’ the model.Footnote 1 Rogers et al. (2009), for example, develop QRE models where heterogeneity is modelled either in terms of common knowledge beliefs about others’ traits (as in Camerer et al., 2016) or of subjective beliefs, i.e., each player believes that the others’ traits are i.i.d. from the same distribution as her/his own, which is assumed private information (as in Armantier & Treich, 2009).Footnote 2

Although these modifications show that assuming heterogeneity considerably improves the model’s goodness-of-fit, they also sharpen the question of the model’s falsifiability since the presumed beliefs about others’ behaviour remain difficult to assess. Our approach does not require additional behavioural assumptions about one’s own or others’ behaviour since it is based on observables, e.g., the players’ participation rates; and it allows one to determine how much cluster-heterogeneity a symmetric model can tolerate while remaining consistent with the restrictions it imposes on individual behaviour. While the symmetry assumption provides valuable normative predictions for policy recommendations such as the design of markets, contracts and/or bargaining legislation, it is rather unrealistic and thus restrictive. By considering an observable cluster-heterogeneity rather than a hypothetical heterogeneity in the players’ beliefs, we can better assess a model’s predictions and possibly broaden its range of applications.

We assess our approach with new data on market-entry games of complete information. These games suit our case well since they involve fairly straightforward incentives and can accommodate a relatively large number of players (which is needed for studying cluster-heterogeneity). These games have also been widely studied in the social sciences, and laboratory experiments typically indicate that participants somehow manage to behave almost optimally since their participation rates often even out the expected profits from entry and from no entry (Ochs, 1990; Sundali et al., 1995; Zwick & Rapoport, 2002).Footnote 3 This observation was first described as ‘magic’ (Kahneman, 1988; Meyer et al., 1992; Rapoport, 1995), and subsequent experiments have put in perspective the roles of reinforcement learning processes (e.g., Erev & Rapoport, 1998; Rapoport et al., 1998; Duffy & Hopkins, 2005; Erev et al., 2010) and other behavioural traits like probability misperception (Rapoport et al., 2002) and overconfidence (Camerer & Lovallo, 1999). Goeree and Holt (2005) examine these and other participation games from a QRE perspective (with a Logit error structure, i.e., the Logit-QRE) and determine conditions to observe under- or over-participation.

We study these market-entry games through the lens of two stationary behavioural models: the ‘Exploration versus Exploitation’ dilemma (EvE) outlined in Nadal et al. (1998), Weisbuch et al. (2000), Kirman (2011) and Bouchaud (2013), which essentially entails a trade-off between maximising current and future profits, and the Impulse Balance Equilibrium (IBE, Selten et al., 2005), which balances the foregone expected payoffs associated with each possible choice. The details of these models are discussed in the next section; here we highlight two of their properties that motivate our experimental investigation. First, EvE is structurally equivalent to Logit-QRE and thus directly relates to the predictions of Goeree and Holt (2005): in brief, ‘Exploration’ in EvE corresponds to a ‘purely random behaviour’ in QRE, ‘Exploitation’ corresponds to a ‘best-responding behaviour’, and any mix of these two options corresponds to a ‘stochastic best-responding behaviour’. Second, despite their different premises, EvE and IBE fit observed entry probabilities equally well in the range of EvE-consistent choice frequencies. And since the range of IBE-consistent choice frequencies is larger, the usual ‘goodness-of-fit horseracing’ would document nothing more than occurrences where IBE outperforms EvE and is therefore not pursued.Footnote 4 Given these properties, we focus our analysis on the models’ relative success in consistently organising behaviour in treatments that manipulate payoff levels (i.e., ‘High’ or ‘Low’) and payoff structures (i.e., with payoffs from entry depending on attendance in various ways). In addition, we document the sensitivity of our conclusions to the econometric procedures used, i.e., with(out) imposing symmetry, with(out) assuming homoscedastic errors, and with(out) regularisation of the errors’ variance matrix.

We summarise our experimental findings in the following four points. First, imposing symmetry (as is usually done in the literature) yields significant IBE-estimates that are of similar magnitudes across aggregation levels (i.e., session or pooled data) and EvE-estimates that are either insignificant or that bear little consistency across aggregation levels. Second, relaxing symmetry and using OLS estimation methods leads the specification test to reject EvE with cluster heterogeneity less often than IBE no matter the payoff level or structure (17% vs 42% of all sessions). However, when considering the models’ non-rejected specifications, EvE typically yields insignificant cluster-estimates, and most of its multi-clustered specifications are over-parametrised, i.e., their cluster-estimates are not significantly different from each other. This is not the case for IBE which, in addition, can rationalise the presence or absence of clusters of players with low participation rates. Third, these patterns hardly change when the estimations pertain to the second half of the experiments to account for participants’ experience of play. Fourth, when estimating the models with more efficient econometric procedures with(out) regularisation, the EvE-specifications become more likely to be rejected (25% vs 17% of all sessions) and yield fewer insignificant cluster-estimates. Yet, most of the non-rejected EvE-specifications are still over-parametrised whereas our conclusions for IBE are hardly affected. In sum, our study indicates that IBE yields more consistent estimates than EvE when symmetry is imposed and that it accommodates cluster-heterogeneity better than EvE when it is relaxed.

The next section presents the EvE and IBE models for market-entry games. Section 3 lays out the econometric procedures and our specification test for this class of binary-choice games. The experimental design and procedures are presented in Sect. 4. Section 5 reports the estimation results when symmetry in the players’ choices is imposed and when it is relaxed. Section 6 concludes.

2 Two stationary models of market-entry games

Assume \(n\) agents who independently decide whether to enter a market or not. Agent \(i\)'s decision is represented by a variable \(d_{i}\) that takes the value \(1\) if she enters and \(0\) if not. The payoff from not entering is constant and equal to \(H\), whereas the one from entering is a function \(G\left( \cdot \right)\) of the number of entrants \(A = \sum_{i} d_{i}\). A congestion problem typically arises if for some integer value \(c < n\), we have \(G\left( A \right) \ge H\) if \(A \le c\) and \(G\left( A \right) < H\) if \(A > c\). With such a reward scheme, any vector of decisions \(d\) such that exactly \(c\) out of \(n\) agents choose to enter constitutes a pure Nash equilibrium. There are exactly \(\left( {\begin{array}{*{20}c} n \\ c \\ \end{array} } \right)\) such equilibria, each yielding an aggregate payoff equal to \(cG\left( c \right) + \left( {n - c} \right)H\).

There may also exist symmetric mixed-equilibrium strategies, i.e., that equalize an agent's expected payoff from entering, \(\pi^{E}\), to that from not entering, \(\pi^{NE} = H\). That is, if \(p\) stands for the common probability of entry, then an equilibrium probability \(p^{Nash}\) solves:

$$\pi^{E} \left( p \right) \equiv \mathop \sum \limits_{k = 0}^{n - 1} \left( {\begin{array}{*{20}c} {n - 1} \\ k \\ \end{array} } \right)p^{k} \left( {1 - p} \right)^{n - 1 - k} G\left( {k + 1} \right) = H \equiv \pi^{NE}$$
(1)

where \(k\) is a realization of the random variable \(K\) characterizing the number of entrants other than oneself. Note that (1) requires that the \(n\) agents behave symmetrically in that they all choose to enter with the same probability \(p\)—clearly, one could also consider asymmetric mixed-equilibria in which some agents enter with commonly known probabilities. For reasons that will become clear in Sect. 3, it is convenient to rewrite this expression as being conditional on \(p_{ - i}\), the \(n - 1\) vector of entry probabilities for agents other than agent \(i\)Footnote 5:

$$\pi_{i} \left( {d = 1\,{|}\,p_{ - i} } \right) = \mathop \sum \limits_{k = 0}^{n - 1} P{[}k {\,\text{go}} {|}p_{ - i} ]G\left( {k + 1} \right).$$
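
As a minimal sketch of how the two expressions above could be evaluated numerically, the Python snippet below builds the distribution of the number of other entrants for an arbitrary vector \(p_{ - i}\) (by successive convolution, assuming independent entry decisions), computes the implied expected payoff from entry, and locates a symmetric mixed-equilibrium probability by bisection. The payoff function `G_step`, the values of `n`, `c` and `H`, and all function names are illustrative assumptions and not the experimental parameters.

```python
from typing import Callable, List, Sequence


def entrant_distribution(p_others: Sequence[float]) -> List[float]:
    """Return [P(k others enter)] for k = 0..len(p_others), assuming independence."""
    dist = [1.0]
    for q in p_others:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - q)  # this agent stays out
            nxt[k + 1] += mass * q      # this agent enters
        dist = nxt
    return dist


def payoff_entry(p_others: Sequence[float], G: Callable[[int], float]) -> float:
    """Expected payoff from entering, conditional on the others' entry probabilities."""
    return sum(prob * G(k + 1) for k, prob in enumerate(entrant_distribution(p_others)))


def symmetric_nash(n: int, G: Callable[[int], float], H: float, tol: float = 1e-10) -> float:
    """Bisection on pi^E(p) - H.

    Assumption: the excess payoff is positive at p = 0, negative at p = 1,
    and crosses zero only once on [0, 1].
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if payoff_entry([mid] * (n - 1), G) - H > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical two-step payoff: entry pays 10 while at most c agents enter, else 0.
n, c, H = 10, 6, 5.0
G_step = lambda a: 10.0 if a <= c else 0.0
p_nash = symmetric_nash(n, G_step, H)
```

Passing a vector with unequal components to `payoff_entry` gives the asymmetric version of the expected payoff that the rewriting in terms of \(p_{ - i}\) is meant to accommodate.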

2.1 Exploration versus Exploitation: EvE

In this framework, agents aim at finding a compromise between maximizing their current payoff and keeping themselves informed about market conditions to maximize their future payoffs. In our context, we can think of changing market conditions driven by agents’ irregular or stochastic entry behaviour. In this case, agents may find it worthwhile to sometimes explore the alternative option, i.e., entering or not entering the market. While the ‘exploitation’ part of the dilemma, i.e., the maximization of current payoffs, is straightforward, the ‘exploration’ part hinges upon the maximum entropy principle which captures the agent's information seeking behaviour (see Anderson et al., 1992).Footnote 6 In brief, an agent seeking maximal information from her/his decisions would explore each alternative with equal probabilities so that entropy is maximized whereas an agent who does not seek information would clearly avoid exploring and would focus on maximizing current payoffs, so the weight on entropy is minimized. This framework was first used by Nadal et al. (1998) for the study of buyer–seller interactions and we adapt it here for the analysis of market-entry games.

Denote agent \(i\)'s probability of entry by \(p_{i}\) and that agent’s expected payoff from entry in terms of the probabilities of entry of the \(n - 1\) other agents by \(\pi^{E} \left( {p_{ - i} } \right)\). Using Shannon’s measure of entropy \(S_{i} = - p_{i} \ln p_{i} - \left( {1 - p_{i} } \right)\ln \left( {1 - p_{i} } \right)\) with \(p_{i}\) neither 0 nor 1, the agent's objective function to maximise is then given by:

$$\begin{aligned} \pi_{i} =\, & p_{i} \pi^{E} \left( {p_{ - i} } \right) + \left( {1 - p_{i} } \right)H + \sigma S_{i} \\ =\, & p_{i} \pi^{E} \left( {p_{ - i} } \right) + \left( {1 - p_{i} } \right)H + \sigma \left[ { - p_{i} {\text{ln}}p_{i} - \left( {1 - p_{i} } \right)\ln \left( {1 - p_{i} } \right)} \right]. \\ \end{aligned}$$
(2)

where \(\sigma \ge 0\) is a parameter capturing the weight that agent \(i\) assigns to the preservation of information about market conditions for long term profits. Differentiating this expression with respect to \(p_{i}\), we obtain the following first-order condition for maximisation:

$$\pi^{E} \left( {p_{ - i} } \right) - H + \sigma \left[ { - {\text{ln}}p_{i} + \ln \left( {1 - p_{i} } \right)} \right] = 0,$$

or equivalently (with \(\sigma = 1/\lambda\))

$$p_{i} = \frac{1}{{1 + {\text{exp}}\left\{ { - \lambda \left[ {\pi^{E} \left( {p_{ - i} } \right) - H} \right]} \right\}}}.$$
(3)
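
For completeness, the step from the first-order condition to (3) can be spelled out as follows. This is a routine rearrangement that uses \(\sigma = 1/\lambda\) and assumes \(\sigma > 0\); in that case the objective (2) is strictly concave in \(p_{i}\), so the first-order condition identifies its maximum:

$$\begin{aligned} \sigma \ln \frac{{p_{i} }}{{1 - p_{i} }} =\, & \pi^{E} \left( {p_{ - i} } \right) - H \\ \frac{{p_{i} }}{{1 - p_{i} }} =\, & \exp \left\{ {\lambda \left[ {\pi^{E} \left( {p_{ - i} } \right) - H} \right]} \right\} \\ p_{i} =\, & \frac{{\exp \left\{ {\lambda \left[ {\pi^{E} \left( {p_{ - i} } \right) - H} \right]} \right\}}}{{1 + \exp \left\{ {\lambda \left[ {\pi^{E} \left( {p_{ - i} } \right) - H} \right]} \right\}}} = \frac{1}{{1 + \exp \left\{ { - \lambda \left[ {\pi^{E} \left( {p_{ - i} } \right) - H} \right]} \right\}}}. \\ \end{aligned}$$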

This yields a system of \(n\) equations if there are \(n\) agents, and given the homogenous weighting parameter \(\lambda\), this should be solved for the vector \(p^{*} = \left( {p_{1} ,p_{2} , \ldots ,p_{n} } \right)\). Under the assumption of symmetry, \(p_{ - i}\) has all its components equal to \(p_{i}\), which we simply denote by \(p\), and thus \(p\) and \(\lambda\) are related by:

$$p = \frac{1}{{1 + {\text{exp}}\left\{ { - \lambda \left[ {\pi^{E} \left( p \right) - H} \right]} \right\}}}$$

or equivalently

$$\lambda = \frac{{{\text{ln}}\frac{p}{1 - p}}}{{\pi^{E} \left( p \right) - H}}.$$
(4)

Note that this exactly matches McKelvey and Palfrey's definition of a Logit-QRE (with \(\lambda\) standing for the agents’ homogenous ‘best-responsiveness’) so that the models are structurally equivalent if agents’ payoff shocks in QRE are i.i.d. extreme-value distributed and if EvE assumes Shannon’s entropy measure.Footnote 7 Thus, if rational agents behave symmetrically and do not explore, then \(p\) is such that \(\pi^{E} \left( p \right) = H\), i.e., \(p = p^{Nash}\) and \(\lambda \to \infty\). On the other hand, if they maximise exploration, then they choose \(p\) such that \(p = 1 - p = 0.5\), so that \(\lambda \to 0\). If \(p > 0.5\), \(\lambda\) is positive if \(\pi^{E} \left( p \right) > H\) and it is negative (theory-inconsistent) otherwise. The Maximum Likelihood estimate of \(p\), assuming independent observations, is the relative frequency of entry, \(\overline{d}_{n}\), and the Maximum Likelihood Estimator (MLE) of \(\lambda\) follows from (4). Note that \(\overline{d}_{n}\) remains a statistically consistent estimator for \(E\left( d \right) = p\) under less restrictive covariance structures of the observations, by various versions of the weak law of large numbers.
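
The mapping in (4) and its sign restriction can be illustrated with a short Python sketch. It assumes symmetry and a hypothetical two-step payoff function (`G_step`, with `n = 10` and `H = 5`, chosen for illustration only and not taken from the experiment); the function names are likewise assumptions.

```python
import math
from typing import Callable, Optional


def payoff_entry_sym(p: float, n: int, G: Callable[[int], float]) -> float:
    """Expected payoff from entry when each of the n-1 others enters with probability p."""
    return sum(
        math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k) * G(k + 1)
        for k in range(n)
    )


def eve_lambda(p: float, n: int, G: Callable[[int], float], H: float) -> Optional[float]:
    """Invert (4): lambda = ln(p / (1 - p)) / (pi^E(p) - H); None where undefined."""
    if not 0.0 < p < 1.0:
        return None
    gap = payoff_entry_sym(p, n, G) - H
    if gap == 0.0:
        return None  # p equals p^Nash, where lambda diverges
    return math.log(p / (1.0 - p)) / gap


# Hypothetical step payoff: 10 if at most 6 agents enter, 0 otherwise.
n, H = 10, 5.0
G_step = lambda a: 10.0 if a <= 6 else 0.0

lam_consistent = eve_lambda(0.55, n, G_step, H)    # p between 0.5 and p^Nash
lam_inconsistent = eve_lambda(0.80, n, G_step, H)  # p above p^Nash
```

Under these assumed payoffs, an entry frequency between 0.5 and \(p^{Nash}\) maps into a positive \(\lambda\), whereas a frequency above \(p^{Nash}\) maps into a negative, theory-inconsistent value.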

2.2 Impulse Balance Equilibrium: IBE

IBE basically assumes that if at some stage an alternative option would have yielded a higher payoff, then the agent receives an impulse to use this alternative in the next stage, i.e., agents only take account of foregone payoffs, as in Learning Direction Theory (Selten & Buchta, 1999). It is defined as the long run outcome of such stage-to-stage behaviour. In the context of market-entry games, an agent receives an impulse for entry if the payoff received from not entering is smaller than that from entering. Denoting by \(K\) the number of other entrants and by \(p\) the common probability of entering the market, the expected magnitude of these impulses for entry is defined as:

$$\begin{aligned} IMP^{E} \left( p \right) =\, & E\left[ {G\left( {K + 1} \right){\mathbb{I}}_{{\{ G\left( {K + 1} \right) > H\} }} } \right] \\ =\, & \mathop \sum \limits_{k = 0}^{n - 1} \left( {\begin{array}{*{20}c} {n - 1} \\ k \\ \end{array} } \right)p^{k} \left( {1 - p} \right)^{n - 1 - k} G\left( {k + 1} \right){\mathbb{I}}_{{\{ G\left( {k + 1} \right) > H\} }} \\ \end{aligned}$$

or equivalently in terms of \(p_{ - i}\) rather than \(p\):

$$IMP^{E} \left( {p_{ - i} } \right) = { }\mathop \sum \limits_{k = 0}^{n - 1} P{[}k {\,\text{go}} {|}p_{ - i} ]G\left( {k + 1} \right){\mathbb{I}}_{{\left\{ {G\left( {k + 1} \right) > H} \right\}}} .$$
(5)

Similarly, an agent receives an impulse for no entry if the payoff received from entering is not larger than that from not entering. The expected magnitude of these impulses for no entry is defined as:

$$\begin{aligned} IMP^{NE} \left( p \right) =\, & H \cdot P\left[ {G\left( {K + 1} \right) \le H} \right] \\ =\, & H\left[ {1 - \mathop \sum \limits_{k = 0}^{n - 1} \left( {\begin{array}{*{20}c} {n - 1} \\ k \\ \end{array} } \right)p^{k} \left( {1 - p} \right)^{n - 1 - k} {\mathbb{I}}_{{\{ G\left( {k + 1} \right) > H\} }} } \right] \\ \end{aligned}$$

or equivalently

$$IMP^{NE} \left( {p_{ - i} } \right) = H\left[ {1 - \mathop \sum \limits_{k = 0}^{n - 1} P{[}k {\,\text{go}} {|}p_{ - i} ]{\mathbb{I}}_{{\left\{ {G\left( {k + 1} \right) > H} \right\}}} } \right].$$
(6)

Note that these impulses are defined relative to the game's maximin pure strategy of not entering the market, which yields a sure payoff of \(H\). Selten and Chmura (2008) further observe that receiving a payoff lower than this sure payoff should be perceived as a loss. To this extent, and in the light of empirical and experimental evidence of loss aversion in agents' preferences (Benartzi & Thaler, 1995; Tversky & Kahneman, 1991), we follow Ockenfels and Selten (2005) and define an IBE for this market-entry game such that agent \(i\) is indifferent between ‘receiving \(IMP^{E} \left( {p_{ - i} } \right)\) and entering’ and ‘receiving \(\kappa IMP^{NE} \left( {p_{ - i} } \right)\) and not entering’, where \(\kappa > 0\) stands for an impulse weight. That is, agent \(i\) would choose to enter the market with probability \(p_{i}\) that equalises her expected weighted impulses:

$$p_{i} IMP^{E} \left( {p_{ - i} } \right) = \left( {1 - p_{i} } \right)\kappa IMP^{NE} \left( {p_{ - i} } \right)$$
(7)

This impulse balance equation characterizes a long-run IBE in which participants no longer react to the expected impulses they receive. We could of course consider a short-run IBE, i.e., one that would solve \(IMP^{E} \left( {p_{ - i} } \right) = \kappa IMP^{NE} \left( {p_{ - i} } \right)\), but the resulting IBE for agent \(i\) would then be independent of \(p_{i}\) and, as shown in the next section, this would considerably limit the scope of our study.

Finally, unlike Selten and Chmura (2008) who assume \(\kappa = 2\), we estimate the impulse weight \(\kappa\) by Maximum Likelihood (as for EvE) so the estimator of \(p\) is \(\overline{d}_{n}\) and the MLE of \(\kappa\) (assuming symmetry and \(p \ne 1\)) follows from

$$\kappa = \frac{{pIMP^{E} \left( p \right)}}{{\left( {1 - p} \right)IMP^{NE} \left( p \right)}}.$$
(8)
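
A minimal Python sketch of (5), (6) and (8) under symmetry is given below. It takes the impulse definitions exactly as written above; the two-step payoff `G_step`, the values `n = 10` and `H = 5`, and the function names are hypothetical and serve only as an illustration.

```python
import math
from typing import Callable, Optional, Tuple


def impulses(p: float, n: int, G: Callable[[int], float], H: float) -> Tuple[float, float]:
    """Return (IMP^E(p), IMP^NE(p)) under symmetry, following (5) and (6)."""
    imp_entry = 0.0
    prob_gain = 0.0  # P[G(K + 1) > H]
    for k in range(n):
        if G(k + 1) > H:
            weight = math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)
            imp_entry += weight * G(k + 1)
            prob_gain += weight
    return imp_entry, H * (1.0 - prob_gain)


def ibe_kappa(p: float, n: int, G: Callable[[int], float], H: float) -> Optional[float]:
    """Invert (8); None where kappa is undefined (p = 1 or no impulse for no entry)."""
    if not 0.0 <= p < 1.0:
        return None
    imp_entry, imp_no_entry = impulses(p, n, G, H)
    if imp_no_entry <= 0.0:
        return None
    return p * imp_entry / ((1.0 - p) * imp_no_entry)


# Hypothetical step payoff: 10 if at most 6 agents enter, 0 otherwise.
n, H = 10, 5.0
G_step = lambda a: 10.0 if a <= 6 else 0.0

kappa_mid = ibe_kappa(0.6, n, G_step, H)
kappa_low_entry = ibe_kappa(0.1, n, G_step, H)
```

With these assumed payoffs, low entry probabilities map into large impulse weights, so `kappa_low_entry` exceeds `kappa_mid`.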

3 A specification test: the \(\Sigma\)-test

When we assume symmetry, the models we consider only propose a reparametrization \(\lambda \left( p \right)\) for EvE and \(\kappa \left( p \right)\) for IBE. Thus, under symmetry, there is no scope for discriminating between these models beyond commenting on implausible values of \(\lambda \left( p \right)\) and \(\kappa \left( p \right)\). If we do not impose symmetry, then (3) and (7) can be rewritten as systems of linear restrictions on parameters \(\lambda\) and \(\kappa\):

$$\ln \frac{{p_{i} }}{{1 - p_{i} }} - \lambda \left[ {\pi_{i}^{E} \left( {p_{ - i} } \right) - H} \right] = 0\quad {\text{for}}\quad i = 1, \ldots ,n$$
(9)

and

$$p_{i} IMP^{E} \left( {p_{ - i} } \right) - \left( {1 - p_{i} } \right)\kappa IMP^{NE} \left( {p_{ - i} } \right) = 0\quad {\text{for}}\quad i = 1, \ldots ,n$$
(10)

Both systems can thus be written in the form \(y\left( p \right) - \theta x\left( p \right) = g\left( {p,\theta } \right) = 0\), with \(\theta = \lambda\) or \(\kappa\), and with \(y\), \(x\) and \(g\) vector functions with values in \({\mathbb{R}}^{n}\). The proposed formulation of (9) and (10) in terms of \(p_{ - i}\) makes it possible to express the EvE or IBE model for homogenous players—in the sense that they share a common single parameter—while still allowing for possibly different individual entry probabilities, and to design a specification test. A further possibility we shall explore is to allow for cluster-heterogeneous players, i.e., players with similar characteristics (e.g., entry-probabilities) whom the model considers identical by assigning them the same parameter. In this case, \(\theta\) is a vector instead of a scalar and the length of the vector directly affects the power of the \({\Sigma }\)-test since a vector of length \(n\) represents full heterogeneity and leads to never rejecting the null of consistency.

Given the asymptotically normal estimator \(\hat{p}_{T}\) of \(p\), the vector of individual entry frequencies, with asymptotic variance \(V\) of which we describe a consistent estimator \(\hat{V}_{T}\) in Appendix III.A, an optimal asymptotic least squares estimator of \(\theta\) isFootnote 8:

$$\begin{aligned} \hat{\theta }_{T} =\, & {\text{arg min}}_{\theta } \,g^{\prime}\!\left( {\hat{p}_{T} ,\theta } \right)\hat{S}_{T}^{ - 1} g\left( {\hat{p}_{T} ,\theta } \right) \\ =\, & \left[ {x^{\prime}\!\left( p \right)\hat{S}_{T}^{ - 1} x\left( p \right)} \right]^{ - 1} x^{\prime}\!\left( p \right)\hat{S}_{T}^{ - 1} y\left( p \right) \\ \end{aligned}$$
(11)

with \(\hat{S}_{T}\) converging to

$$S = \frac{{\partial g\left( {p,\theta } \right)}}{\partial p^{\prime}}V\frac{{\partial g^{\prime}\!\left( {p,\theta } \right)}}{\partial p}.$$

\(\hat{\theta }_{T}\) is thus the GLS estimator in the regression of \(y\left( p \right)\) on \(x\left( p \right)\), the variance of the error term being \(S\).

Given a preliminary estimate of \(\theta\), say \(\tilde{\theta }_{T}\) obtained by replacing \(\hat{S}_{T}\) in (11) with the identity matrix, i.e., \(\tilde{\theta }_{T}\) is the OLS estimator in the regression of \(y\left( p \right)\) on \(x\left( p \right)\), a consistent estimator of \(S\) is:

$$\hat{S}_{T} \left( {\hat{p}_{T} ,\tilde{\theta }_{T} } \right) = \frac{{\partial g\left( {\hat{p}_{T} ,\tilde{\theta }_{T} } \right)}}{\partial p^{\prime}}\hat{V}_{T} \frac{{\partial g^{\prime}\!\left( {\hat{p}_{T} ,\tilde{\theta }_{T} } \right)}}{\partial p} .$$

The asymptotic variance of \(\hat{\theta }_{T}\) is given by \(V_{asy} \left( {\hat{\theta }_{T} } \right) = \left[ {x^{\prime}\!\left( p \right)S^{ - 1} x\left( p \right)} \right]^{ - 1}\) and a consistent estimator is \(\widehat{{V_{asy} \left( {\hat{\theta }_{T} } \right)}} = \left[ {x^{\prime}\!\left( {\hat{p}_{T} } \right)\hat{S}_{T}^{ - 1} x\left( {\hat{p}_{T} } \right)} \right]^{ - 1}\). Under the null that there exists \(\theta\) such that \(g\left( {p,\theta } \right) = 0\) for the true \(p\), or in other words that the restrictions on entry probabilities embodied by the model are valid,

$$Tg^{\prime}\!\left( {\hat{p}_{T} ,\hat{\theta }_{T} } \right)\left[ {\hat{S}_{T} \left( {\hat{p}_{T} ,\tilde{\theta }_{T} } \right)} \right]^{ - 1} g\left( {\hat{p}_{T} ,\hat{\theta }_{T} } \right) \approx \chi^{2} \left( {n - 1} \right),$$
(12)

and this over-identification test can be used to test the underlying theory. All we need for the implementation of this specification test, for short the \({\Sigma }\)-test, are thus \(\hat{V}_{T}\) and the derivatives \(\partial g_{i} \left( {p,\theta } \right)/\partial p_{i}\). The technical details for the determination of these expressions are given in Appendix III.B. The number of degrees of freedom is \(n - 1\) when assuming homogeneity [i.e., the length of vector \(\theta\) is 1, cf. (12)] and it is at most \(n - K\) when assuming heterogeneous players sorted in \(K\) clusters (i.e., the length of vector \(\theta\) is \(K\)), as discussed in Appendix III.C.
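
The estimation and testing steps for a scalar \(\theta\) can be sketched in Python as follows. This is an illustration under strong assumptions: the weighting matrix `S` is taken as given (its construction from \(\hat{V}_{T}\) and the derivatives of \(g\) is not reproduced here), it is assumed symmetric and non-singular, the statistic is the quadratic form \(T g^{\prime} S^{ - 1} g\) in the spirit of (12), and the vectors `x` and `y` are made-up numbers rather than experimental data. All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting (A non-singular)."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, m))
        z[r] = (aug[r][m] - tail) / aug[r][r]
    return z


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def als_scalar(y: Sequence[float], x: Sequence[float], S: Matrix) -> float:
    """Scalar version of (11): [x' S^-1 x]^-1 x' S^-1 y, using the symmetry of S."""
    s_inv_x = solve(S, x)
    return dot(s_inv_x, y) / dot(s_inv_x, x)


def sigma_statistic(y, x, S: Matrix, theta: float, T: int) -> float:
    """Quadratic form T * g' S^-1 g with g = y - theta * x."""
    g = [yi - theta * xi for yi, xi in zip(y, x)]
    return T * dot(g, solve(S, g))


# Made-up inputs for three agents; the identity weighting gives the OLS first step.
x = [1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
S_diag = [[1.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 9.0]]

theta_ols = als_scalar(y, x, identity)
theta_gls = als_scalar(y, x, S_diag)
stat = sigma_statistic(y, x, S_diag, theta_gls, T=150)
```

The resulting `stat` would then be compared with a \(\chi^{2}\) critical value with \(n - 1\) degrees of freedom (here 2), which this stdlib-only sketch does not compute.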

Note finally that since this test exploits the game’s probabilistic structure by rewriting agents’ probabilities of entry as a function of \(p_{ - i}\), it can be tailored for the assessment of behaviour in other binary-choice participation games like the volunteer’s game, the (discrete) step-level public good game and voter participation games. This, of course, remains conditional on having well-defined predictions to test, as is the case for EvE and QRE in general but not necessarily for IBE since its long-run equilibrium may not always be defined.Footnote 9

4 Experimental design and procedures

The experiments involve groups of 10 participants and a 2 × 3 factorial design which assumes two payoff levels, High and Low, and three payoff structures: one two-step payoff function (DISC) yielding a positive payoff \(G\) from entering if attendance \(A < c\) and 0 otherwise, and two non-monotone ones (NOM1 and NOM2) in which payoffs first increase and then decrease with \(A\). The binary payoff structure of DISC implies that the players’ choices are strict substitutes whereas the non-monotone structures introduce both strategic complementarity and strategic substitutability in the players’ actions that have been theoretically studied in the context of global congestion games (see e.g., Karp et al., 2007) but whose effects in complete information settings have not yet been investigated experimentally.Footnote 10

These payoff structures are displayed in Fig. 1, and the models’ equilibrium relationships between \(p\) and \(\lambda > 0\) or \(\kappa > 0\) for the treatments considered are shown in Fig. 2. For each payoff level, both DISC and NOM1 yield \(\left( {\begin{array}{*{20}c} {10} \\ 6 \\ \end{array} } \right) = 210\) Nash equilibria in pure strategies, unique mixed-equilibrium strategies and unique IBE strategies whereas NOM2 has one more equilibrium in pure strategies (where all agents choose not to enter), two mixed-equilibrium strategies and two IBE strategies (one with a low entry-probability and one with a high entry-probability).Footnote 11

Fig. 1 Payoff levels and structures of market-entry games. No filling stands for ‘No Entry’, light gray (dark gray) stands for ‘Entry’ when payoffs are Low (High). Payoffs expressed in Experimental Currency Units—see Appendix IV.D for exact figures

Fig. 2 Relationship between \(p\) and EvE’s \(\lambda\) or IBE’s \(\kappa\). Thick (Thin) lines stand for High (Low) payoff levels. For EvE, the plots report the \(p^{Nash}\) predictions for each payoff structure and level (cf. coloured horizontal lines). For IBE, the plots display the \(p^{Nash}\) predictions (cf. dots) for each payoff structure and level. As \(\kappa \to \infty\), \(p \to 0\) in DISC and NOM1

We are interested in checking if and how behaviour is affected by these payoff structures and to what extent it is consistent with EvE and/or IBE when allowing for cluster-heterogeneity. In this regard, since the ranges of probabilities for which EvE and IBE yield model-consistent estimates in DISC and NOM1 are \(\left[ {0.5,p^{Nash} } \right]\) for EvE and \(\left[ {0, 1} \right]\) for IBE, the models’ cluster-estimates should lie within these ranges and be significantly different from each other for cluster-heterogeneity to be model-consistent and significant. It thus follows that the scope for IBE to accommodate the latter in these treatments is considerably larger than that for EvE.Footnote 12

A similar argument holds for NOM2 since the mixed-equilibria have different loci of consistent choice frequencies (defined either on \(\left[ {0.5,p_{1}^{Nash} } \right]\) or on \(\left[ {0,p_{2}^{Nash} } \right]\) with \(p_{2}^{Nash} < 0.5\)) whereas the IBE equilibria have a unique locus (because both equilibria depend on a common \(\kappa\)), so the identification of model-consistent clusters of participants playing in such different (Nash or IBE) equilibria can be achieved with IBE but not with EvE.

Our motivation for considering different payoff levels is to check whether the payoffs’ magnitude affects the presence of model-consistent clusters of players, and thus to possibly complement the findings of McKelvey et al. (2000), who report no significant payoff-magnitude effect on the participants’ QRE best-responsiveness in \(2 \times 2\) games and evidence of heterogeneous play.

The experiments were conducted at the Laboratory for Experimental Economics of the University of Jaume I (Spain). Participants were undergraduate students in Business Administration, Law or Engineering and were recruited by public advertisement on campus. We conducted eight sessions per payoff structure (DISC, NOM1, NOM2) with 10 participants per session, totalling 240 individuals. For each payoff structure, we conducted four sessions with Low payoffs and four sessions with High payoffs. The experiments were conducted with a between-subject matching protocol and participants could play in only one session. Upon arriving in the laboratory, they were randomly assigned to cubicles equipped with computer terminals and were given instructions that were read aloud.Footnote 13 To avoid framing effects, we presented the game in neutral language by asking participants to choose between actions A and B. Each session involved 150 rounds of play, and at the end of each round, participants were only informed about the total number of players in their group who chose B ("No entry"), their own payoff in that round and their cumulative payoff. This information was appended to a "History" window that could be seen at any time during the experiment. Although participants played in fixed groups of 10, we believe that the provision of sparse end-of-round feedback, combined with the relatively large number of players (10) and a relatively large ‘market size-to-capacity’ ratio (60%), renders entry-coordination very difficult to achieve. Each session lasted a maximum of 1 h, including the time needed to read the instructions. Participants were rewarded for each round of play at the rate of €0.02 per 100 points and individual average earnings were €12.77 (i.e., €11.94 in the Low payoff sessions and €13.60 in High payoff ones).

5 Results

We start with an overview of the data by displaying the evolution of averaged entry probabilities and their polynomial fits in Fig. 3. The plots suggest an under-entry \((\hat{p} < p^{Nash} )\) in all High payoff treatments, and that the ‘magic’ \(\left( {\hat{p} \approx p^{Nash} } \right)\) is more likely to hold when payoffs are Low, especially in NOM1 and NOM2. These entry patterns are also present in the session data (cf. Appendix V) and in line with the session and treatment average entry rates of Table 1.

Fig. 3 Evolution of average probabilities of entry. Horizontal lines stand for the symmetric mixed-equilibrium predictions (we only consider the high-probability equilibrium of NOM2). Bold lines represent polynomial fits of degree 10

Table 1 Average entry probabilities

The treatment (pooled) figures of Table 1 show no support for the predicted ranking of entry rates \(p_{DISC}^{Nash} < p_{NOM1}^{Nash} < p_{1, NOM2}^{Nash}\). Pairwise comparisons indicate a substantially higher entry rate in NOM1 than in DISC and NOM2 when payoffs are High and similar entry rates when they are Low. Entry rates also significantly increase with the payoff level, as predicted in equilibrium and as reported by Zwick and Rapoport (2002), who study the effect of ‘low’ and ‘high’ entry costs in treatments with a similar ‘market size-to-capacity’ ratio (50%). We summarise this overview of the pooled data as follows:

Observation 0: (A) There is under-entry when payoffs are High. When payoffs are Low, there is (1) over-entry in DISC, (2) weak support for the Nash mixed-equilibrium play in NOM1, and (3) under-entry (with respect to the high-probability equilibrium) in NOM2.

(B) The effect of the payoff structure is most salient when payoffs are High and yields a substantially higher average entry rate in NOM1. Average entry rates also significantly increase with the payoff level, as expected in equilibrium.

Before estimating the models, we briefly assess the symmetry of individuals’ entry probabilities. The bar-charts in Fig. 4 reveal minor differences in average entry probabilities between the sessions of a treatment, and large within-session disparities with clusters of participants displaying a similar entry behaviour (Footnote 14). The data also show no support for the ‘low probability’ mixed-equilibrium of NOM2 so we will always refer to the ‘high probability’ equilibrium of this treatment when discussing our estimation results.

Fig. 4 Bar-charts of individual probabilities of entry. Each vertical bar represents an individual. Horizontal thin (thick) lines stand for the symmetric mixed-equilibrium predictions (average probabilities of entry)

5.1 Structural estimations when imposing symmetry

Table 2 reports the (pseudo-)Maximum Likelihood estimation outcomes of EvE and IBE when assuming symmetric players and unknown forms of autocorrelation and heteroskedasticity in the errors. As the log-likelihood values contain no information about the model’s goodness-of-fit beyond the estimated probability of entry \(\hat{p}\), we focus on the estimates’ overall consistency with Observation 0, and on their data-consistency, i.e., whether a treatment’s session estimates are of similar magnitude and significance as the estimate for the pooled data (Footnote 15).

Table 2 EvE and IBE estimates

Looking first at the outcomes for EvE, it appears that except for NOM2/High, all sessions report insignificant or inconsistent (negative) estimates, regardless of whether \(p^{Nash}\) is rejected (cf. shaded cells) or whether their average entry rates indicate under-entry (cf. Table 1 and Fig. 4). Such insignificant estimates support maximal exploration whereas inconsistent ones result from EvE’s inability to rationalise over-entry when \(p^{Nash} > 0.5\), as shown in Fig. 2. In the case of NOM2/High, the session estimates are all significantly positive and support a contained exploitation that is in line with the observed under-entry.

The pooled EvE-estimates indicate a contained exploitation in all High payoff structures and in NOM2/Low, and they are otherwise inconsistent (or almost so) as a result of over-entry. Thus, besides a significant under-entry in NOM2/High, the EvE-estimates provide no evidence of a data-consistent behaviour when the estimations impose a symmetric play.

This sharply contrasts with the outcomes for IBE since the session estimates are all significantly positive, typically larger when payoffs are High in DISC and NOM2, and similar across payoff levels in NOM1. This is confirmed by the treatments’ estimates, whose pairwise comparisons further indicate that \(\hat{\kappa }_{DISC} > \hat{\kappa }_{NOM2} > \hat{\kappa }_{NOM1}\) when payoffs are High and \(\hat{\kappa }_{DISC} > \hat{\kappa }_{NOM1} \approx \hat{\kappa }_{NOM2}\) otherwise. We summarise the above in the following observation:

Observation 1: When assuming symmetric players and estimating the models with pseudo-Maximum Likelihood methods:

(A) The EvE-estimates are data-consistent in NOM2/High and indicate a contained exploitation that is in keeping with the observed under-entry. Otherwise, they are data-inconsistent: they mostly indicate maximal exploration whereas pooled estimates are either negative (thus inconsistent) or they support a contained exploitation.

(B) The IBE-estimates are data-consistent and in keeping with Observation 0. They indicate:

(1) \(\hat{\kappa }_{High} > \hat{\kappa }_{Low}\) in DISC and NOM2, and \(\hat{\kappa }_{High} \approx \hat{\kappa }_{Low}\) in NOM1.

(2) \(\hat{\kappa }_{DISC} > \hat{\kappa }_{NOM2} > \hat{\kappa }_{NOM1}\) when payoffs are High and \(\hat{\kappa }_{DISC} > \hat{\kappa }_{NOM2} \approx \hat{\kappa }_{NOM1}\) when they are Low.

5.2 Structural estimations when relaxing symmetry

We now estimate the models without imposing symmetry and we run our specification test to assess the consistency of estimates with the restrictions that either model imposes on individual behaviour. Note that the Σ-test only suits the analysis of session data, i.e., games with \(n\) players.

For each session, we cluster the entry probabilities \(p_{i}\) using the k-means procedure (with 20 random initial values) and estimate each model and its inverse form with \(K = \left\{ {1,2,3,4} \right\}\) clusters, each cluster having its own \(\theta\)-parameter (where \(\theta\) is either \(\lambda\) or \(\tau\)) (Footnote 16). This generates eight specifications for each model and treatment which we estimate with OLS procedures. For each session, model (IBE and EvE) and value of \(K\), we select the ‘best’ specification in terms of the estimates’ theoretical consistency and the credibility of their confidence intervals. Next, for each session and model, we select the estimated specification with the smallest number of clusters, \(K_{Min}\), needed to not reject the Σ-test at \(\alpha = 5\%\). Thus, the reported estimation results document the models’ non-rejections of the Σ-test when \(K_{Min} < 4\), and their rejections or non-rejections when \(K_{Min} = 4\). Noting that a rejection with \(K_{Min} = 4\) can reasonably be seen as disqualifying the model when \(n = 10\), we focus the discussion on specifications that do not reject the Σ-test.
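The clustering and selection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `kmeans_1d` is a plain Lloyd-type k-means on scalars with 20 random starts, `select_k_min` is a hypothetical helper, and both the entry probabilities and the Σ-test p-values are made-up numbers (the Σ-test itself is not reproduced here).

```python
import random

def assign(values, centres):
    """Index of the nearest centre for every value."""
    return [min(range(len(centres)), key=lambda c: (v - centres[c]) ** 2)
            for v in values]

def kmeans_1d(values, k, n_init=20, max_iter=100, seed=0):
    """Lloyd's algorithm on scalars; keeps the best of n_init random starts."""
    rng = random.Random(seed)
    best = (None, None, float("inf"))            # (labels, centres, SSE)
    for _ in range(n_init):
        centres = rng.sample(values, k)          # random initial centres
        for _ in range(max_iter):
            labels = assign(values, centres)
            updated = []
            for c in range(k):
                members = [v for v, lab in zip(values, labels) if lab == c]
                # An empty cluster keeps its previous centre.
                updated.append(sum(members) / len(members) if members else centres[c])
            if updated == centres:
                break
            centres = updated
        labels = assign(values, centres)
        sse = sum((v - centres[lab]) ** 2 for v, lab in zip(values, labels))
        if sse < best[2]:
            best = (labels, centres, sse)
    return best

def select_k_min(p_values_by_k, alpha=0.05):
    """Smallest K whose Sigma-test is not rejected; else the largest K, flagged."""
    for k in sorted(p_values_by_k):
        if p_values_by_k[k] >= alpha:
            return k, True
    return max(p_values_by_k), False

# Hypothetical session with n = 10 individual entry probabilities.
entry_probs = [0.05, 0.08, 0.10, 0.45, 0.50, 0.52, 0.55, 0.90, 0.92, 0.95]
labels, centres, sse = kmeans_1d(entry_probs, k=3)

# Hypothetical Sigma-test p-values for K = 1..4 (not taken from the paper).
sigma_p_values = {1: 0.01, 2: 0.03, 3: 0.20, 4: 0.60}
k_min, not_rejected = select_k_min(sigma_p_values)
print(sorted(round(c, 3) for c in centres), k_min, not_rejected)
```

When no value of \(K\) passes the test, the helper returns the largest \(K\) together with a rejection flag, which mirrors the convention that rejections are only reported for \(K_{Min} = 4\).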

The estimation outcomes are relegated to Tables VII.A.1–4 in Appendix VII.A, and since they display no obvious pattern in terms of payoff structure, we start by summarising their main characteristics for each payoff level in the upper panel of Table 3. The first three columns tally the models’ rejections and non-rejections of the Σ-test when \(K_{Min} = 1\) (i.e., homogeneity is not rejected) or when \(1 < K_{Min} \le 4\) (i.e., homogeneity is rejected in favour of cluster-heterogeneity).

Table 3 Summary of specification test outcomes: OLS procedures

EvE is not rejected for a total of 20 sessions (out of 24, 83%) whereas IBE is not rejected for a total of 14 sessions (58%). Of these non-rejected specifications, EvE supports cluster-heterogeneity in 12 sessions (60%) whereas all non-rejected IBE-specifications do so. The summary tables in Appendix VII further reveal that both models are rejected for 4 sessions and that both are not rejected for 14 others. Since the remaining 6 sessions (25%) reject only IBE, it appears that EvE organises the observed behaviour best.
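The session counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only uses the figures stated in the text; the variable names and the `share` helper are ours.

```python
def share(count, total):
    """Percentage of `count` in `total`, rounded to the nearest integer."""
    return round(100 * count / total)

n_sessions = 24
both_not_rejected = 14     # neither model rejects the Sigma-test
only_ibe_rejected = 6      # IBE rejected, EvE not rejected
both_rejected = 4

# The three groups partition the 24 sessions.
assert both_not_rejected + only_ibe_rejected + both_rejected == n_sessions

eve_not_rejected = both_not_rejected + only_ibe_rejected   # 20 sessions
ibe_not_rejected = both_not_rejected                       # 14 sessions
eve_heterogeneous = 12     # non-rejected EvE specifications with K_Min > 1

print(share(eve_not_rejected, n_sessions))                 # 83
print(share(ibe_not_rejected, n_sessions))                 # 58
print(share(eve_heterogeneous, eve_not_rejected))          # 60
print(share(n_sessions - eve_not_rejected, n_sessions))    # 17
print(share(n_sessions - ibe_not_rejected, n_sessions))    # 42
```

The last two lines give the rejection rates (17% for EvE and 42% for IBE) that appear in Observation 2.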

We proceed by checking, with pairwise \(\chi^{2}\)-tests of equality, whether the cluster-estimates of a specification (session) are heterogeneous. When all pairwise tests reject equality, the estimates are considered heterogeneous only if all pairwise tests are also rejected when assuming \(K + 1\) clusters and the clusters are nested; the pairwise test outcomes are summarised in the last columns of Tables VII.A.1–4 in Appendix VII.A. On the other hand, a single non-rejection of equality implies that the specification is over-parametrised, so the estimated cluster parameters are unreliable and one can only conclude that it has at most \(K_{Min} - 1\) clusters.
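A minimal sketch of the over-parametrisation rule is given below. It assumes a Wald-type \(\chi^{2}(1)\) statistic for each pair of cluster estimates and ignores any covariance between them; the exact test statistic used in the paper may differ, and the additional nesting condition with \(K + 1\) clusters is not implemented. All function names and numbers are illustrative.

```python
import math
from itertools import combinations

def wald_equality_pvalue(est_a, se_a, est_b, se_b):
    """p-value of H0: theta_a == theta_b, assuming independent estimates."""
    stat = (est_a - est_b) ** 2 / (se_a ** 2 + se_b ** 2)
    # Survival function of a chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))

def is_over_parametrised(estimates, std_errors, alpha=0.05):
    """True if at least one pairwise equality test is not rejected."""
    for i, j in combinations(range(len(estimates)), 2):
        p = wald_equality_pvalue(estimates[i], std_errors[i],
                                 estimates[j], std_errors[j])
        if p >= alpha:
            return True
    return False

# Hypothetical cluster estimates and standard errors for K = 3.
print(is_over_parametrised([1.0, 1.1, 5.0], [0.2, 0.2, 0.2]))  # True
print(is_over_parametrised([1.0, 3.0, 5.0], [0.2, 0.2, 0.2]))  # False
```

In the first call the estimates 1.0 and 1.1 cannot be told apart, so the specification would be flagged as having at most \(K - 1\) clusters.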

The last three columns of Table 3 refer to the non-rejected specifications of a treatment and report the percentages of (1) over-parametrised multi-clustered specifications, (2) insignificant or inconsistent estimates and (3) individuals affected by such estimates. The models sharply differ according to these criteria as EvE’s specifications are far more likely to be over-parametrised than the IBE ones no matter the payoff level, i.e., a five-fold (three-fold) percentage difference when payoffs are High (Low). Most estimates of non-over-parametrised EvE-specifications are insignificant and none of these specifications yields estimates that fulfil the conditions to be considered heterogeneous. As for IBE, all estimates of non-over-parametrised specifications comply with these conditions when \(1 < K_{Min} \le 3\), which leads us to conclude that, as expected, IBE accommodates cluster-heterogeneity better than EvE (cf. Sect. 4). Finally, about 50% of EvE’s estimates are insignificant and affect some 37% of individuals no matter the payoff level whereas for IBE the figures drop at least by half, especially when payoffs are Low.

We highlight treatment differences by assigning to each participant the \(\theta\)-estimate of the cluster they belong to and by comparing the resulting cumulative distributions of estimates for High and Low payoffs in each payoff structure. These distributions are displayed in Fig. 5 (with the samples’ median estimates); insignificant estimates were set equal to 0. To document the effect of the \({\Sigma }\)-test on inference, the plots assume either (1) all estimates regardless of the sessions’ \({\Sigma }\)-test outcomes (cf. dashed lines), or (2) estimates of non-rejected specifications only (cf. plain lines). In this regard, the distributions pertaining to (1) and (2) reveal important differences only when non-rejected specifications are rare, as for IBE in NOM2/High.
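The construction of these distributions can be sketched as follows, assuming hypothetical cluster labels and estimates. Each participant inherits the estimate of their cluster, insignificant estimates are set to 0, and values are capped for plotting (the cap of 5 follows the maximum assumed for \(\hat{\lambda }_{i}\) in the figure captions). The helper names are ours.

```python
from statistics import median

def individual_estimates(labels, cluster_estimates, significant):
    """Give each participant their cluster's estimate (0 if insignificant)."""
    return [cluster_estimates[c] if significant[c] else 0.0 for c in labels]

def empirical_cdf(values, cap):
    """Sorted (x, F(x)) steps of the empirical CDF, with values capped at `cap`."""
    capped = sorted(min(v, cap) for v in values)
    n = len(capped)
    steps = {}
    for rank, x in enumerate(capped, start=1):
        steps[x] = rank / n        # the last duplicate gives the share <= x
    return sorted(steps.items())

# Hypothetical session: 10 participants in 3 clusters.
labels = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
cluster_estimates = [6.2, 1.4, 0.3]        # made-up lambda-estimates
significant = [True, True, False]          # third cluster is insignificant

theta_i = individual_estimates(labels, cluster_estimates, significant)
print(empirical_cdf(theta_i, cap=5.0))     # [(0.0, 0.3), (1.4, 0.7), (5.0, 1.0)]
print(median(theta_i))                     # 1.4
```

Each large step in the resulting CDF corresponds to one cluster, which is why prominent clusters show up as large jumps in the plots.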

Fig. 5 Cumulative distributions of individuals’ OLS estimates. Thick (Thin) lines stand for High (Low) payoff levels; dashed lines refer to the \(4 \times 10\) estimates of a treatment regardless of the \({\Sigma }\)-test outcomes. Insignificant estimates are set equal to 0. The plots report the median estimates and the numbers of non-rejected specifications (in brackets). The CDFs assume maximum \(\hat{\lambda }_{i}\)- and \(\hat{\kappa }_{i}\)-estimates of 5 and 15, respectively

The distributions’ large steps indicate the presence of prominent clusters. In the case of EvE, the most prominent clusters consist of insignificant estimates and are found in DISC and NOM1 no matter the payoff level. There are also noticeable clusters of relatively large estimates supporting a more intense exploitation when payoffs are Low in DISC (with \(\hat{\lambda }_{i} \ge 5\) for over 20% of participants) and in NOM2 (with \(\hat{\lambda }_{i} \ge 3.5\) for about 40%). Such larger estimates counter-intuitively suggest that for these participants, exploitation is more intense when payoffs are Low. This contrasts with NOM1 where the distributions are more alike across payoff levels and support the prediction that exploitation intensifies with payoffs, as the median estimates also qualitatively suggest.

As for IBE, the distributions look similar in NOM1 and suggest no particular ‘payoff magnitude’ effect. They also display no prominent clusters of large estimates and thus contrast with the distributions of DISC and NOM2 which both do when payoffs are High (\(\hat{\kappa }_{i} > 10\) for about 30% of participants in these treatments). The presence of such clusters in those treatments identifies participants with low entry-rates, and their absence in NOM1 is in line with Observation 1(B): (1) the distributions and median estimates suggest that \(\kappa_{High} > \kappa_{Low}\) in DISC and NOM2, and \(\kappa_{High} \approx \kappa_{Low}\) in NOM1, and (2) the median estimates support \(\kappa_{NOM1} < \kappa_{DISC} \approx \kappa_{NOM2}\) when payoffs are High.

We attribute the absence of such clusters in NOM1 and the higher participation in NOM1/High to the relatively lower risk of regretting entry that this structure entails when compared to DISC (which yields zero payoffs in case of over-entry) or to NOM2 (which bears an incentive to enter to avoid the risk of under-entry but whose highest payoffs obtain only when \(A = \left\{ {3,4,5} \right\}\), cf. Appendix IV.D).

All in all, allowing for cluster-heterogeneity in the estimations reveals important differences in the models’ explanatory powers and indicates that IBE outperforms EvE in this regard. We summarise this as follows:

Observation 2: When relaxing symmetry and estimating the models with OLS procedures, the null of the \(\Sigma\)-test is less likely to be rejected by EvE than by IBE (17% vs 42% of all sessions, respectively). However, when compared to IBE, the non-rejected EvE-specifications are: (1) less likely to reject homogeneity, (2) more likely to be over-parametrised, (3) more likely to generate insignificant or inconsistent estimates that affect a larger proportion of participants, and (4) unable to rationalise the presence of clusters of players with low entry-probabilities. Thus IBE accommodates cluster-heterogeneity better than EvE.

We conduct the same analysis for the last 75 rounds to check for a possible experience effect in the observed behaviour. The tests’ outcomes are summarised in the lower panel of Table 3; see Tables VII.B.1–4 in Appendix VII.B for detailed results (Footnote 17). EvE is now not rejected in any session whereas IBE is not rejected for 18 of them (75%, instead of 58% when accounting for all rounds), mostly with Low payoffs. Homogeneity (\(K_{Min} = 1\)) is again rejected for IBE in all sessions, whereas it is not rejected for EvE in 12 sessions, so that 50% of EvE-specifications are multi-clustered (instead of 60%). These specifications also display fewer clusters only when assuming EvE in DISC and NOM1/High, so behaviour in these treatments would become more homogeneous in the long run according to EvE. Overall, since both models are not rejected for 18 (75%) sessions and the remaining 6 reject IBE but not EvE (cf. Appendix VII.B), EvE would again appear to organise the observed behaviour best.

Looking into the specifications’ details, we find that the models yield more non-rejected over-parametrised specifications: over 83% for EvE, and 50% for IBE no matter the payoff level. There is also no evidence of heterogeneous estimates in the unique non-over-fitted EvE-specification (cf. NOM1/Low/Session 1) whereas all IBE-specifications with \(1 < K_{Min} \le 3\) are heterogeneous. The models’ differences remain in terms of insignificant estimates, with 55% of EvE-estimates indicating maximal exploration and affecting about 50% of participants whilst only 27% of the IBE ones are insignificant and concern 17% of participants no matter the payoff level.

The distributions of estimates in Fig. 6 tend to confirm the patterns found when assuming all data and they are moderately affected by the data-attrition resulting from the \({\Sigma }\)-test rejections. Insignificant EvE-estimates are frequent in all treatments but NOM2/Low, where the estimates support a contained exploitation and the null of homogeneity (\(K_{Min} = 1\)) in all sessions. Otherwise, the distributions pertaining to DISC and NOM2 still counter-intuitively suggest that exploitation increases when payoffs are Low whereas those of NOM1 comply with the alternative that exploitation increases with payoff levels.

Fig. 6 Cumulative distributions of individuals’ OLS estimates (last 75 rounds). Thick (Thin) lines stand for High (Low) payoff levels; dashed lines refer to the \(4 \times 10\) estimates of a treatment regardless of the \({\Sigma }\)-test outcomes. Insignificant estimates (at \(\alpha = 5\%\)) are set equal to 0. The plots report the median estimates and the numbers of non-rejected specifications (in brackets). The CDFs assume maximum \(\hat{\lambda }_{i}\)- and \(\hat{\kappa }_{i}\)-estimates of 5 and 15, respectively

For IBE, the distributions and median estimates of DISC look like those in Fig. 5 whilst those of NOM1 and NOM2 reveal (1) a drop in the median estimate of NOM1/Low and the presence of a cluster with large \(\kappa_{i}\)-estimates in NOM1/High, and (2) the absence of such a cluster in NOM2/High. However, the evolution of play in session data suggests that such higher (lower) participation for some participants in NOM1/Low and NOM2/High (NOM1/High) is actually due to an ‘end-game effect’ in the last 10–20 rounds of these treatments, cf. Appendix V. This leads to the following observation:

Observation 3: When relaxing symmetry and estimating the models with OLS procedures and the data of the last 75 rounds:

(A) The null of the \({\Sigma }\)-test is less likely to be rejected by EvE than by IBE (0% of all sessions vs 25% for IBE).

(B) Non-rejected EvE-specifications in DISC/High and especially NOM1/High have fewer clusters so behaviour becomes more homogeneous in the long run according to EvE.

(C) The features 1) to 4) of EvE’s non-rejected specifications outlined in Observation 2 hold and confirm IBE’s superior ability in organising the observed behaviour.

We proceed with a second robustness check of Observation 2 by estimating the models with more efficient procedures that possibly call for (‘naïve’ or Tikhonov) regularisation of the error variance matrix to address the unstable results we obtained when estimating the models with GLS methods. We thus consider five minimum-distance estimators in addition to the OLS and GLS ones, and we allow for regularisation whenever it is deemed necessary to give the models their best shot at organising the data (Footnote 18). That is, we estimated the models and their inverse forms for each session with seven estimators and with \(K = \left\{ {1,2,3,4} \right\}\) clusters, generating over 80 specifications per model and treatment. For each model and value of \(K\), we selected the specification that best addresses a set of criteria regarding the theoretical consistency of parameter estimates and the credibility of their confidence intervals, as well as the condition number of the variance matrix of the error terms (not too large) and the magnitude of the efficiency gains relative to OLS (not too large but not negligible).
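To illustrate why regularisation helps with an ill-conditioned variance matrix, the sketch below applies a textbook Tikhonov (ridge) correction, \(\Sigma + \alpha I\), to a hypothetical \(2 \times 2\) matrix and compares condition numbers. Whether the estimations in this paper use exactly this form is an assumption on our part; the ‘naïve’ variant is not specified in this section and is not illustrated, and the matrix entries and the ridge value are made up.

```python
import math

def eigenvalues_sym2(a, b, d):
    """Eigenvalues (small, large) of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def condition_number(a, b, d):
    """Ratio of the largest to the smallest eigenvalue (positive-definite case)."""
    small, large = eigenvalues_sym2(a, b, d)
    return large / small

def tikhonov(a, b, d, ridge):
    """Add ridge * I to the matrix, i.e. shift every eigenvalue up by `ridge`."""
    return a + ridge, b, d + ridge

# Hypothetical error variance matrix with almost perfectly correlated errors.
sigma = (1.0, 0.999, 1.0)
sigma_reg = tikhonov(*sigma, ridge=0.1)

print(condition_number(*sigma))        # close to 2000: nearly singular
print(condition_number(*sigma_reg))    # about 21 after regularisation
```

The ridge trades a small bias in the weighting matrix for a much more stable inverse, which is consistent with the two selection criteria mentioned above: a condition number that is not too large and efficiency gains over OLS that are neither negligible nor implausibly large.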

The selected \(K_{Min}\)-specifications are reported in Tables I to IV of Appendix VII.C and indicate that some form of regularisation is needed for 19 sessions (79%) when estimating EvE and for only 2 (8%) when estimating IBE (Footnote 19). The \({\Sigma }\)-test outcomes and the main characteristics of the models’ non-rejected specifications are summarised in Table 4. They first indicate that the use of regularisation marginally affects the \({\Sigma }\)-test outcomes for EvE and leaves those for IBE unchanged. Also, both models are not rejected for 13 sessions (instead of 14 when using OLS methods), whereas EvE alone is not rejected for 6 sessions (25%) and IBE alone for only 1 session (instead of 0).

Table 4 Summary of specification test outcomes with(out) regularisation

The effect of regularisation is more salient on the estimates since the models become comparable in terms of rejecting homogeneity (i.e., 22 sessions for EvE vs 24 sessions for IBE), the proportion of insignificant/inconsistent estimates and, to a lesser extent, the proportion of individuals with such estimates. Yet, over 50% of EvE’s non-rejected specifications are still over-parametrised whereas less than 20% of the IBE-ones are so.

The plots in Fig. 7 refer to heterogeneous samples of estimators and appear again to be affected by the \({\Sigma }\)-test results only when the available data are sparse, as for EvE in NOM2/High. Insignificant EvE-estimates are mostly found in DISC/Low and NOM2/High, and they are about equally frequent no matter the payoff level in NOM1. Otherwise, the distributions of IBE-estimates, like those of EvE-estimates in DISC, display patterns similar to those of the OLS estimates, cf. Fig. 5. The most noticeable changes occur for the EvE-estimates of NOM1 and NOM2: they are now most similar across payoff levels in NOM1 and suggest no particular payoff magnitude effect (like the IBE-distributions of this treatment) whereas they are mostly different in NOM2, with stochastically larger (and mostly homogeneous) cluster-estimates when payoffs are Low.

Fig. 7 Cumulative distributions of individuals’ (regularised) estimates. Thick (Thin) lines stand for High (Low) payoff levels; dashed lines refer to the \(4 \times 10\) estimates of a treatment regardless of the \({\Sigma }\)-test outcomes. Insignificant estimates are set equal to 0. The plots report the median estimates and the numbers of non-rejected specifications (in brackets). The CDFs assume maximum \(\hat{\lambda }_{i}\)- and \(\hat{\kappa }_{i}\)-estimates of 5 and 15, respectively

Overall, this robustness analysis confirms the models’ respective (in)sensitivity to the symmetry assumption (Observation 1) and IBE’s superior ability to diagnose a model-consistent cluster-heterogeneity in the observed behaviour (Observation 2). We summarise the above in the following final observation:

Observation 4: When relaxing symmetry and using (naïve or Tikhonov) regularisation procedures when estimating the models with GLS or distance-based estimators (instead of OLS estimators):

(A) EvE is still less likely to reject the null of the \(\Sigma\)-test (25% of all sessions vs 42% for IBE).

(B) The features 1) to 4) of EvE’s non-rejected specifications outlined in Observation 2 hold and confirm IBE’s superior ability in organising the observed behaviour.

(C) Our conclusions for IBE are hardly affected by the use of regularisation procedures.

6 Conclusion

In this paper we propose a novel approach to the analysis of symmetric participation games that checks the consistency of a model’s estimates with the restrictions it imposes on individual behaviour. This approach relaxes the model’s assumption of symmetry by allowing for the existence of clusters of players with similar observable characteristics, and it assesses how much cluster-heterogeneity a model can tolerate to still be consistent with its behavioural restrictions by means of a specification test. Thus, besides offering an alternative to the usual assessment of a model in terms of its goodness-of-fit, this approach allows for individual differences to be accounted for in a model-consistent way and therefore contributes to the literature on modelling heterogeneity in static games, see e.g., Rogers et al. (2009) and Golman (2011) (Footnote 20).

We assess this approach with data on market-entry experiments which we analyse in terms of two stationary models: Exploitation versus Exploration (EvE, which is equivalent to Logit-QRE) and Impulse Balance Equilibrium (IBE). Our empirical analysis sheds new light on the models’ sensitivities to the assumption of symmetric players or of cluster-heterogeneity and to the econometric procedures used. We summarise our findings in the following four points.

First, estimating EvE with the usual assumption of symmetric and homogeneous players provides limited insight into the analysis of behaviour in these games because (1) the session estimates are largely invariant to treatment conditions and mostly support a maximal exploration (or purely random behaviour), and (2) the estimates for the pooled data are seldom consistent with session estimates. In this regard, IBE outperforms EvE.

Second, when allowing for cluster-heterogeneity and estimating the models with OLS methods, the null of the specification test is less likely to be rejected for EvE, and EvE is more likely to support homogeneity than IBE. However, the estimated specifications have considerably more insignificant cluster-estimates and are typically over-parametrised, so IBE also outperforms EvE in terms of accommodating cluster-heterogeneity. This holds when the estimations pertain to the second half of the experiments to account for participants’ experience of play.

Third, our approach can unveil behavioural patterns such as the presence of clusters of players with low-entry rates in some treatments and may explain them, i.e., such clusters are absent in treatments where payoffs remain positive when participation is over-capacity (as in NOM1) and they are present in treatments where the risk of experiencing a regret from entering is more salient (as in DISC and NOM2).

Fourth, when estimating the models with more efficient procedures (i.e., GLS or distance-based estimators that possibly allow for regularisation), our conclusions for IBE are hardly affected whereas those for EvE change considerably: homogeneity is then always rejected (as for IBE when assuming OLS methods) and insignificant or inconsistent cluster-estimates are less frequent. Yet, IBE still accommodates cluster-heterogeneity better than EvE.

Finally, the proposed approach is flexible enough to also allow an assessment of which type of heterogeneity is most consistent with some behavioural model, e.g., gender, socio-demographics, or any relevant mixture of observable characteristics. For example, it can be used to reveal a gender and/or a socio-demographic effect in the players’ participation, and the specification test could determine whether this effect (or which of these effects) is consistent with the symmetric model considered (Footnote 21). It can also be applied to test predictions regarding the sorting of players into clusters of individuals who either always or never participate as a result of reinforcement learning, as Duffy and Hopkins (2005) predict and find. This, however, would raise the more challenging question of the formation of such clusters over time and its consistency with the type of learning considered. In this regard, our approach provides some first insights which we hope will be further explored.