Introspection Dynamics in Asymmetric Multiplayer Games

Evolutionary game theory and models of learning provide powerful frameworks to describe strategic decision-making in social interactions. In the simplest case, these models describe games among two identical players. However, many interactions in everyday life are more complex. They involve more than two players who may differ in their available actions and in their incentives to choose each action. Such interactions can be captured by asymmetric multiplayer games. Recently, introspection dynamics has been introduced to explore such asymmetric games. According to this dynamics, at each time step players compare their current strategy to an alternative strategy. If the alternative strategy results in a payoff advantage, it is more likely adopted. This model provides a simple way to compute the players’ long-run probability of adopting each of their strategies. In this paper, we extend some of the previous results of introspection dynamics for 2-player asymmetric games to games with arbitrarily many players. First, we derive a formula that allows us to numerically compute the stationary distribution of introspection dynamics for any multiplayer asymmetric game. Second, we obtain explicit expressions of the stationary distribution for two special cases. These cases are additive games (where the payoff difference that a player gains by unilaterally switching to a different action is independent of the actions of their co-players), and symmetric multiplayer games with two strategies. To illustrate our results, we revisit several classical games such as the public goods game.


Introduction
Social behavior has been studied extensively through pairwise interactions [37]. Despite their simplicity, they provide important insights, such as how populations can sustain cooperation [8,50,52]. Yet, many interesting collective behaviors occur when multiple individuals interact simultaneously [5,6,30,34,56,58,64,77,86]. Most of these situations cannot be captured by the sum of several pairwise interactions. Thus, to account for such nonlinearities, one needs to consider multiplayer games [30]. For example, a well-known effect that only emerges when more than two players are present is the "second-order free-riding problem" [20]. A natural solution to maintain pro-social behavior in a community is to monitor and punish defectors (and/or reward cooperators). However, most forms of sanctioning are considerably costly [33]. Therefore, an additional (second-order) dilemma emerges: individuals would like cooperation to be incentivized, but they prefer that others pay the associated costs.
Another interesting effect that can be explored with multiplayer games is the scale or size of the interaction itself. In situations that require some sort of coordination, and where expectations about others play an important role in one's decisions, a growing group size might hinder the optimal outcome [77]. Likewise, it has been shown that it is hard to cooperate in large groups [34,63,71]. This is not, however, a general effect [28]. Additionally, group size can vary in a population of players. In that case, not only the average group size can have an important effect, but also the variance of the group size distribution [16,61] and the shape of the group size distribution itself [62].
Complexity further increases when players differ significantly among themselves. This diversity can be captured by asymmetric games [26,32,35,36,44,55,73,78,85]. In symmetric games, all players are indistinguishable. Thus, to fully characterize the state of the game, we only need to know the number of players playing each strategy. Conversely, in asymmetric games, players can differ in their available actions and in their incentives to choose each action. Therefore, they can also have uneven effects on others' payoffs. For example, in public goods games and collective-risk dilemmas, players can have different initial endowments (or wealth), productivities, costs, risk perceptions, or risk exposures [1,32,46,47,84,87]. Hence, to fully describe the state of the game, we need to know the action of each player. This greatly increases the size of the game's state space; even more so for more than two players.
Models from evolutionary game theory (EGT) [37,42,43,51] and learning theory [22,40,59,70] have been widely used to study strategic behavior. The concept of evolutionarily stable strategy, originally proposed for pairwise encounters [43], was extended to multiplayer games [15,17,58]. Also the well-known replicator equation [37,79] can easily account for multiplayer games [19,27,31,64]. More recently, the replicator-mutator equation was applied to study the dynamics of multiplayer games, too [41]. As for asymmetric games, a few additional assumptions are needed in the description of the model. For example, if there are two different types of players, typically either two populations co-evolve ("bimatrix games" [29,37,81]) or there is a single population of players, each of whom can play the two types or roles ("role games") [37]. The case of asymmetric games with more than two players is substantially less studied within deterministic EGT. Gokhale and Traulsen and Zhang et al. are two exceptions [29,89]. Notably, although these works study multiplayer games, they consider at most two different types (drawn from two populations), which leaves out the exploration of full asymmetry. Also stochastic evolutionary game dynamics [54,80] provides several models for studying multiplayer and asymmetric games. Fixation probabilities [23] in asymmetric 2-player games [76], asymmetric 3-player games [74], and symmetric multiplayer games [27,39] were recently derived. Furthermore, average strategy abundances [3,4] were obtained only for 2-player asymmetric games [55,75] or multiplayer symmetric games [12,28,38]. For a review on evolutionary multiplayer games, both in infinitely large and in finite populations, we refer to Gokhale and Traulsen [30]. Learning models (of strategic behavior) take a different approach from EGT [9,10,22,24,36,40,59,70,82]. There is not necessarily an evolution of strategies in a population, but a process by which individuals learn strategies dynamically.
Introspection dynamics has recently proven to be a useful learning model for tackling (a)symmetric games [18,32,45,69,72]. Here, players update their strategies by exploring their own set of strategies in the following simple way: each time, after a round of the game, a random player considers a random alternative strategy; they compare the payoff that it would have given them to their current payoff; if the new strategy would provide a higher payoff, it is more likely to be adopted in the next round. We describe the model formally in the next section. While in Couto et al. [18] only 2-player games were considered, this framework is general enough to account for multiple players. In particular, introspection dynamics allows a more natural exploration of full asymmetry in many-player games than population models do. For example, in imitation dynamics, one needs to specify who is being imitated by whom [84]. When players differ, it might not make sense to assume that they imitate others. Introspection avoids this assumption because players' decisions only depend on their own payoffs. An existing model that shares this property is the logit-response dynamics or, simply, logit dynamics [2,7,13]. Unlike in introspection dynamics, at each time step the randomly drawn player can switch to any other strategy with a non-zero probability. This probability grows with the payoff provided by each strategy in the (possible) future state. The switching-probability functions of introspection dynamics and logit dynamics are similar in their exponential shape; hence, the two processes have some interesting connections.
Here, we extend previous results on pairwise games under introspection dynamics [18] to multiplayer games. First, we derive a formula that allows us to numerically compute the stationary distribution of introspection dynamics for any multiplayer asymmetric game. Second, we obtain explicit expressions of the stationary distribution for two special cases. These cases are additive games (where the payoff difference that a player gains by unilaterally switching to a different action is independent of the actions of their co-players), and symmetric multiplayer games with two strategies. To illustrate our theoretical results, we analyze various multiplayer asymmetric social dilemmas, extending the framework in [31] to asymmetric games. We also study the asymmetric version of a public goods game with a rewarding stage [57]. Finally, we compare introspection dynamics with logit dynamics in the Appendix, where we show that the two processes have equivalent stationary distributions for some particular games (namely, 2-strategy, potential, and additive games).

Model of Introspection Dynamics in Multiplayer Games
We consider a normal form game with N (≥ 2) players. In the game, a player, say player i, can play actions from their action set, A_i := {a_{i,1}, a_{i,2}, ..., a_{i,m_i}}. The action set of player i has m_i actions. In this model, players only use pure strategies. Therefore, there are finitely many states of the game. More precisely, there are exactly m_1 × m_2 × ... × m_N states. We denote a state of the game by collecting the actions of all the players in a vector, a := (a_1, a_2, ..., a_N), where a ∈ A := A_1 × A_2 × ... × A_N and a_i ∈ A_i. We also use the common notation a := (a_i, a_{−i}) to denote the state from the perspective of player i. In the state (a_i, a_{−i}), player i plays the action a_i ∈ A_i and their co-players play the actions a_{−i} ∈ A_{−i}, where A_{−i} is defined as A_{−i} := ∏_{j≠i} A_j. The payoff of a player depends on the state of the game. We denote the payoff of player i in the state a by π_i(a) or π_i(a_i, a_{−i}). In this paper, we use bold font letters to denote vectors and matrices. We use the corresponding normal font letters with subscripts to denote elements of the vectors (or matrices). Since players only use pure strategies in this model, we use the terms strategies and actions interchangeably throughout the whole paper.
In this model, players update their strategies over time using introspection dynamics [18]. At every time step, one randomly chosen player can update their strategy. The randomly chosen player, say i, currently playing action a_{i,k}, compares their current payoff to the payoff that they would obtain if they played a randomly selected action a_{i,l} ≠ a_{i,k} from their action set A_i. This comparison is done while assuming that the co-players do not change their respective actions. When the co-players of player i play a_{−i}, player i changes from action a_{i,k} to the new action a_{i,l} in the next round with probability

p_{a_{i,k} → a_{i,l}} = 1 / (1 + exp(−β [π_i(a_{i,l}, a_{−i}) − π_i(a_{i,k}, a_{−i})])). (1)

Here β ∈ [0, ∞) is the selection strength parameter that represents the importance that players give to payoff differences while updating their actions. For β = 0, players update to a randomly chosen strategy with probability 0.5. For β > 0, players update to the alternative strategy under consideration with probability greater than 0.5 (or less than 0.5) if the switch gives them an increase (or decrease) in payoff. Introspection dynamics defines a Markov chain and can be studied by analyzing the properties of the corresponding transition matrix T. The transition matrix element T_{a,b} denotes the conditional probability that the game goes to the state b in the next round if it is in state a in the current round. In order to formally define the transition matrix, we first need to introduce some notation and definitions. We start by defining the neighborhood set of state a.
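The switching probability described above is a logistic (Fermi) function of the payoff difference. A minimal sketch in Python (the function and parameter names here are ours):

```python
import math

def switch_probability(payoff_alt, payoff_current, beta):
    """Probability that a player adopts the alternative action, given the
    payoff it would yield versus the current payoff (Fermi function).

    For beta = 0 the switch happens with probability 1/2; for beta > 0
    the probability exceeds 1/2 exactly when the alternative pays more.
    """
    return 1.0 / (1.0 + math.exp(-beta * (payoff_alt - payoff_current)))
```

Note the complementarity of opposite switches: the probabilities of switching from x to y and from y to x (against fixed co-player actions) always sum to 1, a property used repeatedly in the derivations below.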
Definition 1 (Neighborhood set of a state) The neighborhood set of state a, Neb(a), is defined as:

Neb(a) := {b ∈ A : b_j ≠ a_j for exactly one j ∈ {1, 2, ..., N}}. (2)

In other words, a state in Neb(a) is a state that has exactly one player playing a different action than in state a. For example, consider the game where there are three players and each player has the identical action set {C, D}. The state (C, C, D) is in the neighborhood set of (C, C, C), whereas the state (C, D, D) is not. Two states that belong in each other's neighborhood set differ in exactly a single player's action, and we call this player the index of difference between the neighboring states:

i(a, b) := the unique j ∈ {1, 2, ..., N} such that a_j ≠ b_j, for b ∈ Neb(a). (3)

In the previous example, the index of difference between the neighboring states (C, C, C) and (C, C, D) is 3. Using the above definitions, one can formally define the transition matrix of introspection dynamics by

T_{a,b} = (1/N) · (1/(m_i − 1)) · 1/(1 + exp(−β [π_i(b) − π_i(a)])) if b ∈ Neb(a) with index of difference i,
T_{a,a} = 1 − Σ_{b ∈ Neb(a)} T_{a,b}, and T_{a,b} = 0 otherwise. (4)

The transition matrix is a row stochastic matrix (the sums of the rows are 1). This implies that a stationary distribution of T, a left eigenvector of T corresponding to eigenvalue 1, always exists. We introduce a sufficient condition for the stationary distribution of T to be unique.
When the selection strength, β, is finite, the transition matrix of introspection dynamics has a unique stationary distribution. A finite value of β results in transitions of non-zero probability between neighboring states. Since no state is isolated (i.e., every state belongs in the neighborhood set of another state) and there are only finitely many states of the game, every state is reachable in a finite number of steps from any starting point. The transition matrix, T, is therefore primitive for a finite β. By the Perron-Frobenius theorem, a primitive matrix, T, has a unique and strictly positive stationary distribution u := (u_a)_{a∈A} which satisfies the conditions:

u T = u, (5)
u · 1 = 1, (6)

where 1 is the column vector with the same size as u and all elements equal to 1. For all the analytical results in this paper, we consider β to be finite so that the stationary distributions of the processes are unique. The above equations only present an implicit representation of the stationary distribution u. The stationary distribution can be explicitly calculated by the following expression (which is derived using Eqs. 5 and 6),

u = 1^T (I + U − T)^{−1}, (7)

where U is a square matrix of the same size as T with all elements equal to 1 and I is the identity matrix. The matrix I + U − T is invertible when T is a primitive matrix [18]. Using Eq. (7), one can compute the unique stationary distribution of introspection dynamics (with a finite β) for any normal form game (with an arbitrary number of asymmetric players and strategies).
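The computation via Eq. (7) can be implemented directly. The sketch below (all names are ours) builds the transition matrix of introspection dynamics for an arbitrary normal form game and recovers the stationary distribution by inverting I + U − T:

```python
import itertools
import numpy as np

def stationary_distribution(action_sets, payoff, beta):
    """Stationary distribution of introspection dynamics.

    action_sets: one tuple of available actions per player.
    payoff(i, state): payoff of player i in the joint action `state`.
    Returns a dict mapping each state to its long-run probability.
    """
    states = list(itertools.product(*action_sets))
    index = {s: k for k, s in enumerate(states)}
    n, N = len(states), len(action_sets)
    T = np.zeros((n, n))
    for s in states:
        for i, acts in enumerate(action_sets):
            for alt in acts:
                if alt == s[i]:
                    continue
                t = s[:i] + (alt,) + s[i + 1:]
                delta = payoff(i, t) - payoff(i, s)
                # player i is drawn w.p. 1/N, the alternative w.p. 1/(m_i - 1),
                # and the switch is accepted with the Fermi probability
                T[index[s], index[t]] = 1.0 / (
                    N * (len(acts) - 1) * (1.0 + np.exp(-beta * delta))
                )
        T[index[s], index[s]] = 1.0 - T[index[s]].sum()
    # Eq. (7): u = 1^T (I + U - T)^{-1}, with U the all-ones matrix
    u = np.ones(n) @ np.linalg.inv(np.eye(n) + np.ones((n, n)) - T)
    return {s: u[index[s]] for s in states}

def marginal(dist, i, action):
    """Long-run probability that player i plays `action`."""
    return sum(p for s, p in dist.items() if s[i] == action)
```

For instance, in a 2-player donation game (benefit 2, cost 1, β = 1), the resulting joint distribution factorizes into the players' marginals, consistent with the additive-game results discussed later.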
The stationary distribution element u_a is the probability that the state a will be played by the players in the long run. Using the stationary distribution, one can calculate the marginal probabilities corresponding to each player's actions. That is, the probability that player i plays action a ∈ A_i in the long run, ξ_{i,a}, can be computed as

ξ_{i,a} = Σ_{b ∈ A : b_i = a} u_b. (8)

Additive Games and Their Properties Under Introspection Dynamics
In this section, we discuss the stationary properties of introspection dynamics when players learn to play strategies in a special class of games: additive games. In an additive game, the payoff difference that a player earns by making a unilateral switch in their actions is independent of what their co-players play. In other words, if none of the co-players change their current actions, the payoff difference earned by making a switch in actions is only determined by the switch, and not by the actions of the co-players. Formally, in additive games, for any player i, any pair of actions x, y ∈ A_i, and any q ∈ A_{−i}, the difference

f_i(x, y) := π_i(x, q) − π_i(y, q) (9)

is independent of q and only dependent on x and y. In the literature, this property is sometimes called equal gains from switching [53,83]. For games with this property, the stationary distribution of introspection dynamics takes a simple form.
Proposition 1 When β is finite, the unique stationary distribution, u = (u_a)_{a∈A}, of introspection dynamics for an N-player additive game is given by

u_a = ∏_{j=1}^{N} [Σ_{a′ ∈ A_j} exp(β f_j(a′, a_j))]^{−1}, (10)

where f_j(a′, a_j) is the co-player-independent payoff difference given by Eq. (9).
For all proofs of Propositions and Corollaries, please see Appendix 2. Using the stationary distribution and Eq. (8), one can also exactly compute the cumulative probabilities with which players play their actions in the long run (i.e., the marginal distributions). In this regard, introspection learning in additive games is particularly interesting. The stationary distribution and the marginal distributions of introspection dynamics in additive games are related in a special way.
Proposition 2 Let u = (u_a)_{a∈A} be the unique stationary distribution of introspection dynamics with finite β for an N-player additive game. Then, u_a is the product of the marginal probabilities with which each player plays their respective actions in a. That is,

u_a = ∏_{j=1}^{N} ξ_{j,a_j}. (11)

For an N-player additive game, ξ_{j,a_j} is given by

ξ_{j,a_j} = [Σ_{a′ ∈ A_j} exp(β f_j(a′, a_j))]^{−1}, (12)

where f_j(a′, a_j) is the co-player-independent payoff difference given by Eq. (9).
The above proposition states that for additive games, the stationary distribution of introspection dynamics can be factorized into its corresponding marginals. In the long run, the probability that players play the state a = (a_1, a_2, ..., a_N) is the product of the cumulative probabilities that player 1 plays a_1, player 2 plays a_2, and so on. This property of additive games was already shown for the simple 2-player, 2-action donation game in Couto et al. [18]. Here, we extend that result to any additive game with an arbitrary number of players, each having an arbitrary number of strategies. In the next section, we use the well-studied example of the linear public goods game (an additive game) to illustrate these results.

Example of an Additive Game: Linear Public Goods Game with 2 Actions
In the simplest version of the linear public goods game (LPGG) with N players, each player has two possible actions: to contribute (action C, to cooperate), or to not contribute (action D, to defect) to the public good. The players may differ in their cost of cooperation and the benefit they provide by contributing to the public good. We denote the cost of cooperation for player i and the benefit that they provide by c_i and b_i, respectively. We define an indicator function α(·) to map the action of cooperation to 1 and the action of defection to 0. That is, α(C) = 1 and α(D) = 0. The payoff of player i when the state of the game is a is given by

π_i(a) = (1/N) Σ_{j=1}^{N} α(a_j) b_j − α(a_i) c_i. (13)

The payoff difference that a player earns by unilaterally switching from C to D (or vice versa) in the linear public goods game is independent of what the other players play in the game. That is, for every player i,

π_i(C, q) − π_i(D, q) = b_i/N − c_i (14)

is independent of the co-players' actions q. The linear public goods game is therefore an example of an additive game. This property of the game results in easily identifiable dominated strategies. For player i, defection dominates cooperation when c_i > b_i/N, while cooperation dominates defection when c_i < b_i/N. Using Proposition 1, one can derive the closed-form expression for the stationary distribution of an N-player linear public goods game with two strategies.
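The LPGG payoff and its equal-gains property can be checked directly; a small sketch (function names ours), assuming the equal-split payoff described above:

```python
def lpgg_payoff(i, state, b, c):
    """Linear public goods payoff of player i.

    state: tuple of 'C'/'D' actions; each cooperator j contributes a
    benefit b[j] that is split equally among the N players; cooperating
    costs player i the amount c[i].
    """
    N = len(state)
    pot = sum(b[j] for j, a in enumerate(state) if a == "C")
    return pot / N - (c[i] if state[i] == "C" else 0.0)
```

The gain from switching, π_i(C, q) − π_i(D, q), then equals b_i/N − c_i for every co-player profile q, which is exactly the additivity (equal gains from switching) property.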

Proposition 3 When β is finite, the unique stationary distribution of introspection dynamics for an N-player linear public goods game is given by

u_a = ∏_{j=1}^{N} ξ_{j,a_j}, (15)

where ξ_{j,C} = [1 + exp(−β(b_j/N − c_j))]^{−1} and ξ_{j,D} = 1 − ξ_{j,C}.

We use a simple example to illustrate the above result. Consider a 3-player linear public goods game. All players provide a benefit of 2 units when they contribute to the public good. They differ, however, in their cost of cooperation. For players 1 and 2, the cost of cooperation is 1 unit (c_1 = c_2 = 1), while for the third player, the cost is 1.5 units (c_3 = 1.5). In the stationary distribution of the process with selection strength β = 1, the probabilities that player 1 (or 2) cooperates and that player 3 defects are ξ_{1,C} = ξ_{2,C} = 0.417 and ξ_{3,D} = 0.697, respectively. With the exact values, one can confirm the factorizing property of the stationary distribution for additive games in this example (i.e., Proposition 2). That is, u_{CCD} = 0.121 = ξ_{1,C} · ξ_{2,C} · ξ_{3,D}. We now use Eq. (15) to systematically analyze the LPGG under introspection dynamics. First, we study the simple case of a 4-player symmetric LPGG (the cost and benefit for all 4 players are c and b). Since all players are identical, the states of the game can be enumerated by counting the number of cooperators in the state. There are only 5 distinct states of the game (from 0 to 4 cooperators). When the parameters of the game are such that defection dominates cooperation (b = 2, c = 1, Fig. 1a), the stationary distribution of the process at high β indicates that in the long run, states with a higher number of cooperators are less likely than states with a lower number of cooperators. However, for intermediate and low β, stationary results are qualitatively different. Here, the state with 1 cooperator (or even 2 cooperators, depending on how small β is) is the most probable state in the long run (Fig. 1b). Since every possible state is equiprobable in the limit of β → 0, the outcome with 2 cooperators is most likely only because there are more states with 2 cooperators than states with any other number of cooperators.
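The three-player example above can be reproduced numerically from the closed-form marginals (a sketch under our naming; the per-player cooperation probability follows from the co-player-independent gain b_j/N − c_j):

```python
import math

def xi_C(b_j, c_j, N, beta):
    """Long-run probability that player j cooperates in the LPGG,
    a Fermi function of the co-player-independent gain b_j/N - c_j."""
    return 1.0 / (1.0 + math.exp(-beta * (b_j / N - c_j)))

# 3-player example: all benefits 2; c_1 = c_2 = 1, c_3 = 1.5; beta = 1
xi_1C = xi_C(2.0, 1.0, 3, 1.0)        # player 1 (and 2) cooperates, ~0.417
xi_3D = 1.0 - xi_C(2.0, 1.5, 3, 1.0)  # player 3 defects, ~0.697
u_CCD = xi_1C * xi_1C * xi_3D         # factorized probability, ~0.121
```

This reproduces the factorization u_{CCD} = ξ_{1,C} · ξ_{2,C} · ξ_{3,D} stated in the example.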
Naturally, β plays an important role in determining the overall cooperation in the long run. When β is low, average cooperation varies weakly with the strength of the dilemma, b/N − c (Fig. 1c). Even when the temptation to defect is high (b/N − c = −2), players cooperate with a non-zero probability. Similarly, when cooperation is highly beneficial and strictly dominates defection (b/N − c = 2), players defect sometimes. At higher values of β, the stationary behavior of players is more responsive to the payoffs and thus reflects an abrupt change near the parameters where the game transitions from defection-dominating to cooperation-dominating (b/N − c = 0).
To study what effects might appear due to asymmetry in the LPGG, we consider the game with 3 asymmetric players. All the players can differ in their cost of cooperation and the benefit they provide to the public good. In this setup, the cost and benefit values of the reference player (player 2) are 1 and 2 units, respectively. Players 1 and 3 differ from the reference player in opposite directions. For player 1, the cost and benefit are 1 + δ_c and 2 + δ_b, respectively, while for player 3, the cost and benefit are 1 − δ_c and 2 − δ_b, respectively.

[Figure 1 caption: Average cooperation in the symmetric LPGG as a function of the marginal gain b/N − c. When this quantity is negative and low, we say that the dilemma is strong; choosing cooperation is then strictly disadvantageous. When it is positive and high, we say that the dilemma is weak; cooperation then dominates defection. Typically, a linear public goods dilemma is defined to have a negative marginal gain. Here, the dilemma strength varies from −2 to 2, and results are shown for selection strengths β = 1, 5, and 100. For high β, the stationary distribution of introspection dynamics reflects rational play: in the long run, players play the Nash equilibrium, and when the marginal gain is negative, defection is played almost with certainty (and vice versa). For low β, however, some cooperation is possible even when the dilemma is strong.]

The terms δ_b and δ_c represent the strength of asymmetry between the three players (a higher absolute value of δ indicating a bigger asymmetry). When the players only differ in their cost of cooperation (δ_b = 0 and δ_c = 0.5, Fig. 2a, left), their relative cooperation in the long run reflects their relative ability to cooperate. The player with the lowest cooperation cost (player 3) cooperates with the highest probability (and vice versa, Fig. 2a, right). Similarly, when players only differ in their ability to produce the public good (δ_b = 1 and δ_c = 0, Fig. 2b, left), their relative cooperation in the long run reflects the relative benefits they provide with their cooperation (Fig. 2b, right). In this example, if we consider that the reference player provides a benefit of 2 units and has a cost of 1 unit (in which case defection always dominates cooperation for them), defection dominates cooperation for player 1 if and only if δ_b < 1 + 3δ_c and, for player 3, only when δ_b > 3δ_c − 1. These regions in the δ_b − δ_c parameter plane that correspond to defection dominating cooperation are circumscribed by white dashed lines in Fig. 2c. When players learn to play at high selection strength, β, their cooperation frequency in the long run reflects the rational play (Fig. 2c). In the long run, the average cooperation frequency of the group is low if the asymmetry in the benefit value is bounded as 3δ_c − 1 < δ_b < 3δ_c + 1. This includes the case where players are symmetric (δ_b = δ_c = 0). A relatively high cooperation is only assured if players are aligned in their asymmetries (i.e., either δ_b > 3δ_c + 1 or δ_b < 3δ_c − 1). Or, in other words, if the player that has a low cost of cooperation also provides a high benefit upon contribution, then cooperation is high in the long run.

Games with Two Actions and Their Properties Under Introspection Dynamics
In the previous section, we studied the properties of additive games under introspection dynamics. In this section, we study games that are (a) not necessarily additive and (b) have only two actions for each player. First, we study the symmetric version of such a game. An N-player symmetric normal form game with two actions has the following properties:

1. All players have the same action set A := {C, D}. That is, A_i = A for all i ∈ {1, 2, ..., N}. (16)
2. Players have the same payoff when they play the same action against the same composition of co-players. That is, for any i, j ∈ {1, 2, ..., N}, a ∈ A, and b ∈ A^{N−1}, π_i(a, b) = π_j(a, b′) whenever b′ is a permutation of b. (17)

Since players are symmetric, states can again be enumerated by counting the number of C players in the state. We denote the payoff of a C and a D player in a state where there are j co-players playing C by π_C(j) and π_D(j), respectively. We denote by f(j) the payoff difference earned by switching from D to C when there are j co-players playing C,

f(j) := π_C(j) − π_D(j). (18)

The stationary distribution of a 2-action symmetric game under introspection dynamics can be explicitly computed using the following proposition.
Proposition 4 When β is finite, the unique stationary distribution of introspection dynamics for an N-player symmetric normal form game with two actions, A = {C, D}, (u_a)_{a∈A^N}, is given by

u_a = (1/Z) exp(β Σ_{j=0}^{C(a)−1} f(j)), (19)

where f(j) is defined as in Eq. (18) and C(a) is the number of cooperators in state a. The term Z is the normalization factor given by

Z = Σ_{k=0}^{N} (N choose k) exp(β Σ_{j=0}^{k−1} f(j)). (20)

The number of unique states of the game can be reduced from 2^N to N + 1 due to symmetry. In the reduced state space, the state k corresponds to k players playing C and N − k players playing D. Then, Proposition 4 can be simply reformulated by relabelling the states as follows.

Corollary 1 When β is finite, the unique stationary distribution, (u_k)_{k∈{0,1,...,N}}, of introspection dynamics for an N-player symmetric normal form game with two actions, A = {C, D}, is given by

u_k = (1/Z) (N choose k) exp(β Σ_{j=0}^{k−1} f(j)), (21)

where k represents the number of C players in the state and f(j) is defined as in Eq. (18). The term Z is the normalization factor, given by Eq. (20).

The above corollary follows directly from Proposition 4. The key step is to count the number of states in the state space A^N that correspond to exactly k C players (and therefore N − k D players). This count is simply the binomial coefficient (N choose k). In the next section, we use the example of a nonlinear public goods game to illustrate these results.
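Corollary 1 translates directly into code. A short sketch (function names ours) that returns the reduced distribution over the number of cooperators:

```python
import math

def symmetric_stationary(N, f, beta):
    """Stationary distribution (u_0, ..., u_N) over the number of
    cooperators for a symmetric 2-action game, per Corollary 1:
    u_k is proportional to binom(N, k) * exp(beta * sum_{j<k} f(j))."""
    weights = [
        math.comb(N, k) * math.exp(beta * sum(f(j) for j in range(k)))
        for k in range(N + 1)
    ]
    Z = sum(weights)
    return [w / Z for w in weights]
```

As a sanity check, for a neutral game (f ≡ 0) every state of the full space is equally likely, so u_k reduces to the binomial weights binom(N, k)/2^N for any β.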

An Example of a Game with Two Actions: The General Public Goods Game
To study general public goods games, we adopt the framework of general social dilemmas from Hauert et al. [31]. In the original paper, the authors propose a normal form game with symmetric players. The game's properties depend on a parameter w that determines the nature of the public good. The players have two actions: cooperation, C, and defection, D. Here, we extend their framework to account for players with asymmetric payoffs. Before we explain the asymmetric setup, we describe the original model briefly. In the symmetric case, all N players have the same cost of cooperation c and they all generate the same benefit b for the public good. Unlike in the linear public goods game, contributions to the public good are scaled by a factor that is determined by w and the number of cooperators in the group. The payoff of a defector and a cooperator in a group with k cooperators and N − k defectors is given by, respectively,

π_D(k) = (b/N) Σ_{j=0}^{k−1} w^j,  π_C(k) = π_D(k) − c.

The parameter w represents the nonlinearity of the public good. The game is linear when w = 1: every cooperator's contribution is as valuable as the benefit that they can generate. When w < 1, the effective contribution of every additional cooperator goes down by a factor w (compared to the last cooperator). The public good is said to be discounting in this case. On the other hand, when w > 1, every new contribution is more valuable than the previous one. The public good is said to be synergistic in this case. For the symmetric case, the relationship between the cost-to-benefit ratio, cN/b, and the discount/synergy factor, w, determines the type of social dilemma arising from the game. In principle, this framework can produce generalizations of the prisoner's dilemma (D dominating C), the snowdrift game (coexistence between C and D), the stag-hunt game (no dominance, but existence of an internal unstable equilibrium), and the harmony game (C dominating D), with respect to their evolutionary trajectories under the replicator dynamics. For more details, see Hauert et al.
[31]. Now, we describe our extension of the original model to account for asymmetric players. Here, for player i, the cost of cooperation is c_i. The benefit that they can generate for the public good is b_i. The benefit of cooperation generated by a player is either synergized (or discounted) by a factor depending on the number of cooperators already in the group and the synergy/discount factor, w (just like in the original model). However, now, since players are asymmetric, it is not entirely clear in which order the contributions of cooperators should be discounted (or synergized). For example, consider that there are 3 cooperators in the group: players p, q, and r. The total benefit that they provide to the public good can be any of the six possibilities of the form x + yw + zw², where x, y, and z are permutations of b_p, b_q, and b_r. In this model, we assume that all such permutations are equally likely, and therefore, the expected benefit provided by all three of them is given by b̄(1 + w + w²), where b̄ = (b_p + b_q + b_r)/3.
The complete state space of the game with asymmetric players is A = {C, D}^N. The payoff of a defector in a state (D, a_{−i}) and that of a cooperator in a state (C, a_{−i}), where a_{−i} ∈ {C, D}^{N−1}, are, respectively, given by

π_i(D, a_{−i}) = (1/N) b̄(D, a_{−i}) Σ_{j=0}^{C(D,a_{−i})−1} w^j,
π_i(C, a_{−i}) = (1/N) b̄(C, a_{−i}) Σ_{j=0}^{C(C,a_{−i})−1} w^j − c_i,

where C(a, a_{−i}) counts the number of cooperators in state (a, a_{−i}), b̄(a, a_{−i}) denotes the average of the benefit values b_j of those cooperators, and α(·), as before, maps the actions C and D to 1 and 0, respectively. Note that the numbers of cooperators in the two states are related as C(C, a_{−i}) = C(D, a_{−i}) + 1. We are interested in studying the long-term stationary behavior of players in this game when they learn through introspection. We first discuss results from the symmetric public goods game and then discuss results for the game with asymmetric players.
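The order-averaged payoff just described can be sketched as follows (function names ours; we compute the cooperators' mean benefit directly rather than averaging over orderings, which is equivalent by the permutation argument above):

```python
def apgg_payoff(i, state, b, c, w):
    """Asymmetric nonlinear public goods payoff of player i.

    With k cooperators of mean benefit bbar, the expected pot is
    bbar * (1 + w + ... + w**(k-1)), shared equally among the N players;
    cooperating additionally costs player i the amount c[i].
    """
    N = len(state)
    coop = [j for j, a in enumerate(state) if a == "C"]
    k = len(coop)
    pot = 0.0
    if k:
        bbar = sum(b[j] for j in coop) / k
        pot = bbar * sum(w ** j for j in range(k))
    return pot / N - (c[i] if state[i] == "C" else 0.0)
```

With identical b_j and c_j, this reduces to the symmetric payoffs π_D(k) and π_C(k) of the original framework.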
To compute the stationary distribution of introspection dynamics in this game, we use Eq. (21). In our symmetric example, we consider that every player in an N-player game can generate a benefit b of value 2. Before exploring the c−w−N parameter space, we study four specific cases (with a 4-player game). In two of these cases, the public good is discounted (w = 0.5, Fig. 3a left panels), and in the two other cases, the public good is synergistic (w = 1.5, Fig. 3a right panels). For each case, we consider two sub-cases: first, in which the cost is high (c = 1, Fig. 3a top panels), and second, in which the cost is low (c = 0.2, Fig. 3a bottom panels). The four parameter combinations are chosen such that each of them corresponds to a unique social dilemma under the replicator dynamics. When the selection strength is intermediate (β = 5), players sometimes play actions that are not optimal for the dilemma. For example, even when the parameters of the game make cooperation the dominated strategy (w = 0.5, c = 1), there is a single cooperator in the group in around 20% of the cases. When the parameters of the game reflect the stag-hunt dilemma (c = 1, w = 1.5), players are more likely to coordinate their actions in the long run. The probabilities that the whole group plays C or D are higher than the probabilities that there is a group with a mixture of C and D players. In contrast, when the parameters reflect the snowdrift game (w = 0.5, c = 0.2), we get the opposite effect. In the long run, mixed groups are more likely than homogeneous groups. Finally, when the parameters of the game make defection the dominated action (w = 1.5, c = 0.2), all players learn to cooperate in the long run.
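The symmetric cases above, and the average cooperation discussed next, can be reproduced from Corollary 1 with the gain function f(j) = (b/N)·w^j − c of this game. A sketch under our naming:

```python
import math

def avg_cooperation(N, b, c, w, beta):
    """Average long-run cooperation frequency in the symmetric nonlinear
    public goods game, from the reduced stationary distribution over
    k = 0..N cooperators with gain f(j) = (b / N) * w**j - c."""
    weights = [
        math.comb(N, k)
        * math.exp(beta * sum((b / N) * w ** j - c for j in range(k)))
        for k in range(N + 1)
    ]
    Z = sum(weights)
    return sum(k * wk for k, wk in enumerate(weights)) / (N * Z)
```

For b = 2, N = 4, β = 5, the case where defection is dominated (w = 1.5, c = 0.2) yields near-full cooperation, while the case where cooperation is dominated (w = 0.5, c = 1) yields very little.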
The average cooperation frequency of the group in the long-run is shown in the c-w and N-w parameter planes in Fig. 3b. First, consider the case where the group size is fixed at 4 players (the c-w plane in Fig. 3b). In that case, if the cost of cooperation is prohibitively high, the average cooperation rate is negligible and does not change with the nature of the public good. In contrast, when the cost is not prohibitively high, the discount/synergy parameter w determines the frequency with which players cooperate in the long-run: a higher w results in higher cooperation (and vice versa). Next, we consider the case where the cost of cooperation is fixed (the N-w plane in Fig. 3b). The cost is fixed to a value such that for a synergistic public good (w > 1), the cooperation frequency is almost 1 in the long-run for any group size. In this case, when the public good is discounted, the group size N and the discounting factor w jointly determine the cooperation frequency in the long-run. In discounted public goods, cooperation rates fall as group size increases.

We also study introspection dynamics in this game with asymmetric players. We use the same setup that we used for studying the asymmetric linear public goods. The average frequency of cooperation per player is summarized in Supplementary Figs. 1 and 2. In Supplementary Fig. 1, we study two cases: a first in which the public good is synergistic and players have a high average cost, and a second in which the public good is discounted and players have a lower average cost. In both cases, a player cooperates frequently when they simultaneously have a low cost and a high benefit. The only noticeable difference between the two cases is the minimum relation between the asymmetries δ_b and δ_c that results in high cooperation for the player with low cost and high benefit. When we plot individual cooperation frequency against the synergy/discount factor w (Supplementary Fig. 2), we find that when players are symmetric with respect to just benefits (or just costs), the player with the lowest cost (or the highest benefit, respectively) cooperates with a high probability across all types of public goods, even for a high value of the average cost.

Application: Introspection Learning in a Game with Cooperation and Rewards
In all the examples studied so far, players can only choose between two actions (pure strategies). Introspection dynamics is particularly useful when players have larger strategy sets. In this section, we therefore study the stationary behavior of players in the N-player, 16-strategy cooperation and rewarding game of Pal and Hilbe [57]. The game has two stages: in stage 1, players decide whether or not to contribute to a linear public good, and in stage 2, they decide whether or not to reward their peers. When a player contributes to the public good, they pay a cost c_i and generate a benefit worth r_i c_i that is shared equally by everyone. When a player rewards a peer, they provide the peer a benefit ρ while incurring a cost of rewarding, γ_i, themselves. Between the stages, players get full information about the contributions of their peers. In the rewarding stage, players have four possible strategies: they can reward all peers who contributed (social rewarding), reward all peers who defected (antisocial rewarding), reward all peers irrespective of contribution (always rewarding), or reward none of their peers (never rewarding). Before stage 1 commences, player i knows, with some probability λ_i, the rewarding strategy of all their peers. In stage 1, players again have four possible strategies: they can contribute or defect unconditionally, or they can be conditional cooperators or conditional defectors. Conditional cooperators (or defectors) contribute (or do not contribute) when they have no information about their peers, which happens with probability 1 - λ_i. When a conditional player i knows the rewarding strategy of all their peers (which happens with probability λ_i) and finds that there are n_SR social rewarders and n_AR antisocial rewarders among them, they cooperate if and only if the marginal gain from rewards for choosing cooperation over defection outweighs the effective cost of cooperation. Combining the two stages, players can use one of 16 possible strategies (4 in stage 1 times 4 in stage 2). In the simple case where players are identical, one can characterize the Nash equilibria of the game and identify the conditions that allow an equilibrium in which all players contribute in the first stage and reward their peers in the second stage [57]. In this symmetric case, full cooperation and rewarding is feasible in equilibrium when all players have sufficient information about each other and the reward benefit ρ is neither too high nor too low. In this section, we study three simple cases of asymmetry between players to demonstrate how asymmetric players may learn to play the game through introspection dynamics. The three examples demonstrate that, under introspection dynamics, asymmetric players can end up taking different roles in the long-run in producing the public good. To this end, we consider a 3-player game in which players 1 and 2 are identical while player 3 differs from them in some aspect. In each case, the asymmetric player has either a) a higher cost of rewarding, γ_3 > γ_1, b) lower productivity, r_3 < r_1, or c) less information about peers, λ_3 < λ_1. We use Eq. (7) to compute exactly the expected abundances of the 16 strategies for each player.
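The verbal decision rule for conditional players can be sketched as a small function. The specific inequality used below, ρ(n_SR - n_AR) > c_i(1 - r_i/N), is our reconstruction of the condition described in the text (marginal reward gain versus effective cost of contributing) and may differ from the exact expression in Pal and Hilbe [57].

```python
def conditional_action(n_sr, n_ar, rho, c_i, r_i, N, informed, default):
    """Hypothetical stage-1 rule for a conditional player.

    Uninformed players (probability 1 - lambda_i) fall back to their
    default action: 'C' for conditional cooperators, 'D' for conditional
    defectors.  Informed players contribute iff the marginal reward gain
    rho * (n_sr - n_ar) exceeds the assumed effective cost
    c_i * (1 - r_i / N).
    """
    if not informed:
        return default
    return 'C' if rho * (n_sr - n_ar) > c_i * (1 - r_i / N) else 'D'

# two social rewarders among the peers make contributing worthwhile here
choice = conditional_action(n_sr=2, n_ar=0, rho=0.3, c_i=1.0, r_i=2.0,
                            N=3, informed=True, default='D')
```

With the parameters above, the reward gain 0.6 exceeds the effective cost 1/3, so an informed conditional defector nonetheless contributes.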
In the case where player 3 differs in their cost of rewarding, the long-run outcome of introspection reflects a division of labor between the players in producing the public good (Fig. 4a). The players for whom rewarding is less costly (players 1 and 2) reward cooperation with a higher probability than the player for whom rewarding is very costly (player 3). In return, player 3 learns to respond by contributing with a higher probability than their co-players. With these specific parameters, one player takes up the role of providing the highest per-capita contribution while the others compensate with costly rewarding. When the asymmetric player differs only in their productivity, a different effect may appear in the long-run (Fig. 4b). In this case, the less productive player free-rides on the cooperation of their more productive peers, but nonetheless rewards the cooperation of those peers. The asymmetric player free-rides but does not second-order free-ride. The probability with which the less productive player rewards others in the long-run is slightly higher than the probability with which the contributing individuals reward each other. Finally, we consider the case where the asymmetric individual differs from the others in terms of the information players have about others' rewarding strategies (Fig. 4c). In this case, the asymmetric player knows the others' strategies with a considerably lower probability than their peers do. In the long-run, the asymmetric player cooperates less on average than their peers. This is because the asymmetric individual faces fewer instances where they can opportunistically cooperate with their co-players. However, both types of players reward cooperation almost equally, and just enough to sustain cooperation.

Discussion and Conclusion
We introduce introspection dynamics for N-player (a)symmetric games. In this learning model, at each time step, one of the N players updates (or not) their strategy by comparing the payoffs of only two strategies: the one currently being played and a random prospective one. Clearly, this assumption implies a simple cognitive process. Players do not optimize over the entire set of strategies as, for example, in best-response models [13,25,36]. One such model of particular interest, due to its connections to introspection dynamics, is the logit dynamics [2,7,13]. In Appendix 1, we compare introspection dynamics and logit dynamics. We show that the two processes have equivalent stationary distributions for 2-strategy games, potential games, and additive games. We also note that there are games for which the stationary distributions do not match. For example, we find coordination games with multiple Nash equilibria for which introspection dynamics and logit dynamics select different equilibria. Interestingly, whether one of the dynamics is better at selecting the higher-payoff equilibrium in coordination games has no trivial answer and remains to be investigated.
Furthermore, although conceptually similar, our model is also simpler than typical reinforcement learning models. For example, while we only have selection strength as a parameter (apart from the payoffs), in Macy and Flache [40] there is a learning rate parameter (which could be compared to our selection strength) but also an aspiration parameter which sets a payoff reference. In our model, the payoff reference is always the current one. All in all, while at each single time step individuals are restricted to reason over two strategies only, as they iterate this step over time, they are able to fully explore the whole set of strategies in a trial-and-error fashion.

Fig. 4 Introspection dynamics in the linear public goods game with peer rewarding. Here, a game with three asymmetric players, each having 16 possible strategies, is studied. Players contribute to a linear public good and then reward each other in the next stage, after everyone's contribution is revealed. In the first stage, players can condition their cooperation on the information they have about their co-players' rewarding strategies. For a full description of the model, please see the section on rewarding. In this example, players 1 and 2 are identical in all aspects while player 3 differs from them in only a single aspect. Here, Eq. (7) is used to plot the exact probability with which players cooperate and reward cooperation in the long-run. There are three types of asymmetry for player 3. a First, the case where player 3 has a high cost of rewarding compared to players 1 and 2, 0.7 = γ_3 > γ_1 = 0.1. b Then, the case where player 3 is less productive than their co-players, 1.2 = r_3 < r_1 = 2. c Finally, the case where player 3 has less information about co-players' rewarding strategies than the others, that is, 0.1 = λ_3 < λ_1 = 0.9. For all plots, a high value of the selection strength, β = 10, is considered. Unless otherwise mentioned, the following parameters are used for all panels: c_i = 1 (individual cost of cooperation), r_i = 2 (individual productivity), γ_i = 0.1 (individual cost of rewarding), λ_i = 0.9 (individual information about co-players' strategies). In panels (a) and (b), the reward value is ρ = 0.3, while for panel (c), the reward value is ρ = 1
Importantly, our model is also much simpler computationally than the stochastic evolutionary game theory framework. While both can involve solving for the stationary distribution of a Markov process, they differ greatly in the size of the state space. Population models typically assume that individuals play multiple games against (potentially all) other players in a population. As such, the state is defined by the number of players playing each strategy in the population(s). The number of states rapidly increases with the population size, the number of strategies, the size of the interaction, and the number of player types (in the case of asymmetric games). One can see how the mathematical analysis of multiplayer asymmetric games can become cumbersome. To deal with this issue, previous models frequently resorted to additional approximations, like low mutation rates [21,85] and weak selection [88]. In contrast, in introspection dynamics, the states of the Markov process correspond to the outcomes of a single (focal) game: for an N-player game where player i has m_i possible actions, there are ∏_{i=1}^{N} m_i states. This feature hugely reduces the state space size, which is key for obtaining exact results.
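The state-count claim is easy to make concrete: the introspection state space is simply the Cartesian product of the players' action sets, so its size is the product of the m_i. A minimal check:

```python
from itertools import product
from math import prod

def n_states(strategy_counts):
    """Number of introspection-dynamics states when player i
    has strategy_counts[i] available actions."""
    return prod(strategy_counts)

# the 3-player, 16-strategy rewarding game studied earlier in the text
assert n_states([16, 16, 16]) == 4096
# matches brute-force enumeration of the joint action space
assert n_states([2, 3, 4]) == len(list(product(range(2), range(3), range(4))))
```

Even the 16-strategy, 3-player rewarding game thus has only 4096 states, small enough to solve the stationary distribution exactly.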
Here, we thus provide a general explicit formula, Eq. (7), that easily computes the stationary distribution of any multiplayer asymmetric game under introspection dynamics. Note that this formula is useful for exploring many-strategy games over the full range of selection strengths. Additionally, we show that it is possible to obtain analytical expressions for the long-run average strategy abundances. We start by analyzing the set of additive games, for which the gain from switching between two actions is constant, regardless of what co-players do. Due to this simple feature, additive games allow for the most general closed-form expression for the stationary distribution (regarding the number of players, the number of strategies, and the asymmetry of the game). We also find that for additive games, the joint distribution of strategies factorizes into the product of the players' marginal distributions. For more general games, we provide the stationary distribution formula for 2-strategy symmetric games. Finally, we study several examples of social dilemmas. From those, we see that, despite the differences to other models pointed out above, we recover some previous results qualitatively [31]. We also conclude that players who have a lower cost or a higher benefit of cooperation learn to cooperate more frequently.
Introspection dynamics is rather broad in its scope. Here, we mainly focus on introducing the general framework, but we also provide some examples to illustrate how it can be applied. Besides the generic public goods game, we study a 2-stage game in which players can choose among 16 strategies; there, individuals can reward their co-players conditional on their previous cooperative (or non-cooperative) behavior. Clearly, there are a number of ways in which our model can be further employed. For example, other researchers have recently studied multiplayer games considering multiple games played concurrently [86], fluctuating environments [11], continuous strategies [48], or repeated interactions [34,87]. Also, a number of previous works considered complex population structures [12,14,60,65-68]. As defined above, introspection dynamics does not consider a population of players, which makes it simple to work with. However, it could be equally applicable to population models. In that case, players would obtain average payoffs either from well-mixed or network-bounded interactions, as usual, but update their strategies introspectively.

Appendix 1: Comparison Between Introspection and Logit-Response Dynamics
In this section, we discuss similarities between introspection dynamics and the widely studied logit-response dynamics (or perturbed best-response dynamics) [2,13]. In the logit-response dynamics, at every time step, some set of players is chosen to update their strategies. The new strategy that a player adopts is drawn from a probability mass function over all of their strategies. Players construct this probability distribution by exponentially weighting the payoffs they would receive if they switched to the new strategy (with all co-players' strategies remaining fixed). Mathematically, the probability that player i switches to action a_i ∈ A_i when the co-players currently play a_{-i} is given by

p_i(a_i | a_{-i}) = e^{β π_i(a_i, a_{-i})} / Σ_{a' ∈ A_i} e^{β π_i(a', a_{-i})},

where the scalar β is akin to the selection strength in introspection dynamics. Note that when comparing the two processes, we only consider a special case of the logit-response dynamics called the asynchronous learning logit-response dynamics [13]; from here on, we refer to it simply as the logit-response dynamics. In this process, at every time step, exactly one player is chosen uniformly at random to update their strategy. The transition probability of going from state a to a neighboring state b that differs in the strategy of player j can thus be expressed as

T^{LD}_{a,b} = (1/N) · e^{β π_j(b_j, a_{-j})} / Σ_{a' ∈ A_j} e^{β π_j(a', a_{-j})}.

Throughout the rest of this section, we refer to the transition matrix of the logit-response dynamics as T^{LD} and to the transition matrix of introspection dynamics, as given by Eq. (4), as T^{ID}. We denote the respective unique stationary distributions at finite β by u^{LD} and u^{ID}. Our first comparison between the two processes is for games with two actions.

Proposition 5
In games where m_i = 2 for all i ∈ {1, 2, ..., N}, the transition matrices and the stationary distributions of introspection dynamics and logit-response dynamics are the same. That is, T^{LD} = T^{ID} and u^{LD} = u^{ID}. This proposition states that for games where every player has two available actions (strategies), the introspection and logit-response processes are equivalent.
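Proposition 5 can be verified numerically: with two actions, the Fermi switching probability 1/(1 + e^{-β(π_b - π_a)}) coincides with the two-action softmax e^{βπ_b}/(e^{βπ_a} + e^{βπ_b}). The sketch below builds both transition matrices for a random 3-player binary game (assuming one uniformly chosen updater per step, as in the asynchronous process) and checks that they are identical:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, beta = 3, 2.0
states = list(itertools.product([0, 1], repeat=N))
idx = {s: k for k, s in enumerate(states)}
# independent random payoffs: payoff[i][s] for player i in joint state s
payoff = {i: {s: rng.normal() for s in states} for i in range(N)}

T_ID = np.zeros((2**N, 2**N))   # introspection dynamics
T_LD = np.zeros((2**N, 2**N))   # asynchronous logit response
for s in states:
    for i in range(N):
        alt = list(s); alt[i] = 1 - s[i]; t = tuple(alt)
        pi_s, pi_t = payoff[i][s], payoff[i][t]
        # introspection: Fermi comparison of current vs alternative action
        T_ID[idx[s], idx[t]] = (1 / N) / (1 + np.exp(-beta * (pi_t - pi_s)))
        # logit response: softmax over the two available actions
        T_LD[idx[s], idx[t]] = (1 / N) * np.exp(beta * pi_t) / (
            np.exp(beta * pi_s) + np.exp(beta * pi_t))
    T_ID[idx[s], idx[s]] = 1 - T_ID[idx[s]].sum()
    T_LD[idx[s], idx[s]] = 1 - T_LD[idx[s]].sum()

assert np.allclose(T_ID, T_LD)   # Proposition 5: the matrices coincide
```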
We also compare the two processes on two more classes of games: additive games and potential games [49]. We have already defined an additive game earlier. Monderer and Shapley [49] define a potential game as any game for which a scalar function φ : A → R exists such that for all i, all a_i, a'_i ∈ A_i, and all a_{-i} ∈ A_{-i},

π_i(a_i, a_{-i}) - π_i(a'_i, a_{-i}) = φ(a_i, a_{-i}) - φ(a'_i, a_{-i}).

Here, the scalar function φ is called the potential of the game. Following this definition, one can see a potential game as a game where π_i(a_i, a_{-i}) = φ(a_i, a_{-i}) + σ_i(a_{-i}), where σ_i(a_{-i}) := π_i(a', a_{-i}) - φ(a', a_{-i}), for a fixed a' ∈ A_i, is independent of a_i. We show that, for potential games, introspection dynamics and logit-response dynamics have the same stationary distribution.

Proposition 6
For a potential game with potential φ, the stationary distributions of introspection dynamics (with finite β) and logit-response dynamics (with the same finite β) are the same and given by

u_a = e^{βφ(a)} / Σ_{a' ∈ A} e^{βφ(a')}.

The above proposition states that both processes (with the same finite β) lead to an identical stationary distribution for any potential game. The proof of this proposition relies on Proposition 1 of Alós-Ferrer and Netzer [2], which provides a closed-form expression of the stationary distribution of the logit-response dynamics for potential games. We obtain a similar result for additive games.
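Proposition 6 can likewise be checked numerically. Below we construct a random potential game by setting π_i(a) = φ(a) + σ_i(a_{-i}), build the introspection transition matrix (one uniformly chosen updater, a uniformly chosen alternative action, Fermi switching), and verify that the stationary distribution matches the Gibbs distribution e^{βφ(a)} up to normalization:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, m, beta = 2, 3, 1.5
states = list(itertools.product(range(m), repeat=N))
idx = {s: k for k, s in enumerate(states)}
phi = {s: rng.normal() for s in states}
# pi_i(a) = phi(a) + sigma_i(a_{-i}) defines a potential game with potential phi
sigma = {i: {s[:i] + s[i + 1:]: rng.normal() for s in states} for i in range(N)}
def pi(i, s):
    return phi[s] + sigma[i][s[:i] + s[i + 1:]]

n = len(states)
T = np.zeros((n, n))
for s in states:
    for i in range(N):
        for a in range(m):
            if a == s[i]:
                continue
            t = s[:i] + (a,) + s[i + 1:]
            # pick updater (1/N), pick alternative (1/(m-1)), Fermi switch
            T[idx[s], idx[t]] = (1 / (N * (m - 1))) / (
                1 + np.exp(-beta * (pi(i, t) - pi(i, s))))
    T[idx[s], idx[s]] = 1 - T[idx[s]].sum()

vals, vecs = np.linalg.eig(T.T)
u = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
u = u / u.sum()
gibbs = np.array([np.exp(beta * phi[s]) for s in states])
gibbs = gibbs / gibbs.sum()
assert np.allclose(u, gibbs)   # Proposition 6: Gibbs distribution of phi
```

The agreement follows from detailed balance: since payoff differences equal potential differences, u_a T_{a,b} is symmetric in a and b under the Fermi rule.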
Proposition 7 For an N-player additive game, the stationary distributions of introspection dynamics (with finite β) and logit-response dynamics (with the same finite β) are the same and given by the closed-form expression in Eq. (10), where f_j(a', a_j) is the co-player-independent payoff difference given by Eq. (9).
To summarize, the two propositions above state that while the transition matrices of the two processes may differ, the long-run stationary behavior of the players in potential and additive games is the same.

Appendix 2: Proofs
Proof of Proposition 1 Since β is finite, the stationary distribution u = (u_a)_{a∈A} of the process is unique. The stationary distribution also satisfies the equalities in Eqs. (5) and (6). Before continuing with the remainder of the proof, we introduce the short-cut notation τ_{j,a_j} used below. To show that the candidate distribution proposed in Eq. (10) is the stationary distribution of the process, we need to show that Eqs. (36) and (37) hold. Using the short-cut notation τ and the expression for the candidate distribution in Eq. (10), the left-hand side of Eq. (36) can be simplified; using the expression for τ in Eq. (35), one can then show the equality in Eq. (44). After plugging the equality in Eq. (44) into Eq. (41), the left-hand side of Eq. (36) simplifies to u_a. Now, to complete the proof, we must check that Eq. (37) holds for the candidate distribution. We sum the elements of the stationary distribution u_a over all states a ∈ A, where q_{-1}, q_{-2}, ..., q_{-N} are arbitrary tuples from A_{-1}, A_{-2}, ..., A_{-N}, respectively. The factor that does not depend on the summation variable can be taken out of the first sum entirely. Multiplying out the sums in the denominator of the resulting expression then yields Eq. (49). The step from Eq. (48) to Eq. (49) involves multiplying out all the sums of exponentials (where each term in a sum of exponentials corresponds to the payoff that player k receives by playing their action against the co-player composition q_{-k}). Therefore, the stationary distribution sums to 1, and the candidate distribution we propose for the additive game is the unique stationary distribution of the process.

Proof of Proposition 2
Just like in the previous proof, p_{-1}, p_{-2}, ..., p_{-N} are arbitrary tuples from A_{-1}, A_{-2}, ..., A_{-N}, respectively. In the steps below, we always decompose the expression f_j(a, b) as π_j(a, p_{-j}) - π_j(b, p_{-j}). Since u = (u_a)_{a∈A} is the unique stationary distribution of the N-player additive game under finite-selection introspection dynamics, it is given by the closed-form expression in Eq. (10). We use this expression to calculate the marginal distribution of actions played at a particular state a, (ξ_{j,a_j})_{j∈{1,2,...,N}}. The interchange of the sum and the product between the expressions in Eqs. (53) and (54) can be carried out by observing that, when all the sums are multiplied out, one is left with a sum of terms, each of which is an exponential whose exponent equals the sum of the payoffs that the co-players of j (indexed by k) receive when they play their respective strategies from q (that is, q_k) against co-players that play p_{-k}. This is similar to the step between Eqs. (48) and (49) in the proof of Proposition 1. Thus, using the expression in Eq. (56), we can confirm that for additive games, the product of the marginals is the stationary distribution.

Proof of Proposition 3 Since we have demonstrated that the linear public goods game is an additive game, this result follows directly from Proposition 1.
Here, we provide an independent proof. The idea behind this proof is identical to that of the proof of Proposition 1. Again, since β is finite, the process has a unique stationary distribution. Before showing that our candidate distribution is this unique stationary distribution, we define some short-cut notation for ease of exposition. In addition, we introduce an indicator function α(.) which maps the action C to 1 and the action D to 0, that is, α(C) := 1 and α(D) := 0. Using these notations and Eqs. (1) and (14), we can write down the probability that a player j updates to a_j from ā_j while their co-players play a_{-j}. The candidate stationary distribution u given in Eq. (15) can also be written using this short-cut notation. A stationary distribution must satisfy the properties given in Eqs. (5) and (6), restated here as Eqs. (62) and (63). The terms on the right-hand side of Eq. (62) can be simplified using Eqs. (1) and (4); additionally, using Eq. (61), the second term can be simplified further. Then, using Eqs. (64) and (67), one can show that the right-hand side of Eq. (62) equals the element of the stationary distribution corresponding to state a, namely u_a. To complete the proof, we must show that Eq. (63) also holds for our candidate distribution. This can be done by decomposing the sum of the elements of the stationary distribution, as in Eq. (68). When this decomposition is performed N - 1 more times, the sum on the right-hand side becomes 1. This proves that the candidate stationary distribution is also a probability distribution.

Proof of Proposition 4
By construction, the candidate stationary distribution given by Eqs. (19) and (20) is a probability distribution, since it satisfies the condition in Eq. (6) and, for any state a, u_a lies between 0 and 1. Again, since β is finite, the process has a unique stationary distribution. To show that the candidate distribution is this unique stationary distribution, we need to check that Eq. (5) holds; that is, the condition in Eq. (62) must hold for all states a. We re-introduce some notation used in this proof. Since there are only two actions, the first term on the right-hand side of Eq. (62) can be simplified using the function sign(.), defined as in Eq. (16), and f(j), the difference in payoffs between playing D and C when there are j co-players playing C. The term N_k represents the number of co-players of k that play C in state a. The second term on the right-hand side of Eq. (62) can be simplified as in Eqs. (78) and (79). From Eq. (78) to Eq. (79), we took out one term from the product that is present in our candidate distribution. This term accounts for the k-th player's action in the neighboring state (ā_k, a_{-k}) of a. For simplicity, we represent T_{(ā_k, a_{-k}), (a_k, a_{-k})} by just T in the next steps. We continue the simplification of Eq. (79) by introducing terms that cancel each other. The newly introduced term in Eq. (80) can be taken inside the product; note that this term is 1 if the k-th player plays D in state a. When this term is taken inside the product, products of exponentials e^{-β f(j-1)} can be formed for j ranging from 1 to the number of cooperators in state a, C(a). This product is then the candidate stationary distribution. The fraction inside the sum in Eq. (83) can be simplified using the sign(.) function (Eq. (16)), leading to a further simplification of Eq. (83). In Eq. (84), we can replace the element of the transition matrix with

T_{(ā_k, a_{-k}), (a_k, a_{-k})} = 1 / (1 + e^{sign(a_k) β f(N_k)}).  (85)

Using this expression for the transition matrix element in Eq. (84) and using Eq. (75), we can simplify further. The final step of this simplification shows that Eq. (62) holds for any a ∈ {C, D}^N. Therefore, the candidate distribution we propose in Eq. (19) is the unique stationary distribution of the symmetric N-player game with two strategies.

Proof of Corollary 1
To show this result, we count how many states are equivalent to a given state a ∈ {C, D}^N in a symmetric game. When players are symmetric in a two-strategy game, states can be enumerated by counting the number of C players in each state. This can also be confirmed from the expression of the stationary distribution in Eq. (19): two distinct states a, a' with the same number of cooperators (i.e., C(a') = C(a)) have the same stationary probability (i.e., u_{a'} = u_a). In a game with N players, there are exactly (N choose k) states with k players playing C. As argued before, all of these states are equivalent and are also equiprobable in the stationary distribution. Therefore, the stationary probability of having k C-players is u_k = (N choose k) u_a, where a is any state with C(a) = k.
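The binomial aggregation in Corollary 1 can be checked numerically for the symmetric linear public goods game (payoffs assumed here as b(j + a)/N - ca for own action a ∈ {0, 1} and j cooperating co-players): all states with k cooperators receive the same stationary probability, and summing them gives u_k = (N choose k) u_a:

```python
import itertools
from math import comb
import numpy as np

N, beta, b, c = 4, 5.0, 2.0, 1.0
states = list(itertools.product([0, 1], repeat=N))
idx = {s: k for k, s in enumerate(states)}

def pi(a, j):
    """Assumed linear public goods payoff: own action a in {0,1},
    j cooperating co-players."""
    return b * (j + a) / N - c * a

T = np.zeros((2**N, 2**N))
for s in states:
    for i in range(N):
        alt = list(s); alt[i] = 1 - s[i]; t = tuple(alt)
        j = sum(s) - s[i]
        T[idx[s], idx[t]] = (1 / N) / (
            1 + np.exp(-beta * (pi(t[i], j) - pi(s[i], j))))
    T[idx[s], idx[s]] = 1 - T[idx[s]].sum()

vals, vecs = np.linalg.eig(T.T)
u = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
u = u / u.sum()

for k in range(N + 1):
    probs = [u[idx[s]] for s in states if sum(s) == k]
    assert np.allclose(probs, probs[0])          # equal-count states equiprobable
    assert np.isclose(sum(probs), comb(N, k) * probs[0])  # u_k = C(N,k) * u_a
```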

Proof of Proposition 6
In this proof, we use the result of Proposition 1 in Alós-Ferrer and Netzer [2]. There, it was shown that the stationary distribution of the logit-response dynamics for a potential game with potential φ (for finite β) takes the form

u^{LD}_a = e^{βφ(a)} / Σ_{a' ∈ A} e^{βφ(a')}.  (95)

We use this stationary distribution as the candidate stationary distribution for introspection dynamics and show that u^{LD} T^{ID} = u^{LD}. This is sufficient to prove that, for finite β, u^{LD} = u^{ID}. We examine the value of the relevant balance expression for two arbitrary neighboring states a and b (with j the index of difference between the two states) and find that it vanishes. The candidate u^{LD} is thus indeed the unique stationary distribution of introspection dynamics too. Therefore, the two processes have the same stationary distribution.

Proof of Proposition 7
The idea behind the proof of this proposition is similar to that of Proposition 6. We consider the stationary distribution of introspection dynamics for additive games, u^{ID}, from Eq. (10), as the candidate stationary distribution of the logit-response dynamics. Then, showing that u^{ID} T^{LD} = u^{ID} is equivalent to showing that u^{LD} = u^{ID}. Again, as in the previous proof, we look at the value of a balance expression for two arbitrary neighboring states a and b (with j the index of difference between the states). The product of its first two terms, which we denote by L, is strictly greater than 0, so we focus on the rest of the expression. Since f_j(a', a_j) = π_j(a', p) - π_j(a_j, p) is independent of the choice of p ∈ A_{-j}, we may use p = a_{-j} in the steps below, which gives

L · (1 / Σ_{a' ∈ A_j} e^{βπ_j(a', a_{-j})}) · (e^{βπ_j(b_j, a_{-j})} e^{βπ_j(a_j, a_{-j})} - e^{βπ_j(a_j, a_{-j})} e^{βπ_j(b_j, a_{-j})}) = 0.

Now, following the exact same steps from Eq. (101) to Eq. (105), we can show that u^{ID} T^{LD} = u^{ID}. Therefore, the two processes have the same stationary distribution, u^{ID} = u^{LD}.

Definition 2 (Index of difference between neighboring states) If two states a and b satisfy a ∈ Neb(b), the index of difference between them, I(a, b), is the unique integer that satisfies a_{I(a,b)} ≠ b_{I(a,b)}.

Fig. 1
Fig. 1 Introspection dynamics in a symmetric linear public goods game. Stationary distribution of introspection dynamics for a linear public goods game with four identical players. For all panels in this figure, the following parameters are used: N = 4 (group size), b = 2 (benefit provided to the public good upon cooperation), c = 1 (cost of cooperation). a Frequency of each state in the stationary distribution of introspection dynamics. As players are identical, each state can be defined by its number of cooperators. For a selection strength of β = 5, states with more cooperators are less likely than states with fewer cooperators. b Frequency of each state for varying selection strength β. The color code is the same as in panel (a). Comparing neutrality (β = 0) with low to intermediate β values, selection favors states other than 0 cooperators. Indeed, up to β ≈ 3, state 0 is not the most frequent state in the long-run. c Average cooperation frequency for varying dilemma strength depends on the selection strength β. We use the marginal gain of choosing cooperation over defection, b/N - c, as a measure of dilemma strength. When this quantity is negative and low, we say that the dilemma is strong; in this case, choosing cooperation is strictly disadvantageous. When this quantity is positive and high, we say that the dilemma is weak; in this case, cooperation dominates defection. Typically, a linear public goods dilemma is defined to have a negative marginal gain. Here, the dilemma strength varies from -2 to 2. The results are shown for different values of selection strength, β = 1, 5 and 100. For high β, the stationary distribution of introspection dynamics reflects rational play: in the long-run, players play the Nash equilibrium, and when the marginal gain is negative, defection is played with near certainty (and vice versa). For low β, however, some cooperation is possible even when the dilemma is strong

Fig. 2
Fig. 2 Introspection dynamics in an asymmetric linear public goods game. Cooperation probabilities of introspection dynamics for a linear public goods game with three asymmetric players. For each of the upper panels (a and b), we show the cost of cooperation and the benefit provided upon cooperation for the players on the left, and the average cooperation frequency in the long-run on the right. In c, the asymmetry strengths between the players, δ_c and δ_b, vary simultaneously. Both the average individual cooperation frequencies and the overall average cooperation frequency in the long-run are shown. The reference player's cost and benefit are again 1 and 2 units, respectively. The area within the white dashed lines represents the parameter values for which the marginal gain of choosing cooperation over defection is negative for each single player and, in the right-most panel, for all players simultaneously. In this example, cooperation is only feasible in the long-run if the asymmetries of the players are aligned. That is, overall cooperation is high only when the individual with a low cost of cooperation has a high benefit value. For panels (a) and (b), the selection strength is β = 2, while for panel (c), β = 5

Fig. 3 Introspection dynamics in a symmetric general public goods game. Introspection dynamics in the general public goods game with 4 symmetric players, each having two possible actions, cooperation and defection. For a detailed description of the game, please see the main text. a The frequency of each state in the stationary distribution of introspection dynamics in four types of multiplayer social dilemmas displays qualitatively different results. The upper panels refer to a high cost of cooperation (c = 1) and the bottom panels to a low cost of cooperation (c = 0.2); the left panels refer to a discounted public good (w = 0.5), and the right panels to a synergistic public good (w = 1.5). Each case is tagged with a symbol that places the particular case in the contour plot in panel (b). b On the left, the average cooperation frequency for varying discount/synergy factor, w, and varying cost of cooperation, c, is shown. Cooperation is feasible when costs are not restrictively high and the public good is not too discounted. On the right, the average cooperation frequency for varying discount/synergy factor, w, and group size N. For this plot, the cost of cooperation for each player is c = 0.4. The feasibility of cooperation drops with larger group sizes when the public good is discounted. For all panels, b = 2 and β = 5
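The general public goods game of Fig. 3 is no longer additive, so the full chain must be solved numerically. The sketch below is our own illustration; it assumes the common discounting/synergy parametrization in which k cooperators yield each group member a benefit (b/N)(1 + w + ... + w^{k−1}), which is our reading of the setup (see the main text for the exact definition).

```python
import itertools
import numpy as np

def avg_cooperation(N, b, c, w, beta):
    """Average long-run cooperation frequency of introspection dynamics in a
    public goods game with discount/synergy factor w (assumed parametrization:
    k cooperators yield each member (b/N) * sum_{j<k} w**j; cooperators pay c)."""
    def payoff(i, state):
        k = sum(state)
        benefit = (b / N) * sum(w**j for j in range(k))
        return benefit - c * state[i]

    states = list(itertools.product((0, 1), repeat=N))
    idx = {s: n for n, s in enumerate(states)}
    T = np.zeros((2**N, 2**N))
    for s in states:
        for i in range(N):
            alt = list(s)
            alt[i] = 1 - alt[i]
            alt = tuple(alt)
            gain = payoff(i, alt) - payoff(i, s)
            # Fermi switching, one randomly chosen player per step.
            T[idx[s], idx[alt]] = (1.0 / (1.0 + np.exp(-beta * gain))) / N
        T[idx[s], idx[s]] = 1.0 - T[idx[s]].sum()
    vals, vecs = np.linalg.eig(T.T)
    u = np.real(vecs[:, np.argmax(np.real(vals))])
    u /= u.sum()
    return sum(u[idx[s]] * sum(s) / N for s in states)

discounted = avg_cooperation(N=4, b=2.0, c=0.4, w=0.5, beta=5.0)
synergistic = avg_cooperation(N=4, b=2.0, c=0.4, w=1.5, beta=5.0)
print(discounted, synergistic)
```

A useful sanity check on this parametrization: at w = 1 the game collapses to the linear public goods game, whose cooperation frequency has a closed logistic form.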

$$\;= u_a + \frac{1}{N}\cdot\frac{1}{m_{I_b}-1}\left(p_{b_{I_b}\to a_{I_b}}\,\tau_{I_b,\,a_{I_b}} - p_{a_{I_b}\to b_{I_b}}\,\tau_{I_b,\,b_{I_b}}\right) \tag{41}$$

For an additive game, the expressions for $p_{b_{I_b}\to a_{I_b}}$ and $p_{a_{I_b}\to b_{I_b}}$ can be simply written as

$$p_{b_{I_b}\to a_{I_b}} = \frac{1}{1 + e^{\beta f_{I_b}(b_{I_b},\,a_{I_b})}} \tag{42}$$

$$p_{a_{I_b}\to b_{I_b}} = \frac{1}{1 + e^{\beta f_{I_b}(a_{I_b},\,b_{I_b})}}. \tag{43}$$
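In an additive game, the payoff-difference function is antisymmetric in its two arguments (switching back undoes the gain of switching forth), so the two directed switching probabilities of Eqs. (42) and (43) are complementary. A minimal numerical check of this (our own sketch; the value of f used is hypothetical):

```python
import math

def switch_prob(beta, f):
    """Fermi probability of adopting the alternative action when the
    current action has payoff advantage f over it."""
    return 1.0 / (1.0 + math.exp(beta * f))

beta = 5.0
f_ba = -0.5                           # hypothetical value of f(b, a)
p_b_to_a = switch_prob(beta, f_ba)    # Eq. (42)
p_a_to_b = switch_prob(beta, -f_ba)   # Eq. (43), using f(a, b) = -f(b, a)
print(p_b_to_a + p_a_to_b)            # the two probabilities sum to one
```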

Proof of Proposition 5
We consider two states $a$ and $b$ such that $a \in \mathrm{Neb}(b)$. Let $j = I(a, b)$ be the index of difference between these neighboring states. Taking into account that $m_i = 2$ for all $i \in \{1, 2, \ldots, N\}$, it follows that $a_j$ and $b_j$ are the only two actions in the action set of player $j$, $A_j$. It can be shown that

$$T(a \to b) = \frac{1}{N}\cdot\frac{1}{m_j - 1}\, p_{a_j \to b_j}(a_{-j}) = \frac{1}{N}\cdot\frac{e^{\beta \pi_j(b_j,\,a_{-j})}}{\sum_{a' \in A_j} e^{\beta \pi_j(a',\,a_{-j})}},$$

since $m_j - 1 = 1$.
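The exponential-ratio expressions arising here rest on an identity: with only two available actions, the Fermi switching probability coincides with the logit (softmax) form, 1/(1 + e^{−β(π_b − π_a)}) = e^{βπ_b}/(e^{βπ_a} + e^{βπ_b}). A short numerical check of this identity (our own sketch):

```python
import math
import random

random.seed(1)
beta = 5.0
for _ in range(100):
    # Random payoffs for the two actions of the focal player.
    pi_a, pi_b = random.uniform(-2, 2), random.uniform(-2, 2)
    fermi = 1.0 / (1.0 + math.exp(-beta * (pi_b - pi_a)))
    logit = math.exp(beta * pi_b) / (math.exp(beta * pi_a) + math.exp(beta * pi_b))
    assert abs(fermi - logit) < 1e-12
print("identity holds")
```

The identity follows by multiplying numerator and denominator of the Fermi form by e^{βπ_a}; it is what allows the two-strategy transition probabilities to be written over a common normalizing sum.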