1 Introduction

Social behavior has been studied extensively through pairwise interactions [37]. Despite their simplicity, these interactions provide important insights, such as how populations can sustain cooperation [8, 50, 52]. Yet, many interesting collective behaviors occur when multiple individuals interact simultaneously [5, 6, 30, 34, 56, 58, 64, 77, 86]. Most of these situations cannot be captured by the sum of several pairwise interactions. Thus, to account for such nonlinearities, one needs to consider multiplayer games [30]. For example, a well-known effect that only emerges when more than two players are present is the “second-order free-riding problem” [20]. A natural solution to maintain pro-social behavior in a community is to monitor and punish defectors (and/or reward cooperators). However, most forms of sanctioning are considerably costly [33]. Therefore, an additional (second-order) dilemma emerges: individuals would like cooperation to be incentivized, but each prefers that others pay the associated costs.

Another interesting effect that can be explored with multiplayer games is the scale or size of the interaction itself. In situations that require some sort of coordination, and where expectations about others play an important role in one’s decisions, a growing group size might hinder the optimal outcome [77]. Likewise, it has been shown that it is hard to cooperate in large groups [34, 63, 71]. This is not, however, a general effect [28]. Additionally, group size can vary in a population of players. There, not only the average group size can have an important effect, but also the variance of the group size distribution [16, 61] and the group size distribution itself [62].

Complexity further increases when players differ significantly among themselves. This diversity can be captured by asymmetric games [26, 32, 35, 36, 44, 55, 73, 78, 85]. In symmetric games, all players are indistinguishable. Thus, to fully characterize the state of the game, we only need to know the number of players playing each strategy. Conversely, in asymmetric games, players can differ in their available actions and in their incentives to choose each action. Therefore, they can also have uneven effects on others’ payoffs. For example, in public goods games and collective-risk dilemmas, players can have different initial endowments (or wealth), productivities, costs, risk perceptions, or risk exposures [1, 32, 46, 47, 84, 87]. Hence, to fully describe the state of the game, we need to know the action of each player. This greatly increases the size of the game’s state space, even more so for more than two players.

Models from evolutionary game theory (EGT) [37, 42, 43, 51] and learning theory [22, 40, 59, 70] have been widely used to study strategic behavior. The concept of an evolutionarily stable strategy, originally proposed for pairwise encounters [43], was extended to multiplayer games [15, 17, 58]. The well-known replicator equation [37, 79] can also easily account for multiplayer games [19, 27, 31, 64]. More recently, the replicator-mutator equation was applied to study the dynamics of multiplayer games, too [41]. As for asymmetric games, a few additional assumptions are needed in the description of the model. For example, if there are two different types of players, typically either two populations co-evolve (“bimatrix games” [29, 37, 81]) or there is a single population of players, each of whom can play both types or roles (“role games”) [37]. The case of asymmetric games with more than two players is substantially less studied within deterministic EGT; Gokhale and Traulsen, and Zhang et al., are two exceptions [29, 89]. Notably, although these works study multiplayer games, they consider at most two different types (drawn from two populations), which leaves out the exploration of full asymmetry. Stochastic evolutionary game dynamics [54, 80] also provides several models for studying multiplayer and asymmetric games. Fixation probabilities [23] in asymmetric 2-player games [76], asymmetric 3-player games [74], and symmetric multiplayer games [27, 39] were recently derived. Furthermore, average strategy abundances [3, 4] were obtained only for 2-player asymmetric games [55, 75] or multiplayer symmetric games [12, 28, 38]. For a review of evolutionary multiplayer games, both in infinitely large and in finite populations, we refer to Gokhale and Traulsen [30]. Learning models (of strategic behavior) take a different approach from EGT [9, 10, 22, 24, 36, 40, 59, 70, 82]. Rather than strategies necessarily evolving in a population, individuals learn and adjust their strategies dynamically.

Introspection dynamics has recently proven to be a useful learning model for tackling (a)symmetric games [18, 32, 45, 69, 72]. Here, players update their strategies by exploring their own set of strategies in the following simple way: after each round of the game, a random player considers a random alternative strategy; they compare the payoff it would have given them to their current payoff; if the new strategy would provide a higher payoff, it is more likely to be adopted in the next round. We describe the model formally in the next section. While Couto et al. [18] only considered 2-player games, this framework is general enough to account for multiple players. In particular, compared to population models, introspection dynamics allows a natural exploration of full asymmetry in many-player games. For example, in imitation dynamics, one needs to specify who is being imitated by whom [84]. When players differ, it might not make sense to assume that they imitate others. Introspection avoids this assumption because players’ decisions only depend on their own payoffs. An existing model that shares this property is the logit-response dynamics or, simply, logit dynamics [2, 7, 13]. Unlike in introspection dynamics, at each time step the randomly drawn player can switch to any other strategy with non-zero probability. This probability grows with the payoff provided by each strategy at the (possible) future state. The switching-probability functions of introspection dynamics and logit dynamics share a similar exponential shape; hence, the two processes have some interesting connections.

Here, we extend previous results of pairwise games under introspection dynamics [18] to multiplayer games. First, we derive a formula that allows us to numerically compute the stationary distribution of introspection dynamics for any multiplayer asymmetric game. Second, we obtain explicit expressions of the stationary distribution for two special cases. These cases are additive games (where the payoff difference that a player gains by unilaterally switching to a different action is independent of the actions of their co-players), and symmetric multiplayer games with two strategies. To illustrate our theoretical results, we analyze various multiplayer asymmetric social dilemmas, extending the framework in [31] to asymmetric games. We also study the asymmetric version of a public goods game with a rewarding stage [57]. Finally, we compare introspection dynamics with logit dynamics, in the Appendix, where we show that the two processes have equivalent stationary distributions for some particular games (namely, 2-strategy, potential and additive games).

2 Model of Introspection Dynamics in Multiplayer Games

We consider a normal form game with \(N (\ge 2)\) players. In the game, a player, say player i, can play actions from their action set, \(\textbf{A}_i:= \{a_{i,1}, a_{i,2},..., a_{i,m_i} \}\). The action set of player i has \(m_i\) actions. In this model, players only use pure strategies. Therefore, there are finitely many states of the game. More precisely, there are exactly \(m_1 \times m_2 \times ... \times m_N\) states. We denote a state of the game by collecting the actions of all the players in a vector, \(\textbf{a}:= (a_1, a_2,..., a_N)\) where \(\textbf{a}\in \textbf{A}:= \textbf{A}_1 \times \textbf{A}_2 \times ... \times \textbf{A}_N\) and \(a_i \in \textbf{A}_i\). We also use the common notation, \(\textbf{a}:= (a_i, \textbf{a}_{-i})\) to denote the state from the perspective of player i. In the state \((a_i, \textbf{a}_{-i})\), player i plays the action \(a_i \in \textbf{A}_i\) and their co-players play the actions \(\textbf{a}_{-i} \in \textbf{A}_{-i}\) where \(\textbf{A}_{-i}\) is defined as \(\textbf{A}_{-i}:= \prod _{j \ne i} \textbf{A}_j\). The payoff of a player depends on the state of the game. We denote the payoff of player i in the state \(\textbf{a}\) with \(\pi _i(\textbf{a})\) or \(\pi _i(a_i, \textbf{a}_{-i})\). In this paper, we use bold font letters to denote vectors and matrices. We use the corresponding normal font letters with subscripts to denote elements of the vectors (or matrices). Since players only use pure strategies in this model, we use the terms strategies and actions interchangeably throughout the whole paper.

In this model, players update their strategies over time using introspection dynamics [18]. At every time step, one randomly chosen player can update their strategy. The randomly chosen player, say i, currently playing action \(a_{i,k}\), compares their current payoff to the payoff that they would obtain if they played a randomly selected action \(a_{i,l} \ne a_{i,k}\) from their action set \(\textbf{A}_i\). This comparison is done while assuming that the co-players do not change their respective actions. When the co-players of player i play \(\textbf{a}_{-i}\), player i changes from action \(a_{i,k}\) to the new action \(a_{i,l}\) in the next round with probability

$$\begin{aligned} p_{a_{i,k} \rightarrow a_{i,l}} (\textbf{a}_{-i})= \frac{1}{1 + e^{\displaystyle -\beta (\pi _i(a_{i,l}, \textbf{a}_{-i}) - \pi _i(a_{i,k}, \textbf{a}_{-i}))}} . \end{aligned}$$
(1)

Here \(\beta \in [0,\infty )\) is the selection strength parameter that represents the importance that players give to payoff differences while updating their actions. For \(\beta = 0\), players update to a randomly chosen strategy with probability 0.5. For \(\beta > 0\), players update to the alternative strategy under consideration with probability greater than 0.5 (or less than 0.5) if the switch gives them an increase (or decrease) in the payoff.
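As a concrete illustration, the switching probability of Eq. (1) can be sketched in a few lines of Python (the function name `switch_prob` is ours, not from the paper):

```python
import math

def switch_prob(payoff_new, payoff_current, beta):
    """Probability of adopting the alternative action, Eq. (1)."""
    return 1.0 / (1.0 + math.exp(-beta * (payoff_new - payoff_current)))

# beta = 0: payoffs are ignored, switching is a coin flip
assert switch_prob(5.0, 1.0, 0.0) == 0.5
# beta > 0: a payoff gain makes the switch more likely than not...
assert switch_prob(5.0, 1.0, 1.0) > 0.5
# ...and a payoff loss makes it less likely
assert switch_prob(1.0, 5.0, 1.0) < 0.5
```

For large \(\beta\), the function approaches a step function: any payoff improvement is adopted almost surely.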

Introspection dynamics defines a Markov chain and can be studied by analyzing the properties of the corresponding transition matrix \(\textbf{T}\). The transition matrix element \(\textrm{T}_{\textbf{a},\textbf{b}}\) denotes the conditional probability that the game moves to state \(\textbf{b}\) in the next round given that it is in state \(\textbf{a}\) in the current round. In order to formally define the transition matrix, we first need to introduce some notation and definitions. We start by defining the neighborhood set of state \(\textbf{a}\).

Definition 1

(Neighborhood set of a state) The neighborhood set of state \(\textbf{a}\), \(\textrm{Neb}(\textbf{a})\), is defined as:

$$\begin{aligned} \textrm{Neb}(\textbf{a}) := \{\textbf{b}\in \textbf{A}\big | \quad \exists j: b_{j} \ne a_{j} \wedge \textbf{b}_{-j} = \textbf{a}_{-j} \}. \end{aligned}$$
(2)

In other words, a state in \(\textrm{Neb}(\textbf{a})\) is a state in which exactly one player plays a different action than in state \(\textbf{a}\). For example, consider a game with three players, each having the identical action set \(\{\textrm{C}, \textrm{D}\}\). The state \((\textrm{C},\textrm{C},\textrm{D})\) is in the neighborhood set of \((\textrm{C},\textrm{C},\textrm{C})\), whereas the state \((\textrm{C},\textrm{D},\textrm{D})\) is not. Two states that belong to each other’s neighborhood set differ in exactly one player’s action (we call the index of this player the index of difference between the neighboring states).
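The example above can be captured in a short Python sketch (the function name `neighbors` is ours):

```python
def neighbors(state, action_sets):
    """Neighborhood set of Eq. (2): all states differing from `state`
    in exactly one player's action."""
    result = []
    for j, actions in enumerate(action_sets):
        for alt in actions:
            if alt != state[j]:
                result.append(state[:j] + (alt,) + state[j + 1:])
    return result

acts = [('C', 'D')] * 3
nbs = neighbors(('C', 'C', 'C'), acts)
assert ('C', 'C', 'D') in nbs      # differs in one action: a neighbor
assert ('C', 'D', 'D') not in nbs  # differs in two actions: not a neighbor
```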

Definition 2

(Index of difference between neighboring states) If two states, \(\textbf{a}\) and \(\textbf{b}\), satisfy \(\textbf{a}\in \textrm{Neb}(\textbf{b})\), the index of difference between them, \(\textrm{I}(\textbf{a}, \textbf{b})\), is the unique integer that satisfies

$$\begin{aligned} a_{\textrm{I}(\textbf{a}, \textbf{b})} \ne b_{\textrm{I}(\textbf{a}, \textbf{b})}. \end{aligned}$$
(3)

In the previous example, the index of difference between the neighboring states \((\textrm{C},\textrm{C},\textrm{C})\) and \((\textrm{C},\textrm{C},\textrm{D})\) is 3. Using the above definitions, one can formally define the transition matrix of introspection dynamics by

$$\begin{aligned} \textrm{T}_{\textbf{a}, \textbf{b}} = {\left\{ \begin{array}{ll} \frac{1}{N(m_j-1)} \cdot p_{a_{j} \rightarrow b_{j}} (\textbf{a}_{-j}) \quad \quad &{}\text { if }\textbf{b}\in \textrm{Neb}(\textbf{a}) \quad \text {and,} \quad j = \textrm{I}(\textbf{a},\textbf{b})\\ \\ 0 \quad &{}\text { if } \textbf{b}\notin \textrm{Neb}(\textbf{a}) \\ \\ 1 - \sum _{\textbf{c}\ne \textbf{b}} \textrm{T}_{\textbf{a},\textbf{c}} \quad &{}\text { if } \textbf{a}= \textbf{b}\end{array}\right. }. \end{aligned}$$
(4)

The transition matrix is a row stochastic matrix (the sums of the rows are 1). This implies that the stationary distribution of \(\textbf{T}\), a left eigenvector of \(\textbf{T}\) corresponding to eigenvalue 1, always exists. We introduce a sufficient condition for the stationary distribution of \(\textbf{T}\) to be unique.

When the selection strength, \(\beta \), is finite, the transition matrix of introspection dynamics has a unique stationary distribution. A finite value of \(\beta \) results in non-zero transition probabilities between neighboring states. Since no state is isolated and there are only finitely many states of the game, every state is reachable from any other in a finite number of steps, so the chain is irreducible. Moreover, the diagonal entries of the transition matrix are strictly positive, which makes the chain aperiodic. The transition matrix, \(\textbf{T}\), is therefore primitive for a finite \(\beta \). By the Perron–Frobenius theorem, a primitive matrix, \(\textbf{T}\), will have a unique and strictly positive stationary distribution \(\textbf{u}:= (\textrm{u}_\textbf{a})_{\textbf{a}\in \textbf{A}}\) which satisfies the conditions:

$$\begin{aligned} \textbf{u}\textbf{T}&= \textbf{u}\end{aligned}$$
(5)
$$\begin{aligned} \textbf{u}\textbf{1}&= 1 \end{aligned}$$
(6)

where \(\textbf{1}\) is the column vector with the same size as \(\textbf{u}\) and has all elements equal to 1. For all the analytical results in this paper, we consider \(\beta \) to be finite so that stationary distributions of the processes are unique.

The above equations only present an implicit representation of the stationary distribution \(\textbf{u}\). The stationary distribution can be explicitly calculated by the following expression (which is derived using Eqs. 5 and 6),

$$\begin{aligned} \textbf{u}= {\textbf{1}}^\intercal ({\mathbbm {1}} + \textbf{U} - \textbf{T})^{-1} \end{aligned}$$
(7)

where \(\textbf{U}\) is a square matrix of the same size as \(\textbf{T}\) with all elements equal to 1 and \({\mathbbm {1}}\) is the identity matrix. The matrix \({\mathbbm {1}} + \textbf{U} - \textbf{T}\) is invertible when \(\textbf{T}\) is a primitive matrix [18]. Using Eq. (7), one can compute the unique stationary distribution of introspection dynamics (with a finite \(\beta \)) for any normal form game, with an arbitrary number of possibly asymmetric players and strategies.

The stationary distribution element \(\textrm{u}_\textbf{a}\) is the probability that the state \(\textbf{a}\) will be played by the players in the long-run. Using the stationary distribution, one can calculate the marginal probabilities corresponding to each player’s actions. That is, the probability that player i plays action \(a \in \textbf{A}_i\) in the long-run, \(\xi _{i,a}\), can be computed as

$$\begin{aligned} \mathbf {\xi }_{i,a} := \sum _{\textbf{q}\in \textbf{A}_{-i}} \textrm{u}_{(a, \textbf{q})}. \end{aligned}$$
(8)
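For small state spaces, Eqs. (4), (7), and (8) can be implemented directly by brute-force enumeration of \(\textbf{A}\). The following Python sketch (function and variable names are ours) builds the transition matrix, computes the stationary distribution via Eq. (7), and extracts the marginals of Eq. (8):

```python
import itertools
import numpy as np

def transition_matrix(action_sets, payoff, beta):
    """Build the transition matrix of Eq. (4).

    action_sets: one tuple of actions per player.
    payoff(i, state): payoff of player i when the game is in `state`.
    """
    states = list(itertools.product(*action_sets))
    index = {s: k for k, s in enumerate(states)}
    n = len(action_sets)
    T = np.zeros((len(states), len(states)))
    for a in states:
        for j in range(n):
            m_j = len(action_sets[j])
            for new in action_sets[j]:
                if new == a[j]:
                    continue
                b = a[:j] + (new,) + a[j + 1:]        # neighboring state
                gain = payoff(j, b) - payoff(j, a)
                p = 1.0 / (1.0 + np.exp(-beta * gain))  # Eq. (1)
                T[index[a], index[b]] = p / (n * (m_j - 1))
        T[index[a], index[a]] = 1.0 - T[index[a]].sum()  # staying put
    return states, T

def stationary(T):
    """Stationary distribution via Eq. (7): u = 1^T (I + U - T)^(-1)."""
    k = T.shape[0]
    return np.ones(k) @ np.linalg.inv(np.eye(k) + np.ones((k, k)) - T)

def marginal(states, u, i, action):
    """Eq. (8): long-run probability that player i plays `action`."""
    return sum(u_a for u_a, s in zip(u, states) if s[i] == action)
```

Any normal-form payoff function works here; for instance, `payoff` could implement the public goods payoff of Eq. (13) below. The \(m_1 \times \cdots \times m_N\) enumeration limits this to modest games, as noted in the text.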

3 Additive Games and Their Properties Under Introspection Dynamics

In this section, we discuss the stationary properties of introspection dynamics when players learn to play strategies in a special class of games: additive games. In an additive game, the payoff difference that a player earns by making a unilateral switch in their actions is independent of what their co-players play. In other words, if none of the co-players change their current actions, the payoff difference earned by switching actions is determined only by the switch itself, not by the co-players’ actions. Formally, in additive games, for any player i, any pair of actions \(x,y \in \textbf{A}_i\), and any \(\textbf{q}\in \textbf{A}_{-i}\),

$$\begin{aligned} \pi _i(x, \textbf{q}) - \pi _i(y, \textbf{q}) =: f_i(x,y) \end{aligned}$$
(9)

is independent of \(\textbf{q}\) and only dependent on x and y. In the literature, this property is sometimes called equal gains from switching [53, 83]. For games with this property, the stationary distribution of introspection dynamics takes a simple form.

Proposition 1

When \(\beta \) is finite, the unique stationary distribution, \(\textbf{u}= (\textrm{u}_\textbf{a})_{\textbf{a}\in \textbf{A}}\), of introspection dynamics for an \(N-\)player additive game is given by

$$\begin{aligned} \textrm{u}_\textbf{a}= \prod _{j=1}^N \frac{1}{\displaystyle \sum _{a' \in \textbf{A}_j} e^{\beta f_j(a', a_j)}} \end{aligned}$$
(10)

where \(f_j(a', a_j)\) is the co-player independent payoff difference given by Eq. (9).

For all proofs of Propositions and Corollaries, please see Appendix 2. Using the stationary distribution and Eq. (8), one can also exactly compute the probabilities with which players play their actions in the long-run (i.e., the marginal distributions). In this regard, introspection learning in additive games is particularly interesting: the stationary distribution and the marginal distributions are related in a special way.

Proposition 2

Let \(\textbf{u}= (\textrm{u}_\textbf{a})_{\textbf{a}\in \textbf{A}}\) be the unique stationary distribution of introspection dynamics with finite \(\beta \) for an \(N-\)player additive game. Then, \(\textrm{u}_\textbf{a}\) is the product of the marginal probabilities with which each player plays their respective actions in \(\textbf{a}\). That is,

$$\begin{aligned} \textrm{u}_\textbf{a}= \prod _{j = 1}^N \xi _{j,a_j}. \end{aligned}$$
(11)

For an N-player additive game, \(\xi _{j,a_j}\) is given by

$$\begin{aligned} \xi _{j,a_j} = \frac{1}{\displaystyle \sum _{a' \in \textbf{A}_j} e^{\beta f_j(a',a_j)}} \end{aligned}$$
(12)

where \(f_j(a', a_j)\) is the co-player independent payoff difference given by Eq. (9).

The above proposition states that, for additive games, the stationary distribution of introspection dynamics can be factorized into its corresponding marginals. In the long-run, the probability that players play the state \(\textbf{a}= (a_1, a_2,...,a_N)\) is the product of the marginal probabilities that player 1 plays \(a_1\), player 2 plays \(a_2\), and so on. This property was already shown for the simple 2-player, 2-action donation game in Couto et al. [18]. Here, we extend that result to any additive game with an arbitrary number of players, each having an arbitrary number of strategies. In the next section, we use the well-studied example of the linear public goods game (an additive game) to illustrate these results.

3.1 Example of an Additive Game: Linear Public Goods Game with 2 Actions

In the simplest version of the linear public goods game (LPGG) with N players, each player has two possible actions, to contribute (action \(\textrm{C}\), to cooperate), or to not contribute (action \(\textrm{D}\), to defect) to the public good. The players may differ in their cost of cooperation and the benefit they provide by contributing to the public good. We denote the cost of cooperation for player i and the benefit that they provide by \(c_i\) and \(b_i\), respectively. We define an indicator function \(\alpha (.)\) to map the action of cooperation to 1 and the action of defection to 0. That is, \(\alpha (\textrm{C}) = 1\) and \(\alpha (\textrm{D}) = 0\). The payoff of player i when the state of the game is \(\textbf{a}\) is given by

$$\begin{aligned} \pi _i(\textbf{a}) = \frac{1}{N}\sum _{j=1}^N \displaystyle \alpha (a_j) b_j - \alpha (a_i) c_i . \end{aligned}$$
(13)

The payoff difference that a player earns by unilaterally switching from \(\textrm{C}\) to \(\textrm{D}\) (or vice-versa) in the linear public goods game is independent of what the other players play in the game. That is, for every player i,

$$\begin{aligned} \pi _i(\textrm{D}, \textbf{q}) - \pi _i(\textrm{C}, \textbf{q}) = c_i - \frac{b_i}{N} =: f_i(\textrm{D}, \textrm{C}) \end{aligned}$$
(14)

is independent of co-players’ actions \(\textbf{q}\). The linear public goods game is therefore an example of an additive game. This property of the game results in easily identifiable dominated strategies. For player i, defection dominates cooperation when \(c_i > b_i/N\) while cooperation dominates defection when \(c_i < b_i/N\). Using Proposition 1, one can derive the closed-form expression for the stationary distribution of an \(N-\)player linear public goods game with two strategies.

Proposition 3

When \(\beta \) is finite, the unique stationary distribution of introspection dynamics for an \(N-\)player linear public goods game is given by

$$\begin{aligned} \textrm{u}_\textbf{a}= \prod _{j = 1}^{N} \frac{1}{1 + \displaystyle e^{\textrm{sign}(a_j)\beta f_j(\textrm{D}, \textrm{C})}} \end{aligned}$$
(15)

where

$$\begin{aligned} \textrm{sign}(a) = {\left\{ \begin{array}{ll} &{}1 \quad \text {if} \quad a = \textrm{C}\\ -&{}1 \quad \text {if} \quad a = \textrm{D}\end{array}\right. }. \end{aligned}$$
(16)

We use a simple example to illustrate the above result. Consider a 3-player linear public goods game. All players provide a benefit of 2 units when they contribute to the public good (\(b_1 = b_2 = b_3 = 2\)). They differ, however, in their cost of cooperation. For players 1 and 2, the cost of cooperation is 1 unit (\(c_1 = c_2 = 1\)), while for the third player, the cost is 1.5 units (\(c_3 = 1.5\)). In the stationary distribution of the process with selection strength \(\beta = 1\), the marginal probabilities that player 1 (or 2) cooperates and that player 3 defects are \(\xi _{1,\textrm{C}} = \xi _{2,\textrm{C}}= 0.417\) and \(\xi _{3,\textrm{D}} = 0.697\), respectively. With these exact values, one can confirm the factorizing property of the stationary distribution for additive games in this example (i.e., Proposition 2). That is, \(\textrm{u}_\textrm{CCD} = 0.121 = \xi _{1,\textrm{C}} \cdot \xi _{2,\textrm{C}} \cdot \xi _{3,\textrm{D}}\).
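The numbers in this example can be checked directly from the closed-form expressions, without building the full transition matrix. A minimal Python sketch (variable names are ours):

```python
import math

N, beta = 3, 1.0
b = [2.0, 2.0, 2.0]
c = [1.0, 1.0, 1.5]

# Eq. (14): co-player-independent gain from switching C to D
f_DC = [c[i] - b[i] / N for i in range(N)]

# Eq. (12): long-run marginal probabilities of cooperating and defecting
xi_C = [1.0 / (1.0 + math.exp(beta * f)) for f in f_DC]
xi_D = [1.0 - x for x in xi_C]

# Proposition 2: the stationary probability of state (C, C, D) factorizes
u_CCD = xi_C[0] * xi_C[1] * xi_D[2]
```

Rounding to three decimals reproduces \(\xi_{1,\textrm{C}} = 0.417\), \(\xi_{3,\textrm{D}} = 0.697\), and \(\textrm{u}_\textrm{CCD} = 0.121\).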

We now use Eq. (15) to systematically analyze the LPGG under introspection dynamics. First, we study the simple case of a 4-player symmetric LPGG (the cost and benefit are c and b for all four players). Since all players are identical, the states of the game can be enumerated by counting the number of cooperators. There are only 5 distinct states of the game (from 0 to 4 cooperators). When the parameters of the game are such that defection dominates cooperation (\(b = 2, c = 1\), Fig. 1a), the stationary distribution of the process at high \(\beta \) indicates that, in the long-run, states with a higher number of cooperators are less likely than states with a lower number of cooperators. However, for intermediate and low \(\beta \), the stationary results are qualitatively different. Here, the state with 1 cooperator (or even 2 cooperators, depending on how small \(\beta \) is) is the most probable state in the long-run (Fig. 1b). Since every possible state is equiprobable in the limit \(\beta \rightarrow 0\), the outcome with 2 cooperators is most likely only because there are more states with 2 cooperators than with any other number of cooperators.

Naturally, \(\beta \) plays an important role in determining the overall cooperation in the long-run. When \(\beta \) is low, average cooperation varies weakly with the strength of the dilemma, \(b/N - c\) (Fig. 1c). Even when the temptation to defect is high (\(b/N - c = -2\)), players cooperate with a non-zero probability. Similarly, when cooperation is highly beneficial and strictly dominates defection (\(b/N - c = 2\)), players sometimes defect. At higher values of \(\beta \), the stationary behavior of players is more responsive to the payoffs and thus changes abruptly near the parameter values where the game transitions from defection-dominant to cooperation-dominant (\(b/N - c = 0\)).

Fig. 1

Introspection dynamics in a symmetric linear public goods game. Stationary distribution of introspection dynamics for a linear public goods game with four identical players. For all panels in this figure, the following parameters are used: \(N = 4\) (group size), \(b = 2\) (benefit provided to the public good upon cooperation), \(c = 1\) (cost of cooperation). a Frequency of each state in the stationary distribution of introspection dynamics. As players are identical, each state can be defined by the number of cooperators. For a selection strength of \(\beta = 5\), states with more cooperators are less likely than states with fewer cooperators. b Frequency of each state for varying selection strength, \(\beta \). The color code is the same as in panel (a). Comparing neutrality (\(\beta = 0\)) with low to intermediate \(\beta \) values, selection favors states other than 0 cooperators. Indeed, up to \(\beta \approx 3\), state 0 is not the most frequent state in the long-run. c Average cooperation frequency for varying dilemma strength depends on the selection strength, \(\beta \). We use the marginal gain of choosing cooperation over defection, \(b/N - c\), as a measure of the dilemma strength. When this quantity is negative and low, we say that the dilemma is strong; in this case, choosing cooperation is strictly disadvantageous. When this quantity is positive and high, we say that the dilemma is weak; in this case, cooperation dominates defection. Typically, a linear public goods dilemma is defined to have a negative marginal gain. Here, the dilemma strength varies from \(-2\) to 2. The results are shown for different values of selection strength, \(\beta = 1, 5\) and 100. For high \(\beta \), the stationary distribution of introspection dynamics reflects rational play: in the long-run, players play the Nash equilibrium. When the marginal gain is negative, defection is played almost with certainty (and vice-versa). For low \(\beta \), however, some cooperation is possible even when the dilemma is strong.

Fig. 2

Introspection dynamics in an asymmetric linear public goods game. Cooperation probabilities of introspection dynamics for a linear public goods game with three asymmetric players. For each of the upper panels (a and b), we show the cost of cooperation and the benefit provided upon cooperation for the players on the left, and the average cooperation frequency in the long-run on the right. In c, the asymmetry strengths between the players, \(\delta _c \) and \(\delta _b\), vary simultaneously. Both the average individual cooperation frequencies and the overall average cooperation frequency in the long-run are shown. The reference player’s cost and benefit are again 1 and 2 units, respectively. The area within the white dashed lines represents the parameter values for which the marginal gain of choosing cooperation over defection is negative, for each single player and, in the right-most panel, for all players simultaneously. In this example, cooperation is only feasible in the long-run if the asymmetries of players are aligned. That is, overall cooperation is high only when the individual with a low cost of cooperation has a high benefit value. For panels (a) and (b), the selection strength is \(\beta = 2\), while for panel (c), \(\beta = 5\).

To study what effects might appear due to asymmetry in the LPGG, we consider the game with 3 asymmetric players. All players can differ in their cost of cooperation and the benefit they provide to the public good. In this setup, the cost and benefit values of the reference player (player 2) are 1 and 2 units, respectively. Player 1 and player 3 differ from the reference player in opposite directions. For player 1, the cost and benefit are \(1 + \delta _c\) and \(2 + \delta _b\), respectively, while for player 3, they are \(1 - \delta _c\) and \(2 - \delta _b\). The terms \(\delta _b\) and \(\delta _c\) represent the strength of asymmetry between the three players (a higher absolute value of \(\delta \) indicates a bigger asymmetry). When the players only differ in their cost of cooperation (\(\delta _b = 0\) and \(\delta _c = 0.5\), Fig. 2a, left), their relative cooperation in the long-run reflects their relative ability to cooperate. The player with the lowest cooperation cost (player 3) cooperates with the highest probability (and vice-versa, Fig. 2a, right). Similarly, when players only differ in their ability to produce the public good (\(\delta _b = 1\) and \(\delta _c = 0\), Fig. 2b, left), their relative cooperation in the long-run reflects the relative benefits they provide with their cooperation (Fig. 2b, right). In this example, since the reference player provides a benefit of 2 units at a cost of 1 unit (so that defection always dominates cooperation for them), defection dominates cooperation for player 1 if and only if \(\delta _b < 1 + 3\delta _c\) and, for player 3, if and only if \(\delta _b > 3\delta _c - 1\). The regions of the \(\delta _b\)–\(\delta _c\) parameter plane where defection dominates cooperation are circumscribed by white dashed lines in Fig. 2c.
When players learn to play at high selection strength, \(\beta \), their cooperation frequency in the long-run reflects rational play (Fig. 2c). In the long-run, the average cooperation frequency of the group is low if the asymmetry in the benefit value is bounded as \(3\delta _c - 1< \delta _b < 3\delta _c + 1\). This includes the case where players are symmetric (\(\delta _b = \delta _c = 0\)). Relatively high cooperation is only assured if players are aligned in their asymmetries (i.e., either \(\delta _b > 3\delta _c + 1\) or \(\delta _b < 3\delta _c - 1\)). In other words, if the player that has a low cost of cooperation also provides a high benefit upon contribution, then cooperation is high in the long-run.
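The ordering of individual cooperation frequencies in the cost-asymmetry-only setting of Fig. 2a follows directly from Eq. (12). A minimal Python sketch, assuming the parameters stated above (\(\delta_b = 0\), \(\delta_c = 0.5\), \(\beta = 2\); variable names are ours):

```python
import math

N, beta = 3, 2.0
delta_c, delta_b = 0.5, 0.0  # cost asymmetry only, as in Fig. 2a
c = [1.0 + delta_c, 1.0, 1.0 - delta_c]
b = [2.0 + delta_b, 2.0, 2.0 - delta_b]

# Eq. (12) for the two-action LPGG: long-run probability of cooperating
xi_C = [1.0 / (1.0 + math.exp(beta * (c[i] - b[i] / N))) for i in range(N)]

# the cheapest cooperator (player 3) cooperates most, the costliest least
assert xi_C[2] > xi_C[1] > xi_C[0]
```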

4 Games with Two Actions and Their Properties Under Introspection Dynamics

In the previous section, we studied the properties of additive games under introspection dynamics. In this section, we study games that (a) are not necessarily additive and (b) have only two actions per player. First, we study the symmetric version of such a game. An N-player symmetric normal form game with two actions has the following properties:

  1. 1.

    All players have the same action set \(\mathcal {A}:= \{\textrm{C},\textrm{D}\}\). That is, \(\textbf{A}_1 = \textbf{A}_2 =... = \textbf{A}_N:= \mathcal {A}\).

  2. 2.

    Players have the same payoff when they play the same action against the same composition of co-players. That is, for any \(i,j \in \{1,2,...,N\}\), \(a \in \mathcal {A}\) and \(\textbf{b}\in \mathcal {A}^{N-1}\),

    $$\begin{aligned} \pi _i(a,\textbf{b}) = \pi _j(a,\textbf{b}). \end{aligned}$$
    (17)

Since players are symmetric, states can again be enumerated by counting the number of \(\textrm{C}\) players in the state. We denote the payoffs of a \(\textrm{C}\) and a \(\textrm{D}\) player in a state where there are j co-players playing \(\textrm{C}\) by \(\pi ^\textrm{C}(j)\) and \(\pi ^\textrm{D}(j)\), respectively. We denote with f(j) the payoff difference earned by switching from \(\textrm{C}\) to \(\textrm{D}\) when there are j co-players playing \(\textrm{C}\),

$$\begin{aligned} f(j) := \pi ^\textrm{D}(j) - \pi ^\textrm{C}(j). \end{aligned}$$
(18)

The stationary distribution of a \(2-\)action symmetric game under introspection dynamics can be explicitly computed using the following proposition.

Proposition 4

When \(\beta \) is finite, the unique stationary distribution of introspection dynamics for an \(N-\)player symmetric normal form game with two actions, \(\mathcal {A} = \{\textrm{C}, \textrm{D}\}\), \((\textrm{u}_{\textbf{a}})_{\textbf{a}\in \mathcal {A}^N}\), is given by

$$\begin{aligned} \textrm{u}_\textbf{a}= \frac{1}{\Gamma } \displaystyle \prod _{j=1}^{\mathcal {C}(\textbf{a})} \displaystyle e^{-\beta f(j-1)} \end{aligned}$$
(19)

where f(j) is defined as in Eq. (18) and \(\mathcal {C}(\textbf{a})\) is the number of cooperators in state \(\textbf{a}\). The term \(\Gamma \) is the normalization factor given by

$$\begin{aligned} \Gamma = \displaystyle \sum _{\textbf{a}' \in \mathcal {A}^N} \prod _{j = 1}^{\mathcal {C}(\textbf{a}')} \displaystyle e^{-\beta f(j-1)}. \end{aligned}$$
(20)

The number of unique states of the game can be reduced from \(2^N\) to \(N+1\) due to symmetry. In the reduced state space, the state k corresponds to k players playing \(\textrm{C}\) and \(N-k\) players playing \(\textrm{D}\). Then, Proposition 4 can be simply reformulated by relabelling the states as follows,

Corollary 1

When \(\beta \) is finite, the unique stationary distribution, \((\textrm{u}_k)_{k \in \{0,1,...,N\}}\), of introspection dynamics for an \(N-\)player symmetric normal form game with two actions, \(\mathcal {A} = \{\textrm{C}, \textrm{D}\}\), is given by

$$\begin{aligned} \textrm{u}_k = \frac{1}{\Gamma } \cdot {N \atopwithdelims ()k} \cdot \displaystyle \prod _{j=1}^{k} \displaystyle e^{-\beta f(j-1)} \end{aligned}$$
(21)

where k represents the number of \(\textrm{C}\) players in the state and f(j) is defined as in Eq. (18). The term \(\Gamma \) is the normalization factor, given by

$$\begin{aligned} \Gamma = \displaystyle \sum _{k=0}^N {N \atopwithdelims ()k} \cdot \displaystyle \prod _{j=1}^{k} \displaystyle e^{-\beta f(j-1)} . \end{aligned}$$
(22)

The above corollary follows directly from Proposition 4. The key step is to count the number of states in the state space \(\mathcal {A}^N\) that correspond to exactly k \(\textrm{C}\) players (and therefore \(N-k\) \(\textrm{D}\) players). This count is simply the binomial coefficient \(N \atopwithdelims ()k\). In the next section, we use the example of a nonlinear public goods game to illustrate these results.
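The two formulas can be implemented and cross-checked directly. The sketch below is ours (function names are hypothetical, not from the paper): it enumerates the full state space \(\{0,1\}^N\) following Proposition 4, then computes the reduced distribution of Corollary 1, for an arbitrary gain function f.

```python
import itertools
import math

def stationary_full(f, N, beta):
    # Proposition 4: the weight of a state is the product of e^{-beta f(j-1)}
    # over j = 1, ..., C(a), where C(a) counts the cooperators in the state.
    weights = {}
    for state in itertools.product((0, 1), repeat=N):  # 1 encodes action C
        k = sum(state)
        weights[state] = math.prod(
            math.exp(-beta * f(j - 1)) for j in range(1, k + 1))
    gamma = sum(weights.values())  # normalization factor, Eq. (20)
    return {s: w / gamma for s, w in weights.items()}

def stationary_reduced(f, N, beta):
    # Corollary 1: group the 2^N states into N+1 symmetric classes,
    # weighting each class by the binomial coefficient, Eq. (21).
    weights = [
        math.comb(N, k) * math.prod(
            math.exp(-beta * f(j - 1)) for j in range(1, k + 1))
        for k in range(N + 1)]
    gamma = sum(weights)  # normalization factor, Eq. (22)
    return [w / gamma for w in weights]
```

Summing the full distribution over all states with exactly k cooperators should reproduce \(\textrm{u}_k\) of the reduced form, which makes the binomial counting argument explicit.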

4.1 An Example of a Game with Two Actions: The General Public Goods Game

To study general public goods games, we adopt the framework of general social dilemmas from Hauert et al. [31]. In the original paper, the authors propose a normal form game with symmetric players. The game’s properties depend on a parameter w that determines the nature of the public good. The players have two actions: cooperation, \(\textrm{C}\) and defection, \(\textrm{D}\). Here, we extend their framework to account for players with asymmetric payoffs. Before we explain the asymmetric setup, we describe the original model briefly. In the symmetric case, all N players have the same cost of cooperation c and they all generate the same benefit b for the public good. Unlike the linear public goods game, contributions to the public good are scaled by a factor that is determined by w and the number of cooperators in the group. The payoff of a defector and a cooperator in a group with k cooperators and \(N-k\) defectors is given by,

$$\begin{aligned} \pi ^{\textrm{D}}(k)&= \frac{b}{N}\left( 1 + w + w^2 + \cdots + w^{k-1}\right) , \end{aligned}$$
(23)
$$\begin{aligned} \pi ^{\textrm{C}}(k)&= \pi ^{\textrm{D}}(k) - c . \end{aligned}$$
(24)

The parameter w represents the nonlinearity of the public good. The game is linear when \(w = 1\): every cooperator’s contribution is then equally valuable, regardless of how many others contribute. When \(w < 1\), the effective contribution of every additional cooperator goes down by a factor w (compared to the last cooperator). The public good is said to be discounting in this case. On the other hand, when \(w > 1\), every new contribution is more valuable than the previous one. The public good is said to be synergistic in this case. For the symmetric case, the relationship between the cost-to-benefit ratio, cN/b, and the discount/synergy factor, w, determines the type of social dilemma arising from the game. In principle, this framework can produce generalizations of the prisoner’s dilemma (\(\textrm{D}\) dominating \(\textrm{C}\)), the snowdrift game (coexistence between \(\textrm{C}\) and \(\textrm{D}\)), the stag-hunt game (no dominance but existence of an internal unstable equilibrium), and the harmony game (\(\textrm{C}\) dominating \(\textrm{D}\)) with respect to its evolutionary trajectories under the replicator dynamics. For more details, see Hauert et al. [31].
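The payoffs of Eqs. (23)–(24) determine the gain function f(j) of Eq. (18). A minimal sketch, under the reading that k in Eqs. (23)–(24) counts all cooperators in the group (so a focal player with j cooperating co-players is in a group of j cooperators if they defect and \(j+1\) if they cooperate); the function names are ours:

```python
def pi_D(k, b, c, w, N):
    # Eq. (23): payoff of a defector in a group with k cooperators in total
    return (b / N) * sum(w ** i for i in range(k))

def pi_C(k, b, c, w, N):
    # Eq. (24): a cooperator in the same group additionally pays the cost c
    return pi_D(k, b, c, w, N) - c

def f_gain(j, b, c, w, N):
    # Gain from switching, Eq. (18): with j cooperating co-players,
    # f(j) = pi^D(j) - pi^C(j+1), which simplifies to c - (b/N) * w**j.
    return pi_D(j, b, c, w, N) - pi_C(j + 1, b, c, w, N)
```

Under this reading, f(j) is constant in j only when \(w = 1\), recovering the linear (additive) public goods game as a special case.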

Now, we describe our extension of the original model to account for asymmetric players. Here, for player i, the cost of cooperation is \(c_i\). The benefit that they can generate for the public good is \(b_i\). The benefit of cooperation generated by a player is either synergized (or discounted) by a factor depending on the number of cooperators already in the group and the synergy/discount factor, w (just like the original model). However, since players are now asymmetric, it is not entirely clear in which order the contributions of cooperators should be discounted (or synergized). For example, consider that there are 3 cooperators in the group: players pq, and r. The total benefit that they provide to the public good can be any of the six possibilities of the form \(x + y w + z w^2\), where xy, and z are permutations of \(b_p, b_q\) and \(b_r\). In this model, we assume that all such permutations are equally likely, and therefore, the expected benefit provided by all three of them is given by \(\bar{b}(1 + w + w^2)\) where \(\bar{b} = (b_p + b_q + b_r)/3\).
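The permutation-averaging argument is easy to verify numerically. A small sketch of ours (the helper name is hypothetical) that averages \(x + y w + z w^2 + \cdots \) over all orderings of the cooperators' individual benefits:

```python
import itertools

def expected_group_benefit(benefits, w):
    # Average of x_0 + x_1*w + x_2*w**2 + ... over all orderings of the
    # cooperators' individual benefits; by symmetry each position's
    # expectation equals the mean benefit, giving mean(b) * (1 + w + ...).
    perms = list(itertools.permutations(benefits))
    total = sum(sum(x * w ** i for i, x in enumerate(p)) for p in perms)
    return total / len(perms)
```

With three cooperators, this reproduces \(\bar{b}(1 + w + w^2)\) exactly, since each of the three benefits appears in each discount position in exactly one third of the orderings.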

The complete state space of the game with asymmetric players is \(\textbf{A}= \{\textrm{C},\textrm{D}\}^N\). The payoff of a defector in a state \((\textrm{D}, \textbf{a}_{-i})\) and that of a cooperator in state \((\textrm{C},\textbf{a}_{-i})\) where \(\textbf{a}_{-i} \in \{\textrm{C},\textrm{D}\}^{N-1}\) are, respectively, given by

$$\begin{aligned} \pi _i(\textrm{D}, \textbf{a}_{-i})&= {\left\{ \begin{array}{ll} \displaystyle \sum _{j=1}^N b_j \alpha (a_j) \cdot \frac{1}{N \cdot \mathcal {C}(\textrm{D},\textbf{a}_{-i})} \cdot \left( 1 + w + w^2 + \cdots + w^{\mathcal {C}(\textrm{D},\textbf{a}_{-i}) - 1} \right) &{}\quad \text {if } \mathcal {C}(\textrm{D},\textbf{a}_{-i}) \ne 0 \\[15pt] 0 &{}\quad \text {if } \mathcal {C}(\textrm{D},\textbf{a}_{-i}) = 0 \end{array}\right. } \end{aligned}$$
(25)
$$\begin{aligned} \pi _i(\textrm{C}, \textbf{a}_{-i})&= \displaystyle \sum _{j=1}^N b_j \alpha (a_j) \cdot \frac{1}{N \cdot \mathcal {C}(\textrm{C},\textbf{a}_{-i})} \cdot \left( 1 + w + w^2 + \cdots + w^{\mathcal {C}(\textrm{C},\textbf{a}_{-i}) - 1} \right) - c_i \end{aligned}$$
(26)

where \(\mathcal {C}(a,\textbf{a}_{-i})\) counts the number of cooperators in state \((a,\textbf{a}_{-i})\) and \(\alpha (.)\), as before, maps the actions \(\textrm{C}\) and \(\textrm{D}\) to 1 and 0, respectively. Note that the number of cooperators in the two states is related as: \(\mathcal {C}(\textrm{D},\textbf{a}_{-i}) = \mathcal {C}(\textrm{C},\textbf{a}_{-i}) - 1\). We are interested in studying the long-term stationary behavior of players in this game when they learn through introspection. We first discuss results from the symmetric public goods game and then discuss results for the game with asymmetric players.
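Eqs. (25)–(26) translate into a compact payoff routine. A sketch of ours (the function name is hypothetical), which in the symmetric special case should recover Eqs. (23)–(24):

```python
def payoff(i, actions, b, c, w):
    # Eqs. (25)-(26): payoff of player i in the asymmetric general public
    # goods game; actions is a list of 'C'/'D', b and c are per-player lists.
    N = len(actions)
    k = actions.count('C')  # number of cooperators, C(a)
    if k == 0:
        return 0.0  # Eq. (25), second case: nobody contributes
    discounting = sum(w ** m for m in range(k)) / (N * k)
    benefit = discounting * sum(b[j] for j in range(N) if actions[j] == 'C')
    return benefit - c[i] if actions[i] == 'C' else benefit
```

For example, with \(b_j = 2\), \(c_j = 1\), \(w = 0.5\), and two cooperators among four players, a defector earns 0.75 and a cooperator \(-0.25\), matching the symmetric payoffs of Eqs. (23)–(24).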

To compute the stationary distribution of introspection dynamics in this game, we use Eq. (21). In our symmetric example, we consider that every player in an \(N-\)player game generates a benefit b of value 2. Before exploring the \(c-w-N\) parameter space, we study four specific cases (with a \(4-\)player game). In two of these cases, the public good is discounted (\(w = 0.5\), Fig. 3a left panels) and in the other two, the public good is synergistic (\(w = 1.5\), Fig. 3a right panels). For each case, we consider two sub-cases: first, in which the cost is high (\(c = 1\), Fig. 3a top panels) and second, in which the cost is low (\(c = 0.2\), Fig. 3a bottom panels). The four parameter combinations are chosen such that each of them corresponds to a unique social dilemma under the replicator dynamics. When the selection strength is intermediate (\(\beta = 5\)), players sometimes play actions that are not optimal for the dilemma. For example, even when the parameters of the game make cooperation the dominated strategy (\(w = 0.5, c = 1\)), there is a single cooperator in the group in around 20\(\%\) of the cases. When the parameters of the game reflect the stag-hunt dilemma (\(c = 1, w = 1.5\)), players are more likely to coordinate their actions in the long-run. The probabilities that the whole group plays \(\textrm{C}\) or \(\textrm{D}\) are higher than the probabilities of a group with a mixture of \(\textrm{C}\) and \(\textrm{D}\) players. In contrast, when the parameters reflect the snowdrift game (\(w = 0.5, c = 0.2\)), we get the opposite effect. In the long-run, mixed groups are more likely than homogeneous groups. Finally, when the parameters of the game make defection the dominated action (\(w = 1.5, c = 0.2\)), all players learn to cooperate in the long-run.
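The long-run average cooperation frequency for such parameter combinations follows directly from Corollary 1. A self-contained sketch of ours, assuming the gain function \(f(j) = c - (b/N)\,w^j\) implied by Eqs. (23)–(24); the function name is hypothetical:

```python
import math

def avg_cooperation(N, b, c, w, beta):
    # Corollary 1 applied to the general public goods game, using the gain
    # function f(j) = c - (b/N) * w**j from Eqs. (23)-(24). The weight of
    # the class with k cooperators is C(N,k) * exp(-beta * sum_j f(j-1)).
    f = lambda j: c - (b / N) * w ** j
    weights = [
        math.comb(N, k) * math.exp(-beta * sum(f(j - 1)
                                               for j in range(1, k + 1)))
        for k in range(N + 1)]
    # Average fraction of cooperators under the stationary distribution
    return sum(k * wt for k, wt in enumerate(weights)) / (N * sum(weights))
```

With \(N = 4\), \(b = 2\), and \(\beta = 5\), the harmony-game parameters (\(w = 1.5, c = 0.2\)) yield near-full cooperation, while the prisoner's-dilemma parameters (\(w = 0.5, c = 1\)) yield a low cooperation frequency, consistent with the panels described above.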

Fig. 3

Introspection dynamics in a symmetric general public goods game. Introspection dynamics in the general public goods game with 4 symmetric players, each having two possible actions—cooperation and defection. For a detailed description of the game, see the main text. a The frequency of each state in the stationary distribution of introspection dynamics displays qualitatively different patterns across four types of multiplayer social dilemmas. The upper panels refer to a high cost of cooperation (\(c=1\)) and the bottom panels to a low cost of cooperation (\(c = 0.2\)); left panels refer to a discounted public good (\(w = 0.5\)) and right panels to a synergistic public good (\(w = 1.5\)). Each case is tagged with a symbol that places the particular case in the contour plot in panel (b). b On the left, the average cooperation frequency for varying discount/synergy factor, w, and varying cost of cooperation, c. Cooperation is feasible when costs are not restrictively high and the public good is not too discounted. On the right, the average cooperation frequency for varying discount/synergy factor, w, and group size, N. For this plot, the cost of cooperation for each player is \(c = 0.4\). The feasibility of cooperation drops with larger group sizes when the public good is discounted. For all panels, \(b=2\) and \(\beta = 5\)

The average cooperation frequency of the group in the long-run is shown in the \(c-w\) and \(N-w\) parameter planes in Fig. 3b. First, let us consider the case when the group size is fixed at 4 players (the \(c-w\) plane in Fig. 3b). In that case, if the cost of cooperation is restrictively high, the average cooperation rate is negligible and does not change with the nature of the public good. In contrast, when the cost is not restrictively high, the discount/synergy parameter, w, determines the frequency with which players cooperate in the long-run. A higher w for the public good would result in higher cooperation (and vice-versa). Next, we consider the case where the cost of cooperation is fixed (the \(N-w\) plane in Fig. 3b). The cost is fixed to a value such that in a synergistic public good (\(w > 1\)), the cooperation frequency is almost 1 in the long-run for any group size. In this case, when the public good is discounted, group size N and the discounting factor w jointly determine the cooperation frequency in the long-run. In discounted public goods, cooperation rates fall with the increase in group size.

We also study introspection dynamics in this game with asymmetric players, using the same setup as for the asymmetric linear public goods game. The average frequency of cooperation per player is summarized in Supplementary Figs. 1 and 2. In Supplementary Fig. 1, we study two cases: a first in which the public good is synergistic and players have a high average cost, and a second in which the public good is discounted and players have a lower average cost. In both cases, players cooperate frequently when they simultaneously have a low cost and a high benefit. The only noticeable difference between the two cases is the minimum relation between the asymmetries \(\delta _b\) and \(\delta _c\) that results in high cooperation for the player with low cost and high benefit. When we plot individual cooperation frequencies against the synergy/discount factor, w (Supplementary Fig. 2), we find that when players are symmetric with respect to just benefits (or just costs), the one with the lowest cost (or highest benefit) cooperates with high probability across all types of public goods, even for a high value of the average cost.

5 Application: Introspection Learning in a Game with Cooperation and Rewards

In all the examples that we have studied so far, players can only choose between two actions (pure strategies). Introspection dynamics is particularly useful when players can use larger strategy sets. Therefore, in this section, we study the stationary behavior of players in the \(N-\)player, \(16-\)strategy cooperation and rewarding game from Pal and Hilbe [57]. This game has two stages: in stage 1, players decide whether or not they contribute to a linear public good, and in stage 2, they decide whether or not they reward their peers. When a player contributes to the public good, they pay a cost \(c_i\) but generate a benefit worth \(r_i c_i\) that is equally shared by everyone. When a player rewards a peer, they provide the peer a benefit of \(\rho \) while incurring the cost of rewarding, \(\gamma _i\), themselves. In between the stages, players get full information about the contributions of their peers. In the rewarding stage, players have four possible strategies: they can either reward all the peers who contributed (social rewarding), reward all the peers who defected (antisocial rewarding), reward all peers irrespective of contribution (always rewarding), or reward none of the peers (never rewarding). Before stage 1 commences, player i knows with some probability, \(\lambda _i\), the rewarding strategy of all their peers. In stage 1, players also have four possible strategies: they can either contribute or defect unconditionally, or they can be conditional cooperators or conditional defectors. Conditional cooperators (or defectors) contribute (or do not contribute) when they have no information about their peers (which happens with probability \(1 - \lambda _i\)). 
When a conditional player, i, knows the rewarding strategy of all their peers (which happens with probability \(\lambda _i\)) and finds that there are \(n_{\textrm{SR}}\) social rewarders and \(n_{\textrm{AR}}\) antisocial rewarders among their peers, they cooperate if and only if the marginal gain from rewards for choosing cooperation over defection outweighs the effective cost of cooperation. That is,

$$\begin{aligned} \rho (n_{\textrm{SR}} - n_{\textrm{AR}}) \ge c_i \left( 1 - \frac{r_i}{N} \right) . \end{aligned}$$
(27)

Combining the two stages, players can use one of 16 possible strategies (4 stage-1 strategies combined with 4 stage-2 strategies). In the simple case where players are identical, one can characterize the Nash equilibria of the game and identify the conditions that allow an equilibrium where all players contribute in the first stage and reward peers in the second stage [57]. In the symmetric case, full cooperation and rewarding is feasible in equilibrium when all players have sufficient information about each other and the reward benefit \(\rho \) is neither too high nor too low. In this section, we study three simple cases of asymmetry between players to demonstrate how asymmetric players may learn to play the game through introspection dynamics. The three specific examples show that, with introspection dynamics, asymmetric players can end up taking different roles in the long-run to produce the public good. To this end, we consider a \(3-\)player game in which players 1 and 2 are identical but player 3 differs from them in some aspect. In each case, the asymmetric player either has a) a higher cost of rewarding, \(\gamma _3 > \gamma _1\), b) lower productivity, \(r_3 < r_1\), or c) less information about peers, \(\lambda _3 < \lambda _1\), than their peers. We use Eq. (7) to exactly compute the expected abundances of the 16 strategies for each player.
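The decision rule of Eq. (27) for an informed conditional player is a one-line check. A sketch of ours (the function name is hypothetical), illustrated with the Fig. 4 parameters \(c_i = 1\), \(r_i = 2\), \(N = 3\):

```python
def informed_choice(n_SR, n_AR, rho, c_i, r_i, N):
    # Eq. (27): an informed conditional player cooperates iff the marginal
    # reward gain, rho * (n_SR - n_AR), covers the effective cost of
    # contributing, c_i * (1 - r_i / N).
    return rho * (n_SR - n_AR) >= c_i * (1 - r_i / N)
```

With \(\rho = 0.3\), a single social rewarder among the peers is not enough (\(0.3 < 1/3\)), whereas two social rewarders are; with \(\rho = 1\), one social rewarder already suffices.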

In the case where player 3 is asymmetric with respect to their cost of rewarding, the long-run outcome of introspection reflects a division of labor between the players in producing the public good (Fig. 4a). The players to whom rewarding is less costly (players 1 and 2) reward cooperation with a higher probability than the player to whom rewarding is very costly (player 3). In return, player 3 learns to respond by contributing with a higher probability than their co-players. With these specific parameters, one player takes up the role of providing the highest per-capita contribution while the others compensate with costly rewarding. When the asymmetric player differs only in their productivity, a different effect may appear in the long-run (Fig. 4b). In this case, the less productive player free-rides on the cooperation of their more productive peers, but nonetheless rewards their peers’ cooperation. The asymmetric player free-rides but does not second-order free-ride. The probability with which the less productive player rewards others in the long-run is slightly higher than the probability with which the contributing individuals reward each other. Finally, we consider the case where the asymmetric individual differs from others in terms of the information players have about others’ rewarding strategy (Fig. 4c). In this case, the asymmetric player knows others’ strategies with a considerably lower probability than their peers. In the long-run, the asymmetric player cooperates less on average than their peers. This is because the asymmetric individual faces fewer instances where they can opportunistically cooperate with their co-players. However, both types of players reward cooperation almost equally, and just enough to sustain cooperation.

Fig. 4

Introspection dynamics in the linear public goods game with peer rewarding. Here, a game with three asymmetric players, each having 16 possible strategies, is studied. Players contribute to a linear public good and then reward each other in the next stage, after everyone’s contribution is revealed. In the first stage, players can condition their cooperation on the information they have about their co-players’ rewarding strategies. For a full description of the model, please see the section on rewarding. In this example, players 1 and 2 are identical in all aspects while player 3 differs from them in only a single aspect. Here, Eq. (7) is used to plot the exact probability with which players cooperate and reward cooperation in the long-run. There are three types of asymmetry for player 3. a First, the case where player 3 has a high cost of rewarding compared to players 1 and 2, \(0.7 = \gamma _3 > \gamma _1 = 0.1\). b Then, the case where player 3 is less productive than their co-players, \(1.2 = r_3 < r_1 = 2\). c Finally, the case where player 3 has less information about co-players’ rewarding strategies than the others, that is, \(0.1 = \lambda _3 < \lambda _1 = 0.9\). For all plots, a high value for the selection strength, \(\beta = 10\), is considered. Unless otherwise mentioned, the following parameters are maintained for all panels: \(c_i = 1\) (individual cost of cooperation), \(r_i = 2\) (individual productivity), \(\gamma _i = 0.1\) (individual cost of rewarding), \(\lambda _i = 0.9\) (individual information about co-players’ strategies). In panels (a) and (b), the reward value is \(\rho =0.3\) while for panel (c), the reward value is \(\rho = 1\)

6 Discussion and Conclusion

We introduce introspection dynamics in N-player (a)symmetric games. In this learning model, at each time step, one of the N players updates (or not) their strategy by comparing the payoffs of only two strategies: the one currently being played and a random prospective one. Clearly, this assumption implies a simple cognitive process. Players do not optimize over the entire set of strategies as, for example, in best-response models [13, 25, 36]. One such model of particular interest, due to its connections to introspection dynamics, is the logit dynamics [2, 7, 13]. In Appendix 1, we compare introspection dynamics and logit dynamics. We show that the two processes have equivalent stationary distributions for 2-strategy games, potential games, and additive games. We also note that there are games for which the stationary distributions do not match. For example, we find coordination games with multiple Nash equilibria for which introspection dynamics and logit dynamics select a different equilibrium. Interestingly, whether one of the dynamics is better at selecting the higher-payoff equilibrium in coordination games has no trivial answer and remains to be investigated.

Furthermore, although conceptually similar, our model is also simpler than typical reinforcement learning models. For example, while we only have selection strength as a parameter (apart from the payoffs), in Macy and Flache [40], there is a learning rate parameter (which could be comparable to our selection strength) but also an aspiration parameter which sets a payoff reference. In our model, the payoff reference is always the current one. All in all, while at each single time step individuals are restricted to reason over two strategies only, as they iterate this step over time, they are able to fully explore the whole set of strategies, in a trial-and-error fashion.

Importantly, our model is also much simpler computationally than the stochastic evolutionary game theory framework. While both can involve solving the stationary distribution of a Markov process, they differ greatly in the state space size. Population models typically assume individuals play multiple games against (potentially all) other players in a population. As such, the state is defined by the number of players playing each strategy in the population(s). The number of states rapidly increases with the population size, the number of strategies, the size of the interaction, and the types of players (in the case of asymmetric games). One can see how the mathematical analysis of multiplayer asymmetric games can become cumbersome. To deal with this issue, previous models frequently resorted to additional approximations, such as low mutation rates [21, 85] and weak selection [88]. On the contrary, in introspection dynamics, the states of the Markov process correspond to the outcome of a single (focal) game: for an N-player game, where player i has \(m_i\) possible actions, there are \(m_1 \times m_2 \times \cdots \times m_N\) states. This feature hugely reduces the state space size, which is key for obtaining exact results.

Here, we thus provide a general explicit formula, Eq. (7), that easily computes the stationary distribution of any multiplayer asymmetric game under introspection dynamics. This formula is useful for exploring many-strategy games in the full range of selection strength. Additionally, we show that it is possible to obtain some analytical expressions for the long-run average strategy abundances. We start by analyzing the set of additive games, for which the gain from switching between any two actions is constant, regardless of what co-players do. Due to this simple feature, additive games allow for the most general closed-form expression for the stationary distribution (regarding the number of players, the number of strategies, and the asymmetry of the game). We also find that, for additive games, the joint distribution of strategies factorizes into the product of the players' marginal strategy distributions. For more general games, we provide the stationary distribution formula for 2-strategy symmetric games. Finally, we study several examples of social dilemmas. From those, we see that, despite the differences to other models pointed out above, we recover some previous results qualitatively [31]. We also conclude that players with a lower cost or a higher benefit of cooperation learn to cooperate more frequently.

Introspection dynamics is rather broad in its scope. Here, we mainly focus on introducing a general framework. Still, we provide some examples to illustrate how it can be applied. Besides the generic public goods game, we study a 2-stage game where players can choose among 16 strategies. There, individuals can reward their co-players conditional on their previous cooperative (or not) behavior. Clearly, there are a number of ways in which our model can be further employed. For example, other researchers recently studied multiplayer games considering multiple games played concurrently [86], fluctuating environments [11], continuous strategies [48], or repeated interactions [34, 87]. Also, a number of previous works considered complex population structures [12, 14, 60, 65,66,67,68]. As defined above, introspection dynamics does not consider a population of players, making it simple to work with. However, it could be equally applicable to population models. In that case, players would obtain average payoffs either from well-mixed or network-bounded interactions, as usual, but update their strategies introspectively.