1 Introduction

Social behavior has been studied extensively through pairwise interactions [37]. Despite their simplicity, these interactions provide important insights, such as how populations can sustain cooperation [8, 50, 52]. Yet, many interesting collective behaviors occur when multiple individuals interact simultaneously [5, 6, 30, 34, 56, 58, 64, 77, 86]. Most of these situations cannot be captured by the sum of several pairwise interactions. Thus, to account for such nonlinearities, one needs to consider multiplayer games [30]. For example, a well-known effect that only emerges when more than two players are present is the “second-order free-riding problem” [20]. A natural solution to maintain pro-social behavior in a community is to monitor and punish defectors (and/or reward cooperators). However, most forms of sanctioning are considerably costly [33]. Therefore, an additional (second-order) dilemma emerges: individuals would like cooperation to be incentivized, but each prefers that others pay the associated costs.

Another interesting effect that can be explored with multiplayer games is the scale or size of the interaction itself. In situations that require some sort of coordination, and where expectations about others play an important role in one’s decisions, a growing group size might hinder the optimal outcome [77]. Likewise, it has been shown that it is hard to cooperate in large groups [34, 63, 71]. This is not, however, a general effect [28]. Additionally, group size can vary in a population of players. There, not only the average group size can have an important effect, but also the variance of the group size distribution [16, 61] and the group size distribution itself [62].

Complexity further increases when players differ significantly among themselves. This diversity can be captured by asymmetric games [26, 32, 35, 36, 44, 55, 73, 78, 85]. In symmetric games, all players are indistinguishable. Thus, to fully characterize the state of the game, we only need to know the number of players playing each strategy. Conversely, in asymmetric games, players can differ in their available actions and in their incentives to choose each action. Therefore, they can also have uneven effects on others’ payoffs. For example, in public goods games and collective-risk dilemmas, players can have different initial endowments (or wealth), productivities, costs, risk perceptions, or risk exposures [1, 32, 46, 47, 84, 87]. Hence, to fully describe the state of the game, we need to know the action of each player. This greatly increases the size of the game’s state space, even more so for more than two players.

Models from evolutionary game theory (EGT) [37, 42, 43, 51] and learning theory [22, 40, 59, 70] have been widely used to study strategic behavior. The concept of an evolutionarily stable strategy, originally proposed for pairwise encounters [43], was extended to multiplayer games [15, 17, 58]. The well-known replicator equation [37, 79] can also easily account for multiplayer games [19, 27, 31, 64]. More recently, the replicator-mutator equation was applied to study the dynamics of multiplayer games, too [41]. As for asymmetric games, a few additional assumptions are needed in the description of the model. For example, if there are two different types of players, typically either two populations co-evolve (“bimatrix games” [29, 37, 81]) or there is a single population of players, each of whom can play both types or roles (“role games”) [37]. The case of asymmetric games with more than two players is substantially less studied within deterministic EGT; Gokhale and Traulsen, and Zhang et al., are two exceptions [29, 89]. Notably, although these works study multiplayer games, they consider at most two different types (drawn from two populations), which leaves out the exploration of full asymmetry. Stochastic evolutionary game dynamics [54, 80] also provides several models for studying multiplayer and asymmetric games. Fixation probabilities [23] in asymmetric 2-player games [76], asymmetric 3-player games [74], and symmetric multiplayer games [27, 39] were recently derived. Furthermore, average strategy abundances [3, 4] were obtained only for 2-player asymmetric games [55, 75] or multiplayer symmetric games [12, 28, 38]. For a review of evolutionary multiplayer games, both in infinitely large and in finite populations, we refer to Gokhale and Traulsen [30]. Learning models (of strategic behavior) take a different approach from EGT [9, 10, 22, 24, 36, 40, 59, 70, 82]. Rather than strategies necessarily evolving in a population, individuals learn and adjust their strategies dynamically.

Introspection dynamics has recently proven to be a useful learning model for tackling (a)symmetric games [18, 32, 45, 69, 72]. Here, players update their strategies by exploring their own set of strategies in the following simple way: after each round of the game, a random player considers a random alternative strategy; they compare the payoff it would have given them to their current payoff; if the new strategy would provide a higher payoff, it is more likely to be adopted in the next round. We describe the model formally in the next section. While Couto et al. [18] only considered 2-player games, this framework is general enough to account for multiple players. In particular, compared to population models, introspection dynamics allows a natural exploration of full asymmetry in many-player games. For example, in imitation dynamics, one needs to specify who is being imitated by whom [84]. When players differ, it might not make sense to assume that they imitate others. Introspection avoids this assumption because players’ decisions only depend on their own payoffs. An existing model that shares this property is the logit-response dynamics or, simply, logit dynamics [2, 7, 13]. Unlike in introspection dynamics, at each time step the randomly drawn player can switch to any other strategy with non-zero probability. This probability grows with the payoff provided by each strategy at the (possible) future state. The switching-probability functions of introspection dynamics and logit dynamics share a similar exponential shape; hence, the two processes have some interesting connections.

Here, we extend previous results of pairwise games under introspection dynamics [18] to multiplayer games. First, we derive a formula that allows us to numerically compute the stationary distribution of introspection dynamics for any multiplayer asymmetric game. Second, we obtain explicit expressions of the stationary distribution for two special cases. These cases are additive games (where the payoff difference that a player gains by unilaterally switching to a different action is independent of the actions of their co-players), and symmetric multiplayer games with two strategies. To illustrate our theoretical results, we analyze various multiplayer asymmetric social dilemmas, extending the framework in [31] to asymmetric games. We also study the asymmetric version of a public goods game with a rewarding stage [57]. Finally, we compare introspection dynamics with logit dynamics, in the Appendix, where we show that the two processes have equivalent stationary distributions for some particular games (namely, 2-strategy, potential and additive games).

2 Model of Introspection Dynamics in Multiplayer Games

We consider a normal form game with \(N (\ge 2)\) players. In the game, a player, say player i, can play actions from their action set, \(\textbf{A}_i:= \{a_{i,1}, a_{i,2},..., a_{i,m_i} \}\). The action set of player i has \(m_i\) actions. In this model, players only use pure strategies. Therefore, there are finitely many states of the game. More precisely, there are exactly \(m_1 \times m_2 \times ... \times m_N\) states. We denote a state of the game by collecting the actions of all the players in a vector, \(\textbf{a}:= (a_1, a_2,..., a_N)\) where \(\textbf{a}\in \textbf{A}:= \textbf{A}_1 \times \textbf{A}_2 \times ... \times \textbf{A}_N\) and \(a_i \in \textbf{A}_i\). We also use the common notation, \(\textbf{a}:= (a_i, \textbf{a}_{-i})\) to denote the state from the perspective of player i. In the state \((a_i, \textbf{a}_{-i})\), player i plays the action \(a_i \in \textbf{A}_i\) and their co-players play the actions \(\textbf{a}_{-i} \in \textbf{A}_{-i}\) where \(\textbf{A}_{-i}\) is defined as \(\textbf{A}_{-i}:= \prod _{j \ne i} \textbf{A}_j\). The payoff of a player depends on the state of the game. We denote the payoff of player i in the state \(\textbf{a}\) with \(\pi _i(\textbf{a})\) or \(\pi _i(a_i, \textbf{a}_{-i})\). In this paper, we use bold font letters to denote vectors and matrices. We use the corresponding normal font letters with subscripts to denote elements of the vectors (or matrices). Since players only use pure strategies in this model, we use the terms strategies and actions interchangeably throughout the whole paper.

In this model, players update their strategies over time using introspection dynamics [18]. At every time step, one randomly chosen player can update their strategy. The randomly chosen player, say i, currently playing action \(a_{i,k}\), compares their current payoff to the payoff that they would obtain if they played a randomly selected action \(a_{i,l} \ne a_{i,k}\) from their action set \(\textbf{A}_i\). This comparison is done while assuming that the co-players do not change their respective actions. When the co-players of player i play \(\textbf{a}_{-i}\), player i changes from action \(a_{i,k}\) to the new action \(a_{i,l}\) in the next round with probability

$$\begin{aligned} p_{a_{i,k} \rightarrow a_{i,l}} (\textbf{a}_{-i})= \frac{1}{1 + e^{\displaystyle -\beta (\pi _i(a_{i,l}, \textbf{a}_{-i}) - \pi _i(a_{i,k}, \textbf{a}_{-i}))}} . \end{aligned}$$
(1)

Here \(\beta \in [0,\infty )\) is the selection strength parameter that represents the importance that players give to payoff differences while updating their actions. For \(\beta = 0\), players update to a randomly chosen strategy with probability 0.5. For \(\beta > 0\), players update to the alternative strategy under consideration with probability greater than 0.5 (or less than 0.5) if the switch gives them an increase (or decrease) in the payoff.
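As a concrete illustration, the switching probability of Eq. (1) can be sketched in a few lines of Python (the function name `switch_prob` is ours, not from the paper):

```python
import math

def switch_prob(payoff_new, payoff_current, beta):
    """Probability of adopting the alternative action, Eq. (1)."""
    return 1.0 / (1.0 + math.exp(-beta * (payoff_new - payoff_current)))

# beta = 0: payoffs are ignored, switching is a coin flip
assert switch_prob(5.0, 1.0, 0.0) == 0.5
# beta > 0: a payoff gain makes the switch more likely than not...
assert switch_prob(5.0, 1.0, 1.0) > 0.5
# ...and a payoff loss makes it less likely
assert switch_prob(1.0, 5.0, 1.0) < 0.5
```

For large \(\beta\), the function approaches a step function: any payoff improvement is adopted almost surely.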

Introspection dynamics defines a Markov chain and can be studied by analyzing the properties of the corresponding transition matrix \(\textbf{T}\). The transition matrix element \(\textrm{T}_{\textbf{a},\textbf{b}}\) denotes the conditional probability that the game moves to state \(\textbf{b}\) in the next round given that it is in state \(\textbf{a}\) in the current round. In order to formally define the transition matrix, we first need to introduce some notation and definitions. We start by defining the neighborhood set of state \(\textbf{a}\).

Definition 1

(Neighborhood set of a state) The neighborhood set of state \(\textbf{a}\), \(\textrm{Neb}(\textbf{a})\), is defined as:

$$\begin{aligned} \textrm{Neb}(\textbf{a}) := \{\textbf{b}\in \textbf{A}\big | \quad \exists j: b_{j} \ne a_{j} \wedge \textbf{b}_{-j} = \textbf{a}_{-j} \}. \end{aligned}$$
(2)

In other words, a state in \(\textrm{Neb}(\textbf{a})\) is a state in which exactly one player plays a different action than in state \(\textbf{a}\). For example, consider a game with three players, each having the identical action set \(\{\textrm{C}, \textrm{D}\}\). The state \((\textrm{C},\textrm{C},\textrm{D})\) is in the neighborhood set of \((\textrm{C},\textrm{C},\textrm{C})\), whereas the state \((\textrm{C},\textrm{D},\textrm{D})\) is not. Two states that belong to each other’s neighborhood set differ in exactly one player’s action (we call the index of this player the index of difference between the neighboring states).
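The example above can be captured in a short Python sketch (the function name `neighbors` is ours):

```python
def neighbors(state, action_sets):
    """Neighborhood set of Eq. (2): all states differing from `state`
    in exactly one player's action."""
    result = []
    for j, actions in enumerate(action_sets):
        for alt in actions:
            if alt != state[j]:
                result.append(state[:j] + (alt,) + state[j + 1:])
    return result

acts = [('C', 'D')] * 3
nbs = neighbors(('C', 'C', 'C'), acts)
assert ('C', 'C', 'D') in nbs      # differs in one action: a neighbor
assert ('C', 'D', 'D') not in nbs  # differs in two actions: not a neighbor
```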

Definition 2

(Index of difference between neighboring states) If two states, \(\textbf{a}\) and \(\textbf{b}\), satisfy \(\textbf{a}\in \textrm{Neb}(\textbf{b})\), the index of difference between them, \(\textrm{I}(\textbf{a}, \textbf{b})\), is the unique integer that satisfies

$$\begin{aligned} a_{\textrm{I}(\textbf{a}, \textbf{b})} \ne b_{\textrm{I}(\textbf{a}, \textbf{b})}. \end{aligned}$$
(3)

In the previous example, the index of difference between the neighboring states \((\textrm{C},\textrm{C},\textrm{C})\) and \((\textrm{C},\textrm{C},\textrm{D})\) is 3. Using the above definitions, one can formally define the transition matrix of introspection dynamics by

$$\begin{aligned} \textrm{T}_{\textbf{a}, \textbf{b}} = {\left\{ \begin{array}{ll} \frac{1}{N(m_j-1)} \cdot p_{a_{j} \rightarrow b_{j}} (\textbf{a}_{-j}) \quad \quad &{}\text { if }\textbf{b}\in \textrm{Neb}(\textbf{a}) \quad \text {and,} \quad j = \textrm{I}(\textbf{a},\textbf{b})\\ \\ 0 \quad &{}\text { if } \textbf{b}\notin \textrm{Neb}(\textbf{a}) \\ \\ 1 - \sum _{\textbf{c}\ne \textbf{b}} \textrm{T}_{\textbf{a},\textbf{c}} \quad &{}\text { if } \textbf{a}= \textbf{b}\end{array}\right. }. \end{aligned}$$
(4)

The transition matrix is a row stochastic matrix (the sums of the rows are 1). This implies that the stationary distribution of \(\textbf{T}\), a left eigenvector of \(\textbf{T}\) corresponding to eigenvalue 1, always exists. We introduce a sufficient condition for the stationary distribution of \(\textbf{T}\) to be unique.

When the selection strength, \(\beta \), is finite, the transition matrix of introspection dynamics has a unique stationary distribution. A finite value of \(\beta \) results in non-zero transition probabilities between neighboring states. Since no state is isolated and there are only finitely many states of the game, every state is reachable from any other in a finite number of steps, so the chain is irreducible. Moreover, the diagonal entries of the transition matrix are strictly positive, which makes the chain aperiodic. The transition matrix, \(\textbf{T}\), is therefore primitive for a finite \(\beta \). By the Perron–Frobenius theorem, a primitive matrix, \(\textbf{T}\), will have a unique and strictly positive stationary distribution \(\textbf{u}:= (\textrm{u}_\textbf{a})_{\textbf{a}\in \textbf{A}}\) which satisfies the conditions:

$$\begin{aligned} \textbf{u}\textbf{T}&= \textbf{u}\end{aligned}$$
(5)
$$\begin{aligned} \textbf{u}\textbf{1}&= 1 \end{aligned}$$
(6)

where \(\textbf{1}\) is the column vector with the same size as \(\textbf{u}\) and has all elements equal to 1. For all the analytical results in this paper, we consider \(\beta \) to be finite so that stationary distributions of the processes are unique.

The above equations only present an implicit representation of the stationary distribution \(\textbf{u}\). The stationary distribution can be explicitly calculated by the following expression (which is derived using Eqs. 5 and 6),

$$\begin{aligned} \textbf{u}= {\textbf{1}}^\intercal ({\mathbbm {1}} + \textbf{U} - \textbf{T})^{-1} \end{aligned}$$
(7)

where \(\textbf{U}\) is a square matrix of the same size as \(\textbf{T}\) with all elements equal to 1 and \({\mathbbm {1}}\) is the identity matrix. The matrix \({\mathbbm {1}} + \textbf{U} - \textbf{T}\) is invertible when \(\textbf{T}\) is a primitive matrix [18]. Using Eq. (7), one can compute the unique stationary distribution of introspection dynamics (with a finite \(\beta \)) for any normal form game, with an arbitrary number of possibly asymmetric players and strategies.

The stationary distribution element \(\textrm{u}_\textbf{a}\) is the probability that the state \(\textbf{a}\) will be played by the players in the long-run. Using the stationary distribution, one can calculate the marginal probabilities corresponding to each player’s actions. That is, the probability that player i plays action \(a \in \textbf{A}_i\) in the long-run, \(\xi _{i,a}\), can be computed as

$$\begin{aligned} \mathbf {\xi }_{i,a} := \sum _{\textbf{q}\in \textbf{A}_{-i}} \textrm{u}_{(a, \textbf{q})}. \end{aligned}$$
(8)
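For small state spaces, Eqs. (4), (7), and (8) can be implemented directly by brute-force enumeration of \(\textbf{A}\). The following Python sketch (function and variable names are ours) builds the transition matrix, computes the stationary distribution via Eq. (7), and extracts the marginals of Eq. (8):

```python
import itertools
import numpy as np

def transition_matrix(action_sets, payoff, beta):
    """Build the transition matrix of Eq. (4).

    action_sets: one tuple of actions per player.
    payoff(i, state): payoff of player i when the game is in `state`.
    """
    states = list(itertools.product(*action_sets))
    index = {s: k for k, s in enumerate(states)}
    n = len(action_sets)
    T = np.zeros((len(states), len(states)))
    for a in states:
        for j in range(n):
            m_j = len(action_sets[j])
            for new in action_sets[j]:
                if new == a[j]:
                    continue
                b = a[:j] + (new,) + a[j + 1:]        # neighboring state
                gain = payoff(j, b) - payoff(j, a)
                p = 1.0 / (1.0 + np.exp(-beta * gain))  # Eq. (1)
                T[index[a], index[b]] = p / (n * (m_j - 1))
        T[index[a], index[a]] = 1.0 - T[index[a]].sum()  # staying put
    return states, T

def stationary(T):
    """Stationary distribution via Eq. (7): u = 1^T (I + U - T)^(-1)."""
    k = T.shape[0]
    return np.ones(k) @ np.linalg.inv(np.eye(k) + np.ones((k, k)) - T)

def marginal(states, u, i, action):
    """Eq. (8): long-run probability that player i plays `action`."""
    return sum(u_a for u_a, s in zip(u, states) if s[i] == action)
```

Any normal-form payoff function works here; for instance, `payoff` could implement the public goods payoff of Eq. (13) below. The \(m_1 \times \cdots \times m_N\) enumeration limits this to modest games, as noted in the text.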

3 Additive Games and Their Properties Under Introspection Dynamics

In this section, we discuss the stationary properties of introspection dynamics when players learn to play strategies in a special class of games: additive games. In an additive game, the payoff difference that a player earns by making a unilateral switch in their actions is independent of what their co-players play. In other words, if none of the co-players change their current actions, the payoff difference earned by switching actions is determined only by the switch itself, not by the co-players’ actions. Formally, in additive games, for any player i, any pair of actions \(x,y \in \textbf{A}_i\), and any \(\textbf{q}\in \textbf{A}_{-i}\),

$$\begin{aligned} \pi _i(x, \textbf{q}) - \pi _i(y, \textbf{q}) =: f_i(x,y) \end{aligned}$$
(9)

is independent of \(\textbf{q}\) and only dependent on x and y. In the literature, this property is sometimes called equal gains from switching [53, 83]. For games with this property, the stationary distribution of introspection dynamics takes a simple form.

Proposition 1

When \(\beta \) is finite, the unique stationary distribution, \(\textbf{u}= (\textrm{u}_\textbf{a})_{\textbf{a}\in \textbf{A}}\), of introspection dynamics for an \(N-\)player additive game is given by

$$\begin{aligned} \textrm{u}_\textbf{a}= \prod _{j=1}^N \frac{1}{\displaystyle \sum _{a' \in \textbf{A}_j} e^{\beta f_j(a', a_j)}} \end{aligned}$$
(10)

where \(f_j(a', a_j)\) is the co-player independent payoff difference given by Eq. (9).

For all proofs of Propositions and Corollaries, please see Appendix 2. Using the stationary distribution and Eq. (8), one can also exactly compute the probabilities with which players play their actions in the long-run (i.e., the marginal distributions). In this regard, introspection learning in additive games is particularly interesting: the stationary distribution and the marginal distributions are related in a special way.

Proposition 2

Let \(\textbf{u}= (\textrm{u}_\textbf{a})_{\textbf{a}\in \textbf{A}}\) be the unique stationary distribution of introspection dynamics with finite \(\beta \) for an \(N-\)player additive game. Then, \(\textrm{u}_\textbf{a}\) is the product of the marginal probabilities with which each player plays their respective actions in \(\textbf{a}\). That is,

$$\begin{aligned} \textrm{u}_\textbf{a}= \prod _{j = 1}^N \xi _{j,a_j}. \end{aligned}$$
(11)

For an N-player additive game, \(\xi _{j,a_j}\) is given by

$$\begin{aligned} \xi _{j,a_j} = \frac{1}{\displaystyle \sum _{a' \in \textbf{A}_j} e^{\beta f_j(a',a_j)}} \end{aligned}$$
(12)

where \(f_j(a', a_j)\) is the co-player independent payoff difference given by Eq. (9).

The above proposition states that, for additive games, the stationary distribution of introspection dynamics can be factorized into its corresponding marginals. In the long-run, the probability that players play the state \(\textbf{a}= (a_1, a_2,...,a_N)\) is the product of the marginal probabilities that player 1 plays \(a_1\), player 2 plays \(a_2\), and so on. This property was already shown for the simple 2-player, 2-action donation game in Couto et al. [18]. Here, we extend that result to any additive game with an arbitrary number of players, each having an arbitrary number of strategies. In the next section, we use the well-studied example of the linear public goods game (an additive game) to illustrate these results.

3.1 Example of an Additive Game: Linear Public Goods Game with 2 Actions

In the simplest version of the linear public goods game (LPGG) with N players, each player has two possible actions, to contribute (action \(\textrm{C}\), to cooperate), or to not contribute (action \(\textrm{D}\), to defect) to the public good. The players may differ in their cost of cooperation and the benefit they provide by contributing to the public good. We denote the cost of cooperation for player i and the benefit that they provide by \(c_i\) and \(b_i\), respectively. We define an indicator function \(\alpha (.)\) to map the action of cooperation to 1 and the action of defection to 0. That is, \(\alpha (\textrm{C}) = 1\) and \(\alpha (\textrm{D}) = 0\). The payoff of player i when the state of the game is \(\textbf{a}\) is given by

$$\begin{aligned} \pi _i(\textbf{a}) = \frac{1}{N}\sum _{j=1}^N \displaystyle \alpha (a_j) b_j - \alpha (a_i) c_i . \end{aligned}$$
(13)

The payoff difference that a player earns by unilaterally switching from \(\textrm{C}\) to \(\textrm{D}\) (or vice-versa) in the linear public goods game is independent of what the other players play in the game. That is, for every player i,

$$\begin{aligned} \pi _i(\textrm{D}, \textbf{q}) - \pi _i(\textrm{C}, \textbf{q}) = c_i - \frac{b_i}{N} =: f_i(\textrm{D}, \textrm{C}) \end{aligned}$$
(14)

is independent of co-players’ actions \(\textbf{q}\). The linear public goods game is therefore an example of an additive game. This property of the game results in easily identifiable dominated strategies. For player i, defection dominates cooperation when \(c_i > b_i/N\) while cooperation dominates defection when \(c_i < b_i/N\). Using Proposition 1, one can derive the closed-form expression for the stationary distribution of an \(N-\)player linear public goods game with two strategies.

Proposition 3

When \(\beta \) is finite, the unique stationary distribution of introspection dynamics for an \(N-\)player linear public goods game is given by

$$\begin{aligned} \textrm{u}_\textbf{a}= \prod _{j = 1}^{N} \frac{1}{1 + \displaystyle e^{\textrm{sign}(a_j)\beta f_j(\textrm{D}, \textrm{C})}} \end{aligned}$$
(15)

where

$$\begin{aligned} \textrm{sign}(a) = {\left\{ \begin{array}{ll} &{}1 \quad \text {if} \quad a = \textrm{C}\\ -&{}1 \quad \text {if} \quad a = \textrm{D}\end{array}\right. }. \end{aligned}$$
(16)

We use a simple example to illustrate the above result. Consider a 3-player linear public goods game. All players provide a benefit of 2 units when they contribute to the public good (\(b_1 = b_2 = b_3 = 2\)). They differ, however, in their cost of cooperation. For players 1 and 2, the cost of cooperation is 1 unit (\(c_1 = c_2 = 1\)), while for the third player, the cost is 1.5 units (\(c_3 = 1.5\)). In the stationary distribution of the process with selection strength \(\beta = 1\), the marginal probabilities that player 1 (or 2) cooperates and that player 3 defects are \(\xi _{1,\textrm{C}} = \xi _{2,\textrm{C}}= 0.417\) and \(\xi _{3,\textrm{D}} = 0.697\), respectively. With these exact values, one can confirm the factorizing property of the stationary distribution for additive games in this example (i.e., Proposition 2). That is, \(\textrm{u}_\textrm{CCD} = 0.121 = \xi _{1,\textrm{C}} \cdot \xi _{2,\textrm{C}} \cdot \xi _{3,\textrm{D}}\).
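The numbers in this example can be checked directly from the closed-form expressions, without building the full transition matrix. A minimal Python sketch (variable names are ours):

```python
import math

N, beta = 3, 1.0
b = [2.0, 2.0, 2.0]
c = [1.0, 1.0, 1.5]

# Eq. (14): co-player-independent gain from switching C to D
f_DC = [c[i] - b[i] / N for i in range(N)]

# Eq. (12): long-run marginal probabilities of cooperating and defecting
xi_C = [1.0 / (1.0 + math.exp(beta * f)) for f in f_DC]
xi_D = [1.0 - x for x in xi_C]

# Proposition 2: the stationary probability of state (C, C, D) factorizes
u_CCD = xi_C[0] * xi_C[1] * xi_D[2]
```

Rounding to three decimals reproduces \(\xi_{1,\textrm{C}} = 0.417\), \(\xi_{3,\textrm{D}} = 0.697\), and \(\textrm{u}_\textrm{CCD} = 0.121\).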

We now use Eq. (15) to systematically analyze the LPGG under introspection dynamics. First, we study the simple case of a 4-player symmetric LPGG (the cost and benefit are c and b for all four players). Since all players are identical, the states of the game can be enumerated by counting the number of cooperators. There are only 5 distinct states of the game (from 0 to 4 cooperators). When the parameters of the game are such that defection dominates cooperation (\(b = 2, c = 1\), Fig. 1a), the stationary distribution of the process at high \(\beta \) indicates that, in the long-run, states with a higher number of cooperators are less likely than states with a lower number of cooperators. However, for intermediate and low \(\beta \), the stationary results are qualitatively different. Here, the state with 1 cooperator (or even 2 cooperators, depending on how small \(\beta \) is) is the most probable state in the long-run (Fig. 1b). Since every possible state is equiprobable in the limit \(\beta \rightarrow 0\), the outcome with 2 cooperators is most likely only because there are more states with 2 cooperators than with any other number of cooperators.

Naturally, \(\beta \) plays an important role in determining the overall cooperation in the long-run. When \(\beta \) is low, average cooperation varies weakly with the strength of the dilemma, \(b/N - c\) (Fig. 1c). Even when the temptation to defect is high (\(b/N - c = -2\)), players cooperate with a non-zero probability. Similarly, when cooperation is highly beneficial and strictly dominates defection (\(b/N - c = 2\)), players sometimes defect. At higher values of \(\beta \), the stationary behavior of players is more responsive to the payoffs and thus changes abruptly near the parameter values where the game transitions from defection-dominant to cooperation-dominant (\(b/N - c = 0\)).

Fig. 1

Introspection dynamics in a symmetric linear public goods game. Stationary distribution of introspection dynamics for a linear public goods game with four identical players. For all panels in this figure, the following parameters are used: \(N = 4\) (group size), \(b = 2\) (benefit provided to the public good upon cooperation), \(c = 1\) (cost of cooperation). a Frequency of each state in the stationary distribution of introspection dynamics. As players are identical, each state can be defined by the number of cooperators. For a selection strength of \(\beta = 5\), states with more cooperators are less likely than states with fewer cooperators. b Frequency of each state for varying selection strength, \(\beta \). The color code is the same as in panel (a). Comparing neutrality (\(\beta = 0\)) with low to intermediate \(\beta \) values, selection favors states other than 0 cooperators. Indeed, up to \(\beta \approx 3\), state 0 is not the most frequent state in the long-run. c Average cooperation frequency for varying dilemma strength depends on the selection strength, \(\beta \). We use the marginal gain of choosing cooperation over defection, \(b/N - c\), as a measure of the dilemma strength. When this quantity is negative and low, we say that the dilemma is strong; in this case, choosing cooperation is strictly disadvantageous. When this quantity is positive and high, we say that the dilemma is weak; in this case, cooperation dominates defection. Typically, a linear public goods dilemma is defined to have a negative marginal gain. Here, the dilemma strength varies from \(-2\) to 2. The results are shown for different values of selection strength, \(\beta = 1, 5\) and 100. For high \(\beta \), the stationary distribution of introspection dynamics reflects rational play: in the long-run, players play the Nash equilibrium. When the marginal gain is negative, defection is played almost with certainty (and vice-versa). For low \(\beta \), however, some cooperation is possible even when the dilemma is strong.

Fig. 2

Introspection dynamics in an asymmetric linear public goods game. Cooperation probabilities of introspection dynamics for a linear public goods game with three asymmetric players. For each of the upper panels (a and b), we show the cost of cooperation and the benefit provided upon cooperation for the players on the left, and the average cooperation frequency in the long-run on the right. In c, the asymmetry strengths between the players, \(\delta _c \) and \(\delta _b\), vary simultaneously. Both the average individual cooperation frequencies and the overall average cooperation frequency in the long-run are shown. The reference player’s cost and benefit are again 1 and 2 units, respectively. The area within the white dashed lines represents the parameter values for which the marginal gain of choosing cooperation over defection is negative, for each single player and, in the right-most panel, for all players simultaneously. In this example, cooperation is only feasible in the long-run if the asymmetries of players are aligned. That is, overall cooperation is high only when the individual with a low cost of cooperation has a high benefit value. For panels (a) and (b), the selection strength is \(\beta = 2\), while for panel (c), \(\beta = 5\).

To study what effects might appear due to asymmetry in the LPGG, we consider the game with 3 asymmetric players. All players can differ in their cost of cooperation and the benefit they provide to the public good. In this setup, the cost and benefit values of the reference player (player 2) are 1 and 2 units, respectively. Player 1 and player 3 differ from the reference player in opposite directions. For player 1, the cost and benefit are \(1 + \delta _c\) and \(2 + \delta _b\), respectively, while for player 3, they are \(1 - \delta _c\) and \(2 - \delta _b\). The terms \(\delta _b\) and \(\delta _c\) represent the strength of asymmetry between the three players (a higher absolute value of \(\delta \) indicates a bigger asymmetry). When the players only differ in their cost of cooperation (\(\delta _b = 0\) and \(\delta _c = 0.5\), Fig. 2a, left), their relative cooperation in the long-run reflects their relative ability to cooperate. The player with the lowest cooperation cost (player 3) cooperates with the highest probability (and vice-versa, Fig. 2a, right). Similarly, when players only differ in their ability to produce the public good (\(\delta _b = 1\) and \(\delta _c = 0\), Fig. 2b, left), their relative cooperation in the long-run reflects the relative benefits they provide with their cooperation (Fig. 2b, right). In this example, since the reference player provides a benefit of 2 units at a cost of 1 unit (so that defection always dominates cooperation for them), defection dominates cooperation for player 1 if and only if \(\delta _b < 1 + 3\delta _c\) and, for player 3, if and only if \(\delta _b > 3\delta _c - 1\). The regions of the \(\delta _b\)–\(\delta _c\) parameter plane where defection dominates cooperation are circumscribed by white dashed lines in Fig. 2c.
When players learn to play at high selection strength, \(\beta \), their cooperation frequency in the long-run reflects rational play (Fig. 2c). In the long-run, the average cooperation frequency of the group is low if the asymmetry in the benefit value is bounded as \(3\delta _c - 1< \delta _b < 3\delta _c + 1\). This includes the case where players are symmetric (\(\delta _b = \delta _c = 0\)). Relatively high cooperation is only assured if players are aligned in their asymmetries (i.e., either \(\delta _b > 3\delta _c + 1\) or \(\delta _b < 3\delta _c - 1\)). In other words, if the player that has a low cost of cooperation also provides a high benefit upon contribution, then cooperation is high in the long-run.
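The ordering of individual cooperation frequencies in the cost-asymmetry-only setting of Fig. 2a follows directly from Eq. (12). A minimal Python sketch, assuming the parameters stated above (\(\delta_b = 0\), \(\delta_c = 0.5\), \(\beta = 2\); variable names are ours):

```python
import math

N, beta = 3, 2.0
delta_c, delta_b = 0.5, 0.0  # cost asymmetry only, as in Fig. 2a
c = [1.0 + delta_c, 1.0, 1.0 - delta_c]
b = [2.0 + delta_b, 2.0, 2.0 - delta_b]

# Eq. (12) for the two-action LPGG: long-run probability of cooperating
xi_C = [1.0 / (1.0 + math.exp(beta * (c[i] - b[i] / N))) for i in range(N)]

# the cheapest cooperator (player 3) cooperates most, the costliest least
assert xi_C[2] > xi_C[1] > xi_C[0]
```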

4 Games with Two Actions and Their Properties Under Introspection Dynamics

In the previous section, we studied the properties of additive games under introspection dynamics. In this section, we study games that (a) are not necessarily additive and (b) have only two actions per player. First, we study the symmetric version of such a game. An N-player symmetric normal form game with two actions has the following properties:

  1. 1.

    All players have the same action set \(\mathcal {A}:= \{\textrm{C},\textrm{D}\}\). That is, \(\textbf{A}_1 = \textbf{A}_2 =... = \textbf{A}_N:= \mathcal {A}\).

  2. 2.

    Players have the same payoff when they play the same action against the same composition of co-players. That is, for any \(i,j \in \{1,2,...,N\}\), \(a \in \mathcal {A}\) and \(\textbf{b}\in \mathcal {A}^{N-1}\),

    $$\begin{aligned} \pi _i(a,\textbf{b}) = \pi _j(a,\textbf{b}). \end{aligned}$$
    (17)

Since players are symmetric, states can again be enumerated by counting the number of \(\textrm{C}\) players in the state. We denote the payoffs of a \(\textrm{C}\) and a \(\textrm{D}\) player in a state where there are j co-players playing \(\textrm{C}\) by \(\pi ^\textrm{C}(j)\) and \(\pi ^\textrm{D}(j)\), respectively. We denote with f(j) the payoff difference earned by switching from \(\textrm{C}\) to \(\textrm{D}\) when there are j co-players playing \(\textrm{C}\),

$$\begin{aligned} f(j) := \pi ^\textrm{D}(j) - \pi ^\textrm{C}(j). \end{aligned}$$
(18)

The stationary distribution of a \(2-\)action symmetric game under introspection dynamics can be explicitly computed using the following proposition.

Proposition 4

When \(\beta \) is finite, the unique stationary distribution of introspection dynamics for an \(N-\)player symmetric normal form game with two actions, \(\mathcal {A} = \{\textrm{C}, \textrm{D}\}\), \((\textrm{u}_{\textbf{a}})_{\textbf{a}\in \mathcal {A}^N}\), is given by

$$\begin{aligned} \textrm{u}_\textbf{a}= \frac{1}{\Gamma } \displaystyle \prod _{j=1}^{\mathcal {C}(\textbf{a})} \displaystyle e^{-\beta f(j-1)} \end{aligned}$$
(19)

where f(j) is defined as in Eq. (18) and \(\mathcal {C}(\textbf{a})\) is the number of cooperators in state \(\textbf{a}\). The term \(\Gamma \) is the normalization factor given by

$$\begin{aligned} \Gamma = \displaystyle \sum _{\textbf{a}' \in \mathcal {A}^N} \prod _{j = 1}^{\mathcal {C}(\textbf{a}')} \displaystyle e^{-\beta f(j-1)}. \end{aligned}$$
(20)

The number of unique states of the game can be reduced from \(2^N\) to \(N+1\) due to symmetry. In the reduced state space, the state k corresponds to k players playing \(\textrm{C}\) and \(N-k\) players playing \(\textrm{D}\). Then, Proposition 4 can be simply reformulated by relabelling the states as follows,

Corollary 1

When \(\beta \) is finite, the unique stationary distribution, \((\textrm{u}_k)_{k \in \{0,1,...,N\}}\), of introspection dynamics for an \(N-\)player symmetric normal form game with two actions, \(\mathcal {A} = \{\textrm{C}, \textrm{D}\}\), is given by

$$\begin{aligned} \textrm{u}_k = \frac{1}{\Gamma } \cdot {N \atopwithdelims ()k} \cdot \displaystyle \prod _{j=1}^{k} \displaystyle e^{-\beta f(j-1)} \end{aligned}$$
(21)

where k represents the number of \(\textrm{C}\) players in the state and f(j) is defined as in Eq. (18). The term \(\Gamma \) is the normalization factor, given by

$$\begin{aligned} \Gamma = \displaystyle \sum _{k=0}^N {N \atopwithdelims ()k} \cdot \displaystyle \prod _{j=1}^{k} \displaystyle e^{-\beta f(j-1)} . \end{aligned}$$
(22)

The above corollary follows directly from Proposition 4. The key step is to count the number of states in the state space \(\mathcal {A}^N\) that correspond to exactly k \(\textrm{C}\) players (and therefore \(N-k\) \(\textrm{D}\) players). This count is simply the binomial coefficient \(N \atopwithdelims ()k\). In the next section, we use the example of a nonlinear public goods game to illustrate these results.
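The two formulas can be implemented and cross-checked directly. The sketch below is ours (function names are hypothetical, not from the paper): it enumerates the full state space \(\{0,1\}^N\) following Proposition 4, then computes the reduced distribution of Corollary 1, for an arbitrary gain function f.

```python
import itertools
import math

def stationary_full(f, N, beta):
    # Proposition 4: the weight of a state is the product of e^{-beta f(j-1)}
    # over j = 1, ..., C(a), where C(a) counts the cooperators in the state.
    weights = {}
    for state in itertools.product((0, 1), repeat=N):  # 1 encodes action C
        k = sum(state)
        weights[state] = math.prod(
            math.exp(-beta * f(j - 1)) for j in range(1, k + 1))
    gamma = sum(weights.values())  # normalization factor, Eq. (20)
    return {s: w / gamma for s, w in weights.items()}

def stationary_reduced(f, N, beta):
    # Corollary 1: group the 2^N states into N+1 symmetric classes,
    # weighting each class by the binomial coefficient, Eq. (21).
    weights = [
        math.comb(N, k) * math.prod(
            math.exp(-beta * f(j - 1)) for j in range(1, k + 1))
        for k in range(N + 1)]
    gamma = sum(weights)  # normalization factor, Eq. (22)
    return [w / gamma for w in weights]
```

Summing the full distribution over all states with exactly k cooperators should reproduce \(\textrm{u}_k\) of the reduced form, which makes the binomial counting argument explicit.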

4.1 An Example of a Game with Two Actions: The General Public Goods Game

To study general public goods games, we adopt the framework of general social dilemmas from Hauert et al. [31]. In the original paper, the authors propose a normal form game with symmetric players. The game’s properties depend on a parameter w that determines the nature of the public good. The players have two actions: cooperation, \(\textrm{C}\) and defection, \(\textrm{D}\). Here, we extend their framework to account for players with asymmetric payoffs. Before we explain the asymmetric setup, we describe the original model briefly. In the symmetric case, all N players have the same cost of cooperation c and they all generate the same benefit b for the public good. Unlike the linear public goods game, contributions to the public good are scaled by a factor that is determined by w and the number of cooperators in the group. The payoff of a defector and a cooperator in a group with k cooperators and \(N-k\) defectors is given by,

$$\begin{aligned} \pi ^{\textrm{D}}(k)&= \frac{b}{N}\left( 1 + w + w^2 + \cdots + w^{k-1}\right) , \end{aligned}$$
(23)
$$\begin{aligned} \pi ^{\textrm{C}}(k)&= \pi ^{\textrm{D}}(k) - c . \end{aligned}$$
(24)

The parameter w represents the nonlinearity of the public good. The game is linear when \(w = 1\): every cooperator’s contribution is then equally valuable, regardless of how many others contribute. When \(w < 1\), the effective contribution of every additional cooperator goes down by a factor w (compared to the last cooperator). The public good is said to be discounting in this case. On the other hand, when \(w > 1\), every new contribution is more valuable than the previous one. The public good is said to be synergistic in this case. For the symmetric case, the relationship between the cost-to-benefit ratio, cN/b, and the discount/synergy factor, w, determines the type of social dilemma arising from the game. In principle, this framework can produce generalizations of the prisoner’s dilemma (\(\textrm{D}\) dominating \(\textrm{C}\)), the snowdrift game (coexistence between \(\textrm{C}\) and \(\textrm{D}\)), the stag-hunt game (no dominance but existence of an internal unstable equilibrium), and the harmony game (\(\textrm{C}\) dominating \(\textrm{D}\)) with respect to its evolutionary trajectories under the replicator dynamics. For more details, see Hauert et al. [31].
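The payoffs of Eqs. (23)–(24) determine the gain function f(j) of Eq. (18). A minimal sketch, under the reading that k in Eqs. (23)–(24) counts all cooperators in the group (so a focal player with j cooperating co-players is in a group of j cooperators if they defect and \(j+1\) if they cooperate); the function names are ours:

```python
def pi_D(k, b, c, w, N):
    # Eq. (23): payoff of a defector in a group with k cooperators in total
    return (b / N) * sum(w ** i for i in range(k))

def pi_C(k, b, c, w, N):
    # Eq. (24): a cooperator in the same group additionally pays the cost c
    return pi_D(k, b, c, w, N) - c

def f_gain(j, b, c, w, N):
    # Gain from switching, Eq. (18): with j cooperating co-players,
    # f(j) = pi^D(j) - pi^C(j+1), which simplifies to c - (b/N) * w**j.
    return pi_D(j, b, c, w, N) - pi_C(j + 1, b, c, w, N)
```

Under this reading, f(j) is constant in j only when \(w = 1\), recovering the linear (additive) public goods game as a special case.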

Now, we describe our extension of the original model to account for asymmetric players. Here, for player i, the cost of cooperation is \(c_i\). The benefit that they can generate for the public good is \(b_i\). The benefit of cooperation generated by a player is either synergized (or discounted) by a factor depending on the number of cooperators already in the group and the synergy/discount factor, w (just like the original model). However, since players are now asymmetric, it is not entirely clear in which order the contributions of cooperators should be discounted (or synergized). For example, consider that there are 3 cooperators in the group: players pq, and r. The total benefit that they provide to the public good can be any of the six possibilities of the form \(x + y w + z w^2\), where xy, and z are permutations of \(b_p, b_q\) and \(b_r\). In this model, we assume that all such permutations are equally likely, and therefore, the expected benefit provided by all three of them is given by \(\bar{b}(1 + w + w^2)\) where \(\bar{b} = (b_p + b_q + b_r)/3\).
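The permutation-averaging argument is easy to verify numerically. A small sketch of ours (the helper name is hypothetical) that averages \(x + y w + z w^2 + \cdots \) over all orderings of the cooperators' individual benefits:

```python
import itertools

def expected_group_benefit(benefits, w):
    # Average of x_0 + x_1*w + x_2*w**2 + ... over all orderings of the
    # cooperators' individual benefits; by symmetry each position's
    # expectation equals the mean benefit, giving mean(b) * (1 + w + ...).
    perms = list(itertools.permutations(benefits))
    total = sum(sum(x * w ** i for i, x in enumerate(p)) for p in perms)
    return total / len(perms)
```

With three cooperators, this reproduces \(\bar{b}(1 + w + w^2)\) exactly, since each of the three benefits appears in each discount position in exactly one third of the orderings.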

The complete state space of the game with asymmetric players is \(\textbf{A}= \{\textrm{C},\textrm{D}\}^N\). The payoff of a defector in a state \((\textrm{D}, \textbf{a}_{-i})\) and that of a cooperator in state \((\textrm{C},\textbf{a}_{-i})\) where \(\textbf{a}_{-i} \in \{\textrm{C},\textrm{D}\}^{N-1}\) are, respectively, given by

$$\begin{aligned} \pi _i(\textrm{D}, \textbf{a}_{-i})&= {\left\{ \begin{array}{ll} \displaystyle \sum _{j=1}^N b_j \alpha (a_j) \cdot \frac{1}{N \cdot \mathcal {C}(\textrm{D},\textbf{a}_{-i})} \cdot \left( 1 + w + w^2 + \cdots + w^{\mathcal {C}(\textrm{D},\textbf{a}_{-i}) - 1} \right) &{}\quad \text {if } \mathcal {C}(\textrm{D},\textbf{a}_{-i}) \ne 0 \\[15pt] 0 &{}\quad \text {if } \mathcal {C}(\textrm{D},\textbf{a}_{-i}) = 0 \end{array}\right. } \end{aligned}$$
(25)
$$\begin{aligned} \pi _i(\textrm{C}, \textbf{a}_{-i})&= \displaystyle \sum _{j=1}^N b_j \alpha (a_j) \cdot \frac{1}{N \cdot \mathcal {C}(\textrm{C},\textbf{a}_{-i})} \cdot \left( 1 + w + w^2 + \cdots + w^{\mathcal {C}(\textrm{C},\textbf{a}_{-i}) - 1} \right) - c_i \end{aligned}$$
(26)

where \(\mathcal {C}(a,\textbf{a}_{-i})\) counts the number of cooperators in state \((a,\textbf{a}_{-i})\) and \(\alpha (.)\), as before, maps the actions \(\textrm{C}\) and \(\textrm{D}\) to 1 and 0, respectively. Note that the number of cooperators in the two states is related as: \(\mathcal {C}(\textrm{D},\textbf{a}_{-i}) = \mathcal {C}(\textrm{C},\textbf{a}_{-i}) - 1\). We are interested in studying the long-term stationary behavior of players in this game when they learn through introspection. We first discuss results from the symmetric public goods game and then discuss results for the game with asymmetric players.
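Eqs. (25)–(26) translate into a compact payoff routine. A sketch of ours (the function name is hypothetical), which in the symmetric special case should recover Eqs. (23)–(24):

```python
def payoff(i, actions, b, c, w):
    # Eqs. (25)-(26): payoff of player i in the asymmetric general public
    # goods game; actions is a list of 'C'/'D', b and c are per-player lists.
    N = len(actions)
    k = actions.count('C')  # number of cooperators, C(a)
    if k == 0:
        return 0.0  # Eq. (25), second case: nobody contributes
    discounting = sum(w ** m for m in range(k)) / (N * k)
    benefit = discounting * sum(b[j] for j in range(N) if actions[j] == 'C')
    return benefit - c[i] if actions[i] == 'C' else benefit
```

For example, with \(b_j = 2\), \(c_j = 1\), \(w = 0.5\), and two cooperators among four players, a defector earns 0.75 and a cooperator \(-0.25\), matching the symmetric payoffs of Eqs. (23)–(24).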

To compute the stationary distribution of introspection dynamics in this game, we use Eq. (21). In our symmetric example, we consider that every player in an \(N-\)player game generates a benefit b of value 2. Before exploring the \(c-w-N\) parameter space, we study four specific cases (with a \(4-\)player game). In two of these cases, the public good is discounted (\(w = 0.5\), Fig. 3a left panels) and in the other two, the public good is synergistic (\(w = 1.5\), Fig. 3a right panels). For each case, we consider two sub-cases: first, in which the cost is high (\(c = 1\), Fig. 3a top panels) and second, in which the cost is low (\(c = 0.2\), Fig. 3a bottom panels). The four parameter combinations are chosen such that each of them corresponds to a unique social dilemma under the replicator dynamics. When the selection strength is intermediate (\(\beta = 5\)), players sometimes play actions that are not optimal for the dilemma. For example, even when the parameters of the game make cooperation the dominated strategy (\(w = 0.5, c = 1\)), there is a single cooperator in the group in around 20\(\%\) of the cases. When the parameters of the game reflect the stag-hunt dilemma (\(c = 1, w = 1.5\)), players are more likely to coordinate their actions in the long-run. The probabilities that the whole group plays \(\textrm{C}\) or \(\textrm{D}\) are higher than the probabilities of a group with a mixture of \(\textrm{C}\) and \(\textrm{D}\) players. In contrast, when the parameters reflect the snowdrift game (\(w = 0.5, c = 0.2\)), we get the opposite effect. In the long-run, mixed groups are more likely than homogeneous groups. Finally, when the parameters of the game make defection the dominated action (\(w = 1.5, c = 0.2\)), all players learn to cooperate in the long-run.
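The long-run average cooperation frequency for such parameter combinations follows directly from Corollary 1. A self-contained sketch of ours, assuming the gain function \(f(j) = c - (b/N)\,w^j\) implied by Eqs. (23)–(24); the function name is hypothetical:

```python
import math

def avg_cooperation(N, b, c, w, beta):
    # Corollary 1 applied to the general public goods game, using the gain
    # function f(j) = c - (b/N) * w**j from Eqs. (23)-(24). The weight of
    # the class with k cooperators is C(N,k) * exp(-beta * sum_j f(j-1)).
    f = lambda j: c - (b / N) * w ** j
    weights = [
        math.comb(N, k) * math.exp(-beta * sum(f(j - 1)
                                               for j in range(1, k + 1)))
        for k in range(N + 1)]
    # Average fraction of cooperators under the stationary distribution
    return sum(k * wt for k, wt in enumerate(weights)) / (N * sum(weights))
```

With \(N = 4\), \(b = 2\), and \(\beta = 5\), the harmony-game parameters (\(w = 1.5, c = 0.2\)) yield near-full cooperation, while the prisoner's-dilemma parameters (\(w = 0.5, c = 1\)) yield a low cooperation frequency, consistent with the panels described above.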

Fig. 3

Introspection dynamics in a symmetric general public goods game. Introspection dynamics in the general public goods game with 4 symmetric players, each having two possible actions—cooperation and defection. For a detailed description of the game, see the main text. a The frequency of each state in the stationary distribution of introspection dynamics displays qualitatively different patterns across four types of multiplayer social dilemmas. The upper panels refer to a high cost of cooperation (\(c=1\)) and the bottom panels to a low cost of cooperation (\(c = 0.2\)); left panels refer to a discounted public good (\(w = 0.5\)) and right panels to a synergistic public good (\(w = 1.5\)). Each case is tagged with a symbol that places the particular case in the contour plot in panel (b). b On the left, the average cooperation frequency for varying discount/synergy factor, w, and varying cost of cooperation, c. Cooperation is feasible when costs are not restrictively high and the public good is not too discounted. On the right, the average cooperation frequency for varying discount/synergy factor, w, and group size, N. For this plot, the cost of cooperation for each player is \(c = 0.4\). The feasibility of cooperation drops with larger group sizes when the public good is discounted. For all panels, \(b=2\) and \(\beta = 5\)

The average cooperation frequency of the group in the long-run is shown in the \(c-w\) and \(N-w\) parameter planes in Fig. 3b. First, let us consider the case when the group size is fixed at 4 players (the \(c-w\) plane in Fig. 3b). In that case, if the cost of cooperation is restrictively high, the average cooperation rate is negligible and does not change with the nature of the public good. In contrast, when the cost is not restrictively high, the discount/synergy parameter, w, determines the frequency with which players cooperate in the long-run. A higher w for the public good would result in higher cooperation (and vice-versa). Next, we consider the case where the cost of cooperation is fixed (the \(N-w\) plane in Fig. 3b). The cost is fixed to a value such that in a synergistic public good (\(w > 1\)), the cooperation frequency is almost 1 in the long-run for any group size. In this case, when the public good is discounted, group size N and the discounting factor w jointly determine the cooperation frequency in the long-run. In discounted public goods, cooperation rates fall with the increase in group size.

We also study introspection dynamics in this game with asymmetric players, using the same setup as for the asymmetric linear public goods game. The average frequency of cooperation per player is summarized in Supplementary Figs. 1 and 2. In Supplementary Fig. 1, we study two cases: a first in which the public good is synergistic and players have a high average cost, and a second in which the public good is discounted and players have a lower average cost. In both cases, players cooperate frequently when they simultaneously have a low cost and a high benefit. The only noticeable difference between the two cases is the minimum relation between the asymmetries \(\delta _b\) and \(\delta _c\) that results in high cooperation for the player with low cost and high benefit. When we plot individual cooperation frequencies against the synergy/discount factor, w (Supplementary Fig. 2), we find that when players are symmetric with respect to just benefits (or just costs), the one with the lowest cost (or highest benefit) cooperates with high probability across all types of public goods, even for a high value of the average cost.

5 Application: Introspection Learning in a Game with Cooperation and Rewards

In all the examples that we have studied so far, players can only choose between two actions (pure strategies). Introspection dynamics is particularly useful when players can use larger strategy sets. Therefore, in this section, we study the stationary behavior of players in the \(N-\)player, \(16-\)strategy cooperation and rewarding game from Pal and Hilbe [57]. This game has two stages: in stage 1, players decide whether or not they contribute to a linear public good, and in stage 2, they decide whether or not they reward their peers. When a player contributes to the public good, they pay a cost \(c_i\) but generate a benefit worth \(r_i c_i\) that is equally shared by everyone. When a player rewards a peer, they provide the peer a benefit of \(\rho \) while incurring the cost of rewarding, \(\gamma _i\), themselves. In between the stages, players get full information about the contributions of their peers. In the rewarding stage, players have four possible strategies: they can either reward all the peers who contributed (social rewarding), reward all the peers who defected (antisocial rewarding), reward all peers irrespective of contribution (always rewarding), or reward none of the peers (never rewarding). Before stage 1 commences, player i knows with some probability, \(\lambda _i\), the rewarding strategy of all their peers. In stage 1, players also have four possible strategies: they can either contribute or defect unconditionally, or they can be conditional cooperators or conditional defectors. Conditional cooperators (or defectors) contribute (or do not contribute) when they have no information about their peers (which happens with probability \(1 - \lambda _i\)). 
When a conditional player, i, knows the rewarding strategy of all their peers (which happens with probability \(\lambda _i\)) and finds that there are \(n_{\textrm{SR}}\) social rewarders and \(n_{\textrm{AR}}\) antisocial rewarders among their peers, they cooperate if and only if the marginal gain from rewards for choosing cooperation over defection outweighs the effective cost of cooperation. That is,

$$\begin{aligned} \rho (n_{\textrm{SR}} - n_{\textrm{AR}}) \ge c_i \left( 1 - \frac{r_i}{N} \right) . \end{aligned}$$
(27)

Combining the two stages, players can use one of 16 possible strategies (4 stage-1 strategies combined with 4 stage-2 strategies). In the simple case where players are identical, one can characterize the Nash equilibria of the game and identify the conditions that allow an equilibrium where all players contribute in the first stage and reward peers in the second stage [57]. In the symmetric case, full cooperation and rewarding is feasible in equilibrium when all players have sufficient information about each other and the reward benefit \(\rho \) is neither too high nor too low. In this section, we study three simple cases of asymmetry between players to demonstrate how asymmetric players may learn to play the game through introspection dynamics. The three specific examples show that, with introspection dynamics, asymmetric players can end up taking different roles in the long-run to produce the public good. To this end, we consider a \(3-\)player game in which players 1 and 2 are identical but player 3 differs from them in some aspect. In each case, the asymmetric player either has a) a higher cost of rewarding, \(\gamma _3 > \gamma _1\), b) lower productivity, \(r_3 < r_1\), or c) less information about peers, \(\lambda _3 < \lambda _1\), than their peers. We use Eq. (7) to exactly compute the expected abundances of the 16 strategies for each player.
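The decision rule of Eq. (27) for an informed conditional player is a one-line check. A sketch of ours (the function name is hypothetical), illustrated with the Fig. 4 parameters \(c_i = 1\), \(r_i = 2\), \(N = 3\):

```python
def informed_choice(n_SR, n_AR, rho, c_i, r_i, N):
    # Eq. (27): an informed conditional player cooperates iff the marginal
    # reward gain, rho * (n_SR - n_AR), covers the effective cost of
    # contributing, c_i * (1 - r_i / N).
    return rho * (n_SR - n_AR) >= c_i * (1 - r_i / N)
```

With \(\rho = 0.3\), a single social rewarder among the peers is not enough (\(0.3 < 1/3\)), whereas two social rewarders are; with \(\rho = 1\), one social rewarder already suffices.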

In the case where player 3 is asymmetric with respect to their cost of rewarding, the long-run outcome of introspection reflects a division of labor between the players in producing the public good (Fig. 4a). The players to whom rewarding is less costly (players 1 and 2) reward cooperation with a higher probability than the player to whom rewarding is very costly (player 3). In return, player 3 learns to respond by contributing with a higher probability than their co-players. With these specific parameters, one player takes up the role of providing the highest per-capita contribution while the others compensate with costly rewarding. When the asymmetric player differs only in their productivity, a different effect may appear in the long-run (Fig. 4b). In this case, the less productive player free-rides on the cooperation of their more productive peers, but nonetheless rewards their peers’ cooperation. The asymmetric player free-rides but does not second-order free-ride. The probability with which the less productive player rewards others in the long-run is slightly higher than the probability with which the contributing individuals reward each other. Finally, we consider the case where the asymmetric individual differs from others in terms of the information players have about others’ rewarding strategy (Fig. 4c). In this case, the asymmetric player knows others’ strategies with a considerably lower probability than their peers. In the long-run, the asymmetric player cooperates less on average than their peers. This is because the asymmetric individual faces fewer instances where they can opportunistically cooperate with their co-players. However, both types of players reward cooperation almost equally, and just enough to sustain cooperation.

Fig. 4

Introspection dynamics in the linear public goods game with peer rewarding. Here, a game with three asymmetric players, each having 16 possible strategies, is studied. Players contribute to a linear public good and then reward each other in the next stage, after everyone’s contribution is revealed. In the first stage, players can condition their cooperation on the information they have about their co-players’ rewarding strategies. For a full description of the model, please see the section on rewarding. In this example, players 1 and 2 are identical in all aspects while player 3 differs from them in only a single aspect. Here, Eq. (7) is used to plot the exact probability with which players cooperate and reward cooperation in the long-run. There are three types of asymmetry for player 3. a First, the case where player 3 has a high cost of rewarding compared to players 1 and 2, \(0.7 = \gamma _3 > \gamma _1 = 0.1\). b Then, the case where player 3 is less productive than their co-players, \(1.2 = r_3 < r_1 = 2\). c Finally, the case where player 3 has less information about co-players’ rewarding strategies than the others, that is, \(0.1 = \lambda _3 < \lambda _1 = 0.9\). For all plots, a high value for the selection strength, \(\beta = 10\), is considered. Unless otherwise mentioned, the following parameters are maintained for all panels: \(c_i = 1\) (individual cost of cooperation), \(r_i = 2\) (individual productivity), \(\gamma _i = 0.1\) (individual cost of rewarding), \(\lambda _i = 0.9\) (individual information about co-players’ strategies). In panels (a) and (b), the reward value is \(\rho =0.3\) while for panel (c), the reward value is \(\rho = 1\)

6 Discussion and Conclusion

We introduce introspection dynamics in N-player (a)symmetric games. In this learning model, at each time step, one of the N players updates (or not) their strategy by comparing the payoffs of only two strategies: the one currently being played and a random prospective one. Clearly, this assumption implies a simple cognitive process. Players do not optimize over the entire set of strategies as, for example, in best-response models [13, 25, 36]. One such model of particular interest, due to its connections to introspection dynamics, is the logit dynamics [2, 7, 13]. In Appendix 1, we compare introspection dynamics and logit dynamics. We show that the two processes have equivalent stationary distributions for 2-strategy games, potential games, and additive games. We also note that there are games for which the stationary distributions do not match. For example, we find coordination games with multiple Nash equilibria for which introspection dynamics and logit dynamics select a different equilibrium. Interestingly, whether one of the dynamics is better at selecting the higher-payoff equilibrium in coordination games has no trivial answer and remains to be investigated.

Furthermore, although conceptually similar, our model is also simpler than typical reinforcement learning models. For example, while we only have selection strength as a parameter (apart from the payoffs), in Macy and Flache [40], there is a learning rate parameter (which could be comparable to our selection strength) but also an aspiration parameter which sets a payoff reference. In our model, the payoff reference is always the current one. All in all, while at each single time step individuals are restricted to reason over two strategies only, as they iterate this step over time, they are able to fully explore the whole set of strategies, in a trial-and-error fashion.

Importantly, our model is also much simpler computationally than the stochastic evolutionary game theory framework. While both can involve solving the stationary distribution of a Markov process, they differ greatly in the state space size. Population models typically assume individuals play multiple games against (potentially all) other players in a population. As such, the state is defined by the number of players playing each strategy in the population(s). The number of states rapidly increases with the population size, the number of strategies, the size of the interaction, and the types of players (in the case of asymmetric games). One can see how the mathematical analysis of multiplayer asymmetric games can become cumbersome. To deal with this issue, previous models frequently resorted to additional approximations, such as low mutation rates [21, 85] and weak selection [88]. On the contrary, in introspection dynamics, the states of the Markov process correspond to the outcome of a single (focal) game: for an N-player game, where player i has \(m_i\) possible actions, there are \(m_1 \times m_2 \times \cdots \times m_N\) states. This feature hugely reduces the state space size, which is key for obtaining exact results.

Here, we thus provide a general explicit formula, Eq. (7), that easily computes the stationary distribution of any multiplayer asymmetric game under introspection dynamics. This formula is useful for exploring many-strategy games in the full range of selection strength. Additionally, we show that it is possible to obtain some analytical expressions for the long-run average strategy abundances. We start by analyzing the set of additive games, for which the gain from switching between any two actions is constant, regardless of what co-players do. Due to this simple feature, additive games allow for the most general closed-form expression for the stationary distribution (regarding the number of players, the number of strategies, and the asymmetry of the game). We also find that, for additive games, the joint distribution of strategies factorizes into the product of the players' marginal strategy distributions. For more general games, we provide the stationary distribution formula for 2-strategy symmetric games. Finally, we study several examples of social dilemmas. From those, we see that, despite the differences to other models pointed out above, we recover some previous results qualitatively [31]. We also conclude that players with a lower cost or a higher benefit of cooperation learn to cooperate more frequently.

Introspection dynamics is rather broad in its scope. Here, we mainly focus on introducing a general framework. Still, we provide some examples to illustrate how it can be applied. Besides the generic public goods game, we study a 2-stage game where players can choose among 16 strategies. There, individuals can reward their co-players conditional on their previous cooperative (or not) behavior. Clearly, there are a number of ways in which our model can be further employed. For example, other researchers recently studied multiplayer games considering multiple games played concurrently [86], fluctuating environments [11], continuous strategies [48], or repeated interactions [34, 87]. Also, a number of previous works considered complex population structures [12, 14, 60, 65,66,67,68]. As defined above, introspection dynamics does not consider a population of players, making it simple to work with. However, it could be equally applicable to population models. In that case, players would obtain average payoffs either from well-mixed or network-bounded interactions, as usual, but update their strategies introspectively.