Abstract
We consider the repeated prisoner’s dilemma (PD). We assume that players make their choices knowing only the average payoffs from the previous stages. A player’s strategy is a function from the convex hull \({\mathfrak {S}}\) of the set of payoffs into the set \(\{C,\,D\}\) (C means cooperation, D defection). Smale (Econometrica 48:1617–1634, 1980) presented an idea of good strategies in the repeated PD. If both players play good strategies, then the average payoffs tend to the payoff corresponding to the profile (C, C) in PD. We adapt Smale’s idea to define semi-cooperative strategies: players do not take the payoff corresponding to the profile (C, C) as a reference point, but may take an arbitrary payoff belonging to the \(\beta \)-core of PD. We show that if both players choose the same point in the \(\beta \)-core, then the strategy profile is an equilibrium in the repeated game. If the players choose different points in the \(\beta \)-core, then the sequence of average payoffs still tends to a point in \({\mathfrak {S}}\). The obtained limit can be treated as a payoff in a new game. In this game, the set of players’ actions is the set of points in \(\mathfrak {S}\) that correspond to the \(\beta \)-core payoffs.
1 Introduction
The strategic conflict between individual rationality and corporate optimality occurs in many real-life situations (comp. [6]). The prisoner’s dilemma (PD) is a simple model of that divergence of individual and group interests. Cooperation is an irrational strategy in PD, but it is nevertheless observed in many social dilemmas. Explanations of this phenomenon have been provided on the basis of different models. The fundamental reason motivating agents to cooperate is the repetition of the game. In the model of the repeated prisoner’s dilemma (RPD), the game is repeated infinitely many times with the same agents participating (comp. [13]). An infinite number of repetitions approximates a real situation in which the number of repetitions is large and random. A different model explaining this phenomenon is based on the assumption that a large population of agents is matched afresh every period to play the PD (comp. [10]). Our aim is to consider yet another model.
We would like to consider the model of a population of agents matched afresh every period to play the repeated prisoner’s dilemma. The RPD has too complicated a structure to be analysed with evolutionary games and population dynamics methods. Our aim in this paper is to replace the RPD by a simpler game.
In this simpler game, we assume that the set of strategies is equivalent to the subset of vector payoffs that corresponds to the \(\beta \)-core of the PD. This can be justified in the following way. We assume that players are rational, so we restrict the set of vector payoffs to those payoffs that correspond to a Nash equilibrium strategy profile in the RPD. By the Folk Theorem, this restricts us to vector payoffs that are individually rational. A further restriction is based on Aumann’s results concerning strong equilibria in repeated games (comp. [2,3,4,5]). Strong equilibria are resistant not only to deviations by individual players but also to deviations by coalitions. Aumann proved that strong equilibria in repeated games correspond to the \(\beta \)-core payoffs. In the considered case, the \(\beta \)-core consists of individually rational and Pareto-optimal payoffs. The optimal choice of the full coalition should be Pareto optimal. So the postulate that players are individually rational and corporately effective leads to the conclusion that the set of strategies in the simpler game is equivalent to the set of vector payoffs illustrated in Fig. 1. By Aumann’s results, we know that every payoff in the \(\beta \)-core corresponds to a strong equilibrium profile in the repeated game. The construction of that strategy profile is complex. The course of the repeated game seems hard to forecast when the players’ strategies, constructed as above, correspond to different points in the \(\beta \)-core.
Motivated by Smale’s idea [12], we construct semi-cooperative strategies that correspond to points in the \(\beta \)-core. Given a payoff v in the \(\beta \)-core, we construct a semi-cooperative strategy profile \((s_1(v),s_2(v))\) that is a Nash equilibrium in the RPD. Moreover, the semi-cooperative strategies have a further crucial property. If the semi-cooperative strategy \(s_1(v)\) of player 1 and the semi-cooperative strategy \(s_2(v')\) of player 2 correspond to different points \(v,\,v'\) in the \(\beta \)-core, then the vector payoff in the RPD corresponding to the strategy profile \((s_1(v),s_2(v'))\) is uniquely defined and is described in the main result of the paper, Theorem 4.2. Thus, we obtain the payoff function of the simpler game that replaces the RPD in a model of population dynamics.
We consider the two-player prisoner’s dilemma (PD) with payoffs given by
where C means to cooperate and D—to defect. The set of \(\beta \)-core payoffs for the PD is presented in Fig. 1 [bold segments with ends \((1,\,2.5),\,(2,\,2),\,(2.5,\,1)\)].
We assume that in the repeated game players know only both players’ average payoffs from the previous stages. So a player’s strategy is a function from the convex hull \(\mathfrak {S}\) of the set of vector payoffs into the set of his actions. The vector payoff function u given by (1), the strategy profile \(s:\mathfrak {S}\rightarrow \{C,\,D\}^2\) and an initial point \(\bar{x}_1\) determine a sequence of average payoffs \(\bar{x}_t\) by
The strategies of player 1 and player 2 corresponding to a point \(v=(v_1,\,v_2)\) in the \(\beta \)-core are presented in Fig. 2. The strategies are called semi-cooperative strategies and are determined by the point v and a positive constant \(\varepsilon \). We show that for an arbitrary initial point \(\bar{x}_1\), the sequence of average payoffs \(\bar{x}_t\) is convergent to the point v when the strategy profile \(s=s^v\) consists of the semi-cooperative strategies corresponding to the point v. The profile \(s^v\) is a Nash equilibrium. The case \(v=(2,\,2)\) was considered by Smale [12]. The idea of semi-cooperative strategies is motivated by Smale’s idea of good strategies.
The main problem that we consider in the paper is to study the limit properties of the dynamical system given by (2) in the case when players 1 and 2 choose different points in the \(\beta \)-core: player 1 chooses a point a and player 2 chooses a point b. Our main result, formulated in Theorem 4.2, states that for an arbitrary initial point the sequence of average payoffs is convergent and the limit is given by
The Smale approach to the repeated PD has recently been applied by Akin [1]. Akin showed that if player 1’s strategy \(s_1\) is simple, i.e.
where \(L(x_1,x_2)=ax_1+bx_2+c\) is an affine map such that \(L(1,1),\,L(3,0)\le 0\le L(2,2),\,L(0,3)\), then every sequence of average payoffs is attracted by the interval \(\{x\in S:\;L(x)=0\}\), for an arbitrary strategy of player 2. If both players adopt simple strategies, then every sequence of average payoffs tends to the intersection point of the separation lines. In [1], evolutionary dynamics is used to analyse competition among certain simple strategies. The simple strategies introduced in [1] correspond to the types of players that we call Balanced players and Egoists (comp. Fig. 4). Roughly speaking, Theorem 5.10 in [1] says that the balanced strategy is a globally stable equilibrium in a population consisting of balanced players and egoists.
We considered the replicator dynamics for a population consisting of Altruists [\(v=(1.5,\,2.25)\)], Balanced players [\(v=(2,\,2)\)] and Egoists [\(v=(2.25,\,1.5)\)]. The payoff is given by
We obtained that the state \((\frac{2}{3},\,0,\,\frac{1}{3})\) is an evolutionarily stable strategy. So the presence of Altruists completely changes the replicator dynamics.
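A replicator computation of this kind is easy to sketch in code. The \(3\times 3\) payoff matrix below is a hypothetical placeholder (the actual entries would come from the limit payoffs of Theorem 4.2 and are not reproduced in this excerpt), so the sketch illustrates the dynamics only, not the particular equilibrium stated above.

```python
# One Euler step of the replicator dynamics for three behaviour types.
# The payoff matrix M is a hypothetical placeholder; the actual entries
# would come from the limit payoffs of Theorem 4.2.
M = [[2.0, 1.5, 1.0],    # M[i][j] = payoff of type i against type j
     [2.5, 2.0, 1.5],    # in this placeholder, type 2 strictly dominates type 1
     [3.0, 2.5, 1.0]]

def replicator_step(x, M, dt=0.01):
    """x_i <- x_i + dt * x_i * ((M x)_i - x.M x); stays on the simplex."""
    fitness = [sum(M[i][j] * x[j] for j in range(3)) for i in range(3)]
    avg = sum(xi * fi for xi, fi in zip(x, fitness))
    return [xi + dt * xi * (fi - avg) for xi, fi in zip(x, fitness)]

x = [1/3, 1/3, 1/3]
for _ in range(20000):
    x = replicator_step(x, M)
print(x)   # the strictly dominated type's frequency decays to (numerically) zero
```

The increments sum to zero, so the frequencies remain on the simplex; with the placeholder matrix the strictly dominated type dies out, which illustrates how changing one row of payoffs (e.g. adding a new type) can reshape the whole dynamics.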
The game with the payoff given by Theorem 4.2 has a continuous strategy set. In the next paper, we intend to analyse its replicator dynamics using methods presented in [11].
2 Smale’s Good Strategies in the Repeated Prisoner’s Dilemma
In this section, we give a brief account of Smale’s approach to the repeated prisoner’s dilemma, as presented in [12].
Smale considers PD with payoffs given by (1). The players’ actions are interpreted as follows: C means to cooperate and D to defect. The game is symmetric, and the action D dominates the action C for each player (\(3>2\) and \(1>0\)). The Nash equilibrium is the pair of actions (D, D), and the Nash payoff is (1, 1). The Nash payoff is not Pareto optimal. The Pareto frontier consists of two segments: the first joins (0, 3) to (2, 2), the second joins (2, 2) to (3, 0). Smale distinguishes one Pareto-optimal payoff, (2, 2).
He constructs a strategy profile in the repeated PD that is a Nash equilibrium with payoff equal to \((2,\,2)\). This kind of result can be treated as a special case of the Folk Theorem. What makes Smale’s approach atypical is the way actions are chosen in each repetition. At each stage, the players make their decisions based on the average vector payoffs from the previous repetitions. It means that the domain of the strategies is no longer the set of histories, but the convex hull of the payoffs. Such strategies are called memory strategies. Each player chooses his memory strategy before the iterated game starts. Players’ strategies are fixed during the iteration.
Let the function \(u:\{C,D\}^2\rightarrow {\mathfrak {S}} \) be given by (1), where \({\mathfrak {S}}\) denotes the convex hull of all possible payoffs, i.e. \({\mathfrak {S}}=\hbox {conv}\{(2,2),(0,3),(3,0),(1,1)\}\). A memory strategy of player i is a map \(s_i:{\mathfrak {S}} \rightarrow \{C,D\}\). A strategy profile is a pair \(s=(s_1,s_2):{\mathfrak {S}} \rightarrow \{C,D\}^2\). The strategy profile s and an initial point \(x_1\in {\mathfrak {S}}\) determine the course of the repeated game in the following way:
The sequence \((x_t)_{t\geqslant 1}\) is the sequence of payoffs, and the sequence \((\overline{x}_t)_{t\geqslant 1}\) is the sequence of average payoffs in the repeated game.
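The recursion above is straightforward to simulate. In the sketch below (a minimal illustration, not Smale’s construction), both players use the trivial memory strategy "always defect", and the average payoffs tend to the Nash payoff (1, 1) from any initial point.

```python
# Sketch of the average-payoff recursion for the stage game (1).
# The strategy maps used here are trivial stand-ins, not good strategies.

U = {('C', 'C'): (2, 2), ('C', 'D'): (0, 3),
     ('D', 'C'): (3, 0), ('D', 'D'): (1, 1)}  # the payoff function u

def run(s1, s2, x_bar, steps):
    """Iterate x̄_{t+1} = (t·x̄_t + u(s(x̄_t))) / (t+1) from the initial point x̄_1."""
    for t in range(1, steps + 1):
        x = U[(s1(x_bar), s2(x_bar))]          # stage payoff u(s(x̄_t))
        x_bar = ((t * x_bar[0] + x[0]) / (t + 1),
                 (t * x_bar[1] + x[1]) / (t + 1))
    return x_bar

# Both players ignore the memory and always defect: the averages tend
# to the Nash payoff (1, 1) from any initial point in S.
defect = lambda x_bar: 'D'
print(run(defect, defect, (0.0, 3.0), 10000))  # close to (1.0, 1.0)
```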
Fix \(\varepsilon >0\). A good strategy of player 1 is a map \(s_1^*: {\mathfrak {S}} \rightarrow \{C,D\}\) given by
A good strategy of player 2 is a map \(s_2^*: {\mathfrak {S}} \rightarrow \{C,D\}\) given by
The good strategies are illustrated in Fig. 3.
The main result presented in Section 1 of [12] is the following theorem.
Theorem 2.1
(Smale)
-
1.
If player 1 plays a good strategy \(s_1^*\) and player 2 plays an arbitrary strategy \(s_2\) then the sequence of average payoffs \(\overline{x}_t =(\overline{x}^1_t,\overline{x}^2_t )\) satisfies
$$\begin{aligned} \displaystyle \liminf _{t\rightarrow \infty } \overline{x}^1_t \geqslant 1 \qquad \text{ and } \qquad \limsup _{t\rightarrow \infty } \overline{x}^2_t \leqslant 2 \end{aligned}$$for every \(x_1\in {\mathfrak {S}}\).
-
2.
If both players play good strategies \(s_1^*, s_2^*\), then
$$\begin{aligned} \displaystyle \overline{x}_t \xrightarrow [t\rightarrow \infty ]{} (2,2) \end{aligned}$$for every \(x_1\in {\mathfrak {S}}\).
If the payoff in the repeated game is defined as the upper limit of average payoffs then the strategy profile \(s^*=(s_1^*,\,s_2^*)\) is a Nash equilibrium in the set of memory strategies.
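The displayed formulas for \(s_1^*,\,s_2^*\) are not reproduced in this excerpt, so the simulation below relies on a plausible reconstruction: player 1 cooperates iff \(\overline{x}^2\le \overline{x}^1+\varepsilon \) and \(\overline{x}^2\le 2+\varepsilon \), and symmetrically for player 2. Under this reconstruction both parts of Theorem 2.1 can be observed numerically (the bounds of part 1 hold up to a small tolerance).

```python
# Reconstructed good strategies (an assumption: the paper's exact formulas
# are omitted in this excerpt). A player cooperates iff the opponent's
# average exceeds neither his own average by more than EPS nor 2 + EPS.
EPS = 0.1
U = {('C', 'C'): (2, 2), ('C', 'D'): (0, 3),
     ('D', 'C'): (3, 0), ('D', 'D'): (1, 1)}

def s1_good(x): return 'C' if x[1] <= x[0] + EPS and x[1] <= 2 + EPS else 'D'
def s2_good(x): return 'C' if x[0] <= x[1] + EPS and x[0] <= 2 + EPS else 'D'

def run(s1, s2, x_bar, steps):
    for t in range(1, steps + 1):
        x = U[(s1(x_bar), s2(x_bar))]
        x_bar = ((t * x_bar[0] + x[0]) / (t + 1),
                 (t * x_bar[1] + x[1]) / (t + 1))
    return x_bar

# Part 2: both players play good strategies -> averages tend to (2, 2).
print(run(s1_good, s2_good, (0.0, 3.0), 20000))   # close to (2, 2)
# Part 1: against an always-defecting opponent the bounds still hold
# approximately for this reconstruction.
print(run(s1_good, lambda x: 'D', (1.0, 1.0), 50000))
```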
The Banach limit Lim is a continuous linear functional defined on the space \(l^\infty \) of bounded scalar sequences that extends the functional assigning to each convergent sequence its limit. If the payoff is defined as the Banach limit of the sequence of average payoffs, then the Nash equilibrium \(s^*\) has an additional interesting property. The construction of good strategies guarantees that the deviating player’s payoff will not exceed the good-strategy player’s payoff by more than \(\varepsilon \). We define the payoff in the repeated game by
Proposition 2.2
Suppose player 1 plays a good strategy \(s_1^*\). If \(s=(s_1^*,s_2)\), where \(s_2\) is an arbitrary memory strategy of player 2, then
for every \(x_1\in {\mathfrak {S}}\).
It means that if player 1 plays a good strategy, then his payoff is not smaller than his opponent’s payoff minus \(\varepsilon \). The constant \(\varepsilon \) is controlled by player 1, so he can choose it as small as he wishes. In this sense, we can say that good strategies are not only a Nash equilibrium in the set of memory strategies, but a safe Nash equilibrium.
3 Some Properties of the Dynamical Systems Generated by Memory Strategies
We consider a normal form game \(G=\left( {\mathcal {N}},(A_i)_{i\in {\mathcal {N}}},(u_i)_{i\in {\mathcal {N}}}\right) \), where \({\mathcal {N}}=\{1,\ldots , N\}\) is the set of players, \(A_i\) is a finite set of actions of player i and \(u_i:A=A_1\times A_2\times \cdots \times A_N\rightarrow {\mathbb {R}}\) is the payoff function of player i.
A memory strategy of player i is a function \(s_i:{\mathfrak {S}}\rightarrow A_i\), where \({\mathfrak {S}}:= \hbox {conv}\{u(a) \, : \, a\in A\}\) is the convex hull of the set of vector payoffs \(u=(u_1,\,u_2,\ldots ,u_N)\). The strategy profile \(s=(s_1,\,s_2,\ldots ,\,s_N)\) determines a map \(f^s:{\mathfrak {S}}\rightarrow {\mathfrak {S}}\) by
and the dynamical system \(\beta ^s=(\beta ^s_t)_{t\ge 1}\) given by
We say that a sequence \((\overline{x}_t)_{t\ge t_0}\subset {\mathfrak {S}}\) is a trajectory of the dynamical system \(\beta ^s\) if
Observe that if \(\overline{x}_{t_0}\) is the given average payoff after stage \(t_0\) then the trajectory \((\overline{x}_t)_{t\ge t_0}\subset {\mathfrak {S}}\) of the dynamical system \(\beta ^s\) given by (5) is the sequence of average payoffs
Since
and the set \({\mathfrak {S}}\) is bounded, it follows that
So, for every \(\varepsilon >0\), there exists T such that an arbitrary trajectory \((\overline{x}_t)_{t\ge t_0}\) of the dynamical system \(\beta ^s\) satisfies
The following proposition is the deterministic version of the Blackwell approachability result (see [7]). An elementary proof is presented in [8].
Proposition 3.1
Suppose that a set \(W\subset {\mathfrak {S}}\) is closed and a trajectory \((\overline{x}_t)_{t\ge t_0}\) of the dynamical system \(\beta ^s\) satisfies
Then,
The point \(y_t\) in (8) is the proximal point of the set W to the point \(\overline{x}_t\). If the set W is convex and closed and \(f^s(\overline{x}_t)\in W\) for \(t\ge t_0\), then (8) holds. As a corollary of Proposition 3.1, we obtain the following.
Corollary 3.2
If the set \(W\subset {\mathfrak {S}}\) is closed and convex and a trajectory \((\overline{x}_t)_{t\ge t_0}\) of the dynamical system \(\beta ^s\) satisfies
then
Taking \(W=(-\,\infty ,c]\) in Proposition 3.1, we obtain the following property of real sequences.
Corollary 3.3
Suppose that \((a_n)_{n=1}^\infty \) is a bounded sequence in \( {\mathbb {R}}\) and \((\bar{a}_n)_{n=1}^\infty \) is the sequence of arithmetic means, i.e. \(\bar{a}_n=\frac{1}{n}\sum _{k=1}^n a_k\). If we have
for almost all n and a fixed constant \(c\in {\mathbb {R}}\), then
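The displayed hypothesis and conclusion are omitted in this excerpt; taking \(W=(-\infty ,c]\) in Proposition 3.1 suggests the natural reading, namely that \(a_{n+1}\le c\) whenever \(\bar{a}_n\ge c\), and that then \(\limsup _n \bar{a}_n\le c\). A quick numerical check of this reading, on an extreme sequence obeying the rule:

```python
# Numerical check of the averaging principle behind Corollary 3.3, under
# the assumed hypothesis: whenever the running mean is >= c, the next
# term of the sequence is <= c.
c = 0.5
a_bar, n = 1.0, 1                         # ā_1 = a_1 = 1 > c
for _ in range(100000):
    a_next = 0.0 if a_bar >= c else 1.0   # extreme sequence obeying the rule
    a_bar = (n * a_bar + a_next) / (n + 1)
    n += 1
print(a_bar)   # the means oscillate just below c = 0.5; limsup equals c
```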
Definition 3.4
Let s be a memory strategy profile. We say that a set \(Z\subset {\mathfrak {S}} \) is:
-
1.
invariant for the dynamical system \(\beta ^s\) iff
$$\begin{aligned}\exists _{t_Z\ge 1} \quad \forall _{x\in Z} \quad \forall _{t\geqslant t_Z} \quad \frac{tx+f^s(x)}{t+1}\in Z, \end{aligned}$$ -
2.
an escape set for the dynamical system \(\beta ^s\) iff every trajectory \((\overline{x}_t)_{t\ge t_0}\) of the dynamical system \(\beta ^s\) satisfies
$$\begin{aligned} \forall _{\tau \ge t_0}\quad \exists _{t>\tau } \quad \overline{x}_{t}\notin Z. \end{aligned}$$ -
3.
an absorbing set for the dynamical system \(\beta ^s\) iff every trajectory \((\overline{x}_t)_{t\ge t_0}\) of the dynamical system \(\beta ^s\) satisfies
$$\begin{aligned} \forall _{\tau \ge t_0}\quad \exists _{t>\tau } \quad \overline{x}_{t}\in Z. \end{aligned}$$
In the next section, we study limit properties of some dynamical systems generated by memory strategies. To show that a trajectory converges to a point, we construct a family of absorbing and invariant neighbourhoods of the limit point. Invariance is usually easy to check. To show that a neighbourhood is absorbing, we shall use the following lemmas.
From here to the end of the section, we fix a memory strategy profile s and consider the dynamical system \(\beta ^s\).
Lemma 3.5
Let \(V=\overline{\mathrm{conv}}f^s({\mathfrak {S}})\), \(\varepsilon >0\) and \( V^{\epsilon }=A\cup B\cup C\cup Z\), where the sets A, B, C are pairwise disjoint. Suppose that the set \(B\cup C\cup Z\) is invariant and that A and C are escape sets. If there exist a convex set \(W\subset V^\varepsilon \) and \(\delta >0\) such that \(f^s(B\cup C)\subset W\) and \(W^\delta \cap (B\setminus Z)=\emptyset \), then the set Z is absorbing.
Proof
Suppose, contrary to our claim, that the set Z is not absorbing. Then, there exists a trajectory \((\overline{x}_t)_{t\ge t_0}\) such that
where \(t_{\epsilon }\) is taken from Corollary 3.2, so that \(\overline{x}_t\in V^{\epsilon }\) for all \(t>t_{\epsilon }\). Since the set \(B\cup C\cup Z\) is invariant, there exists \(t_2:=t_{B\cup C\cup Z}\) such that \(\beta ^s_t(\bar{x})\in B\cup C\cup Z\) for every \(\bar{x}\in B\cup C\cup Z\) and every \(t>t_2\). Since A is an escape set,
So \(\bar{x}_{t_3}\in B\cup C\) and by the invariance of the set \(B\cup C\cup Z\), we obtain that
By Corollary 3.2, there exists \(t_\delta >t_3\) such that
Since C is an escape set, there exists \(\bar{t}>t_\delta \) such that \(\bar{x}_{\bar{t}}\notin C\). So \(\bar{x}_{\bar{t}}\in (B\setminus Z)\cap W^\delta \), which contradicts the assumption that \((B\setminus Z)\cap W^\delta =\emptyset \). \(\square \)
Lemma 3.6
Assume that a set \(D\subset {\mathfrak {S}}\) is invariant and absorbing and \(D=B\cup Z\). If there exists a closed convex set \(W\subset {\mathfrak {S}}\) and \(\varepsilon >0\) such that \(f^s(B)\subset W\) and \(W^{\epsilon }\cap (B\backslash Z)=\emptyset \) then the set Z is absorbing.
Proof
Suppose, contrary to our claim, that the set Z is not absorbing. Then, there exists a trajectory \((\overline{x}_t)_{t\ge t_0}\) such that
Since the set D is invariant and absorbing, there exists \(t_2>t_1\) such that
So \(\overline{x}_t\in B\setminus Z\) for \(t>t_2\). Thus, \(f^s(\overline{x}_t)\in W\) for \(t>t_2\), and by Corollary 3.2 there exists \(t_\varepsilon >t_2\) such that \(\overline{x}_{t}\in W^{\epsilon }\) for \(t>t_\varepsilon \). This contradicts the assumption \(W^{\epsilon }\cap (B\setminus Z)=\emptyset \). \(\square \)
Lemma 3.7
Suppose that \(Z\subset {\mathfrak {S}}\) and \(f^s(Z)\subset W\), where W is a closed convex subset of \({\mathfrak {S}}\). If there exists \(\varepsilon >0\) such that \(W^\varepsilon \cap Z=\emptyset \) then Z is an escape set.
Proof
Suppose, contrary to our claim, that Z is not an escape set. So there exists a trajectory \((\overline{x}_{t})_{t\ge t_0}\) such that
So \(f^s(\overline{x}_{t})\in W\) for \(t>\tau \). By Corollary 3.2, there exists \(t_{\epsilon }>\tau \) such that \(\overline{x}_{t}\in W^{\epsilon }\) for all \(t>t_{\epsilon }\), which contradicts (10) and the assumption \(W^\varepsilon \cap Z=\emptyset \). \(\square \)
4 Semi-cooperative Strategies
In this section, we introduce semi-cooperative strategies in the repeated PD; they are a generalisation of Smale’s good strategies. The semi-cooperative strategy of a player is determined by the choice of a point v in the \(\beta \)-core of PD and a positive constant. If both players choose the same point v, then the obtained strategy profile is a Nash equilibrium and the vector payoff in the repeated game equals v. This can be regarded as a very special case of Aumann’s results presented in [2,3,4,5], where it was shown that each payoff from the \(\beta \)-core of the stage game can be obtained as a strong Nash equilibrium payoff in the repeated game. Much more interesting is the situation when the players choose semi-cooperative strategies corresponding to different points in the \(\beta \)-core. The limits of trajectories of the dynamical system determined by a semi-cooperative profile are described in Theorem 4.2, which is the main result of the paper.
To recall the definition of the \(\beta \)-core, assume that G is a normal form game as in Sect. 3. A correlated strategy \(c^{{\mathcal {K}}}\) of a coalition \({\mathcal {K}}\subset {\mathcal {N}}\) is a probability distribution over the (finite) set \(A^{{\mathcal {K}}}=\prod _{i\in {\mathcal {K}} } A_i\). The set of correlated strategies of the coalition \({\mathcal {K}}\) is denoted by \(C^{{\mathcal {K}}}\). The correlated strategy \(c^{{\mathcal {K}}}\) of the coalition \({\mathcal {K}}\) and the correlated strategy \(c^{{\mathcal {N}}\setminus {\mathcal {K}}}\) of the anti-coalition \({\mathcal {N}}\setminus {\mathcal {K}}\) determine a correlated strategy \(c=\left( c^{\mathcal {K}},c^{{\mathcal {N}}\setminus {\mathcal {K}}}\right) \in C^{{\mathcal {N}}}\) of the full coalition. In the usual way, we extend the payoff functions \(u_i\) onto the set of correlated strategies \(C^{{\mathcal {N}}}\).
A correlated strategy \(\tilde{c} \in C^{{\mathcal {N}}}\) belongs to the \(\beta \)-core \(\left( \tilde{c}\in {\mathcal {C}}_\beta (G)\right) \) iff
Taking \({\mathcal {K}}={\mathcal {N}}\) in (11), we obtain that
which is the weak Pareto optimality condition.
Taking the coalition \({\mathcal {K}}=\{i\}\) in (11), we obtain that the payoff \(u(\tilde{c})\) is individually rational, i.e.
Hereafter, G denotes the PD under consideration, with payoffs given by (1). The set of Pareto-optimal and individually rational payoffs for G is illustrated in Fig. 1. The \(\beta \)-core of the PD (in fact, its image under u) is the union of the intervals
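Concretely, the two bold segments of Fig. 1 join (1, 2.5) to (2, 2) and (2, 2) to (2.5, 1), so membership of a payoff vector in the image of the \(\beta \)-core can be tested directly:

```python
# Membership test for the image of the beta-core of the PD (1): the union
# of the two bold segments of Fig. 1, i.e. the individually rational part
# of the Pareto frontier with end points (1, 2.5), (2, 2), (2.5, 1).
def in_beta_core(v, tol=1e-9):
    v1, v2 = v
    on_left  = 1 <= v1 <= 2   and abs(v2 - (3 - v1 / 2)) <= tol  # (1,2.5)-(2,2)
    on_right = 2 <= v1 <= 2.5 and abs(v2 - (6 - 2 * v1)) <= tol  # (2,2)-(2.5,1)
    return on_left or on_right

print(in_beta_core((2.0, 2.0)))      # True  (the balanced point)
print(in_beta_core((1.5, 2.25)))     # True  (the "Altruist" point of Sect. 1)
print(in_beta_core((1.0, 1.0)))      # False (Nash payoff, not Pareto optimal)
print(in_beta_core((0.0, 3.0)))      # False (not individually rational)
```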
Let us fix a point v in the \(\beta \)-core different from the end points, i.e.
Roughly speaking, player 1 cooperates if the average payoff is located below the line going through the points \((1,\,1)\) and v. A semi-cooperative strategy of player 1, determined by the point v and a positive constant \(\varepsilon >0\), is a map \(s_1^{v,\varepsilon }:{\mathfrak {S}}\rightarrow \{C,D\}\) given by
where
and
A semi-cooperative strategy of player 2, determined by the point v and \(\varepsilon >0\), is a map \(s_2^{v,\varepsilon }:{\mathfrak {S}}\rightarrow \{C,D\}\) given by the formula:
where
and
Semi-cooperative strategies are illustrated in Fig. 2.
We say that player 1 is: an egoist if \(v_1>v_2\); an altruist if \(v_1<v_2\); a balanced player if \(v_1=v_2(=2)\). The second player is: an egoist if \(v_2>v_1\); an altruist if \(v_2<v_1\); a balanced player if \(v_1=v_2(=2)\). This is illustrated in Fig. 4.
If both players choose the same point \(v=(v_1,v_2)\in u({\mathcal {C}}_\beta (G))\) to determine their semi-cooperative strategies, then the strategy profile is a Nash equilibrium. We obtain the following result which is similar to Smale’s one.
Theorem 4.1
Suppose that both players play semi-cooperative strategies \(s_1^{v,\epsilon _1}, s_2^{v,\epsilon _2}\) determined by the same \(v\in u({\mathcal {C}}_\beta (G))\,\backslash \,\)\( \{(1,2.5),\)\((2.5,1)\}\) and positive constants \(\varepsilon _1,\,\varepsilon _2>0\). If \((\bar{x}_t)_{t\ge 1}\) is an arbitrary trajectory of the dynamical system determined by the strategy profile \((s_1^{v,\epsilon _1}, s_2^{v,\epsilon _2})\), then
If player 1 plays the semi-cooperative strategy \(s_1^{v,\epsilon _1}\) and player 2 plays an arbitrary memory strategy \(s_2\) then an arbitrary trajectory \((\bar{x}_t)_{t\ge 1}\) of the dynamical system determined by the strategy profile \((s_1^{v,\epsilon _1}, s_2)\) satisfies:
Proof
Fix \(v\in u({\mathcal {C}}_\beta (G))\,\backslash \,\)\( \{(1,2.5),\)\((2.5,1)\}\) and \(\varepsilon _1,\,\varepsilon _2>0\). Let \((\bar{x}_t)_{t\ge 1}\) be a trajectory of the dynamical system \(\beta ^{s^v}\), where \(s^{v}\,=\,(s_1^{v,\epsilon _1}, s_2^{v,\epsilon _2}) \) and \(f^v=u\circ s^v\).
Let
The set \(\Delta \) is convex; the sets \(\Delta \), \(\Omega _1\), \(\Omega _2\) are pairwise disjoint and \({\mathfrak {S}}\,=\,\Delta \,\cup \,\Omega _1\,\cup \,\Omega _2\). This situation is illustrated in Fig. 5.
Observe that
To show that the trajectory \((\bar{x}_t)_{t\ge 1}\) converges to v we construct an invariant and absorbing neighbourhood \(O_{\delta }(v)\) of the point v. Fix \(\delta \in \left( 0,\min \left\{ \epsilon _1, \epsilon _2 \right\} \right) \). We will denote by \(l(a,\,b)\) the line going through the points \(a,\,b\in {\mathbb {R}}^2\). Set
where \(P_{\delta } = (v_1-\delta ,v_2-\delta )\) and \(R_{\delta }\) is the intersection point of \(l_2\) and the line \(\{x_1=v_1\}\). The half-plane over (under) a line \(l\subset {\mathbb {R}}^2\) will be denoted by e(l) (h(l)). The neighbourhood \(O_{\delta }(v)\) illustrated in Fig. 6 is given by
To show that the neighbourhood \(O_{\delta }(v)\) is invariant we divide it into three parts: \(Z_1:=\Omega _1\cap O_{\delta }(v)\), \(Z_2:=\Omega _2\cap O_{\delta }(v)\), \(Z_3:= \Delta \cap O_{\delta }(v)\). By (7), there exists \(t_1>1\) such that for any \(t>t_1\)
If \(\bar{x}\in Z_1\) then \(f^v(\bar{x})=(3,0)\). So \(\beta ^{s^v}_t(\overline{x})\in e(l_1)\). If \(t>t_1\) then \(\beta ^{s^v}_t(\overline{x})\in \{x\in {\mathfrak {S}}:\,x_1<v_1+\delta ,\,x_2>v_2-\delta \}\). Since \(e(l_1)\cap \{x\in {\mathfrak {S}}:\,x_1<v_1+\delta ,\,x_2>v_2-\delta \}\subset O_\delta (v)\) we obtain that \(\beta ^{s^v}_t(\overline{x})\in O_\delta (v)\).
If \(\bar{x}\in Z_2\) then \(f^v(\bar{x})=(0,3)\). So \(\beta ^{s^v}_t(\overline{x})\in e(l_2)\cap e(l_3)\). If \(t>t_1\) then \(\beta ^{s^v}_t(\overline{x})\in \{x\in {\mathfrak {S}}:\,v_1-\delta<x_1<v_1+\delta \}\). Since \(e(l_2)\cap e(l_3)\cap \{x\in {\mathfrak {S}}:\,v_1-\delta<x_1<v_1+\delta \}\subset O_\delta (v)\), we obtain that \(\beta ^{s^v}_t(\overline{x})\in O_\delta (v)\).
If \(\bar{x}\in Z_3\) then \(f^v(\bar{x})=(2,2)\). So \(\beta ^{s^v}_t(\overline{x})\in e(l_1)\cap e(l_2)\cap e(l_3)\). If \(t>t_1\) then \(\beta ^{s^v}_t(\overline{x})\in \{x\in {\mathfrak {S}}:\,x_1<v_1+\delta \}\). Since \(e(l_1)\cap e(l_2)\cap e(l_3)\cap \{x\in {\mathfrak {S}}:\,x_1<v_1+\delta \}= O_\delta (v)\), we obtain that \(\beta ^{s^v}_t(\overline{x})\in O_\delta (v)\).
So the neighbourhood \(O_{\delta }(v)\) is invariant.
To show that the set \(O_{\delta }(v)\) is absorbing we set \(V=\hbox {conv}\,f^v({\mathfrak {S}})=\hbox {conv}\{(2,2),(0,3),(3,0)\}\) and
The set \(B\cup C\cup Z\) is invariant, and A and C are escape sets. We have \(f^v(B\cup C)\subset W\). There exists a constant \(\theta >0\) such that \(W^{\theta } \cap (B\backslash Z)= \emptyset \) (see Fig. 7). By Lemma 3.5, we obtain that the set \( Z\,=\,O_{\delta }(v)\) is absorbing.
Since the diameter of the neighbourhoods \(O_{\delta }(v)\) tends to zero as \(\delta \rightarrow 0\), we obtain the convergence of the trajectory \((\bar{x}_t)\) to the point v.
Now, we consider the dynamics of the system when player 2 chooses an arbitrary memory strategy and player 1 plays the semi-cooperative strategy \(s_1^{v,\epsilon _1}\). If \(\bar{x}^2_t>v_2\), then player 1 defects in the next stage, and so player 2’s payoff belongs to \(\{0,\,1\}\). By Corollary 3.3, we obtain that
The proof of the inequality
is similar. \(\square \)
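The formulas defining \(s_i^{v,\varepsilon }\) are not reproduced in this excerpt, so the following sketch reconstructs them from the verbal description and from the region structure used in the proof: player 1 cooperates while the average payoff lies (up to \(\varepsilon \)) below the line \(l((1,1),v)\) and \(\bar{x}^2\le v_2\); player 2 is symmetric. This is an approximation of the paper’s definition, but it reproduces the convergence of Theorem 4.1 numerically.

```python
# Reconstructed semi-cooperative strategies (an assumption: the exact
# defining formulas are omitted in this excerpt).
U = {('C', 'C'): (2, 2), ('C', 'D'): (0, 3),
     ('D', 'C'): (3, 0), ('D', 'D'): (1, 1)}

def semi_coop(v, eps, player):
    m = (v[1] - 1) / (v[0] - 1)                  # slope of the line l((1,1), v)
    def s1(x):  # cooperate below the line (up to eps) while x2 <= v2
        return 'C' if x[1] - 1 <= m * (x[0] - 1) + eps and x[1] <= v[1] else 'D'
    def s2(x):  # cooperate above the line (up to eps) while x1 <= v1
        return 'C' if x[1] - 1 >= m * (x[0] - 1) - eps and x[0] <= v[0] else 'D'
    return s1 if player == 1 else s2

def run(s1, s2, x_bar, steps):
    for t in range(1, steps + 1):
        x = U[(s1(x_bar), s2(x_bar))]
        x_bar = ((t * x_bar[0] + x[0]) / (t + 1),
                 (t * x_bar[1] + x[1]) / (t + 1))
    return x_bar

v = (1.5, 2.25)                                  # an "Altruist" choice of player 1
s1, s2 = semi_coop(v, 0.05, 1), semi_coop(v, 0.05, 2)
print(run(s1, s2, (1.0, 1.0), 200000))           # drifts along l((1,1), v) towards v
```

After a transient, the averages oscillate in a narrow band around the segment from (1, 1) to v, mixing the payoffs (2, 2) and (0, 3) in the proportion whose mean is v, and the trajectory settles near v.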
If the players have no opportunity to agree on the choice of the point v, then we should not expect them to choose the same point. We can treat the choice of the point v as a player’s action in a new game. To define payoffs in this new game, we have to know the payoffs in the repeated PD when players 1 and 2 play semi-cooperative strategies corresponding to points a and b, respectively, in the \(\beta \)-core. The main result of the paper addresses this situation.
Theorem 4.2
Let \(a=(a_1,a_2),b=(b_1,b_2)\in u({\mathcal {C}}_\beta (G))\,\backslash \,\)\( \{(1,2.5),\)\((2.5,1)\}\) and \(\varepsilon _1,\varepsilon _2>0\). If \((\bar{x}_t)\) is an arbitrary trajectory of the dynamical system determined by the strategy profile \((s_1^{a,\varepsilon _1},\,s_2^{b,\varepsilon _2})\), then
where \(y^{\varepsilon _1,\varepsilon _2}\) is a point in \({\mathfrak {S}}\) and
The case \(a=b\) was considered in Theorem 4.1.
Proof
Set \(s^{*}\,=\,(s_1^{a,\epsilon _1}, s_2^{b,\epsilon _2}) \), \(f^*=u\circ s^*\) and
For \(\delta \in (0,\delta _0)\) there exists \(t_\delta >1\) such that the condition (17) is satisfied for \(t>t_\delta \).
We denote by \(\beta ^*\) the dynamical system determined by the strategy \(s^*\), i.e. \(\beta ^*_t(\bar{x})=\frac{t\bar{x}+f^*(\bar{x})}{t+1}\). Set
If \(a_1\le b_1\) then \(\Omega _3=\emptyset \). The case \(a_1=b_1\) has been proved in Theorem 4.1.
In the case \(a_1< 2\le b_1\), we set \(V=\hbox {conv}\{(0,\,3),\,(2,\,2),\,(3,\,0)\}\) and observe that \(V^\varepsilon \cap \Delta \) is absorbing (comp. Corollary 3.2). Since \(f^*(x)=(2,\,2)\) for \(x\in V^\varepsilon \cap \Delta \) and \((2,\,2)\in V^\varepsilon \cap \Delta \), every trajectory is convergent to \((2,\,2)\).
The case \(a_1\le 2<b_1\) is symmetric to the above one.
The case \(a_1<b_1<2\) is illustrated in Fig. 8.
We construct two neighbourhoods of the point b:
where the lines \(l_1,\,l_2,\,l_3\) are given by (15) with \(P_{\delta } = (a_1-\delta ,a_2-\delta )\) and \(R_{\delta }\) the intersection point of \(l_2\) and the line \(\{x_1=b_1\}\). The invariance of the sets \(U_\delta (b)\), \(O_\delta (b)\) follows from their construction (see Fig. 9). By Lemma 3.5, using the same notation as in (18), we obtain that the set \(O_\delta (b)\) is absorbing. To obtain that \(U_\delta (b)\) is absorbing, we apply Lemma 3.6, setting \(D=O_\delta (b)\), \(Z=U_\delta (b)\), \(B=D\setminus Z\), \(W=\hbox {conv}\{(3,\,0),\,(2,\,2)\}\).
The case \(2< a_1 < b_1\) is symmetric to the case considered above.
The last situation is \(b_1<a_1\). Two possible choices of a and b are presented in Fig. 10.
Obviously, we have
The limit of an arbitrary trajectory is the point \(y=y^{\varepsilon _1,\varepsilon _2}\) lying in the intersection \(cl(\Delta ) \,\cap \, cl(\Omega _3)\). Let \(P_{\delta }\in \Omega _3\) be the unique point satisfying \(dist(P_{\delta },\Omega _1)= dist(P_{\delta },\Omega _2)=\delta \). Set
We construct the neighbourhood of the set \(\Delta \):
which is illustrated in Fig. 11.
We show that \(O_{\delta }(\Delta )\) is invariant. Fix \(\bar{x}=(x_1,\,x_2)\in O_\delta (\Delta )\) and \(t>t_\delta \). Set \(C:=h(l_1)\cap h(l_2)\cap h(l_3)\cap e (l_4)\).
Consider the case \(\bar{x}\in \Omega _1\) and \(x_2>y_2\). Then \(\bar{x}\) and \(f^*(\bar{x})=(3,\,0)\) belong to \(h(l_2)\cap h(l_3)\), so \(\beta ^*_t(\bar{x}) \in h(l_2)\cap h(l_3)\). Since \(\hbox {dist}(\bar{x},h(l_4))\ge \varepsilon _1>\delta \), we have \(\hbox {dist}(\beta ^*_t(\bar{x}),\,h(l_4))>0\), and so \(\beta ^*_t(\bar{x})\in e(l_4)\). Since \(\hbox {dist}(\bar{x},\,e(l_3))<\varepsilon _2\), we have \(\hbox {dist}(\beta ^*_t(\bar{x}),\,e(l_3))<\varepsilon _2+\delta \). Moreover, \(\{z\in {\mathfrak {S}}:\;\hbox {dist}(z,e(l_3))<\varepsilon _2+\delta \}\cap h(l_2)\subset h(l_1)\), so \(\beta ^*_t(\bar{x})\in h(l_1)\). Thus, we obtain that \(\beta ^*_t(\bar{x})\in C\subset O_\delta (\Delta )\).
If \(\bar{x}\in \Omega _1\) and \(x_2\leqslant y_2\) then \(\beta ^*_t(\bar{x})\in O_\delta (\Delta )\).
The case \(\bar{x}\in \Omega _2\) is symmetric to the above one.
In the case \(\bar{x}\in \Omega _3\), observe that \(\bar{x}\) and \(f^*(\bar{x})=(1,\,1)\) belong to the convex set C. Thus, \(\beta ^*_t(\bar{x})\in C\).
In the last case \(\bar{x}\in \Delta \), we have \(\beta ^*_t(\bar{x})\in \Delta ^\delta \).
To obtain that \(O_{\delta }(\Delta )\) is absorbing, we set
and we apply Lemma 3.5. The above sets are illustrated in Fig. 12. The set \(B\cup C\cup Z\) is invariant, and the sets A, C are escape sets. We have \(f^*(B\cup C)\subset W\). The neighbourhood \(O_{\delta }(\Delta )\) is defined in such way that there exists a constant \(\theta >0\) such that
By Lemma 3.5, the neighbourhood \(O_{\delta }(\Delta )\) is absorbing.
We define the neighbourhood \(O_{\delta }(y)\) of the point y by
where \(l_6\) is the line given by the equation: \(x_1+x_2=y_1+y_2-\delta \).
Using similar arguments as in the proof of the invariance of \(O_\delta (\Delta )\), we show that \(O_{\delta }(y)\) is invariant. Set \(D=O_{\delta }(\Delta )\), \(Z=O_{\delta }(y)\), \(B=D\setminus Z\) and \(W=\hbox {conv}\{(0,\,3),\,(2,\,2),\,(3,\,0)\}\). By Lemma 3.6, we obtain that \(O_{\delta }(y)\) is absorbing.
Since the diameter of \(O_\delta (y)\) tends to zero as \(\delta \rightarrow 0\), y is the limit of an arbitrary trajectory in the considered case.
Using elementary calculations, we obtain that the distance between the point \(y^{\varepsilon _1,\varepsilon _2}\) and the point \((1,\,1)\) equals
where \(\alpha \) denotes the angle between lines \(l((1,\,1),\,a)\) and \(l((1,\,1),\,b)\). \(\square \)
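Theorem 4.2 can be illustrated numerically with the same reconstructed strategies as before (the exact defining formulas are omitted in this excerpt, so this is an approximation). When \(b_1<a_1\), e.g. \(a=(2.25,\,1.5)\) and \(b=(1.5,\,2.25)\), the region \(\Omega _3\) between the two lines pulls the averages towards \((1,\,1)\), and the trajectory settles near the point \(y^{\varepsilon _1,\varepsilon _2}\), which for small \(\varepsilon _i\) lies close to \((1,\,1)\).

```python
# Both players use the reconstructed semi-cooperative strategies, but with
# different beta-core points: a for player 1, b for player 2 (b1 < a1).
U = {('C', 'C'): (2, 2), ('C', 'D'): (0, 3),
     ('D', 'C'): (3, 0), ('D', 'D'): (1, 1)}

def semi_coop(v, eps, player):
    m = (v[1] - 1) / (v[0] - 1)                  # slope of the line l((1,1), v)
    def s1(x):
        return 'C' if x[1] - 1 <= m * (x[0] - 1) + eps and x[1] <= v[1] else 'D'
    def s2(x):
        return 'C' if x[1] - 1 >= m * (x[0] - 1) - eps and x[0] <= v[0] else 'D'
    return s1 if player == 1 else s2

def run(s1, s2, x_bar, steps):
    for t in range(1, steps + 1):
        x = U[(s1(x_bar), s2(x_bar))]
        x_bar = ((t * x_bar[0] + x[0]) / (t + 1),
                 (t * x_bar[1] + x[1]) / (t + 1))
    return x_bar

a, b = (2.25, 1.5), (1.5, 2.25)                  # each player favours himself
s1, s2 = semi_coop(a, 0.05, 1), semi_coop(b, 0.05, 2)
print(run(s1, s2, (2.0, 2.0), 200000))           # settles close to (1, 1)
```

In the region between the two lines both players defect, so the averages slide towards the Nash payoff; near \((1,\,1)\) short bursts of cooperation occur inside the overlap of the two \(\varepsilon \)-strips, which keeps the limit slightly away from \((1,\,1)\), in line with the distance formula above.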
References
Akin E (2017) Good strategies for the iterated prisoner’s dilemma: Smale vs Markov. J Dyn Games 4:217–253
Aumann RJ (1959) Acceptable points in general cooperative n-person games. In: Luce RD, Tucker AW (eds) Contributions to the theory of games IV, Annals of Mathematics Studies 40. Princeton University Press, Princeton, pp 287–324
Aumann RJ (1960) Acceptable points in games of perfect information. Pac J Math 10:381–417
Aumann RJ (1961) The core of a cooperative game without side payments. Trans Am Math Soc 98:539–552
Aumann RJ (1967) A survey of cooperative games without side payments. In: Shubik M (ed) Essays in mathematical economics in honor of Oskar Morgenstern. Princeton University Press, Princeton, pp 3–27
Axelrod R (1984) The evolution of cooperation. Basic Books Inc., Publishers, New York
Blackwell D (1956) An analog of the minimax theorem for vector payoffs. Pac J Math 6:1–8
Kufel T, Plaskacz S, Zwierzchowska J (2018) Strong and safe Nash equilibrium in some repeated 3-player games (Preprint)
Lax PD (2002) Functional analysis. Wiley, New York
Palomino F, Vega-Redondo F (1999) Convergence of aspirations and (partial) cooperation in the prisoner’s dilemma. Int J Game Theory 28:465–488
Ruijgrok M, Ruijgrok TW (2015) An effective replicator equation for games with a continuous strategy set. Dyn Games Appl 5:157–179
Smale S (1980) The prisoner’s dilemma and dynamical systems associated to non-cooperative games. Econometrica 48:1617–1634
Sorin S (1992) Repeated games with complete information (Chapter 4). In: Aumann RJ, Hart S (eds) Handbook of game theory, vol 1. Elsevier Science Publishers B.V., Amsterdam
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Plaskacz, S., Zwierzchowska, J. Dynamical Systems Associated with the \(\beta \)-Core in the Repeated Prisoner’s Dilemma. Dyn Games Appl 9, 217–235 (2019). https://doi.org/10.1007/s13235-018-0262-x