Dynamical systems associated to the $\beta$-core in the repeated prisoner's dilemma

We consider the repeated prisoner's dilemma (PD). We assume that players make their choices knowing only the average payoffs from the previous stages. A player's strategy is a function from the convex hull $\mathfrak{S}$ of the set of payoffs into the set $\{C,\,D\}$ ($C$ means cooperation, $D$ means defection). S. Smale in \cite{smale} presented the idea of good strategies in the repeated PD. If both players play good strategies, then the average payoffs tend to the payoff corresponding to the profile $(C,C)$ in PD. We adapt Smale's idea to define semi-cooperative strategies: players do not take the payoff corresponding to the profile $(C,C)$ as a reference point, but can take an arbitrary payoff belonging to the $\beta$-core of PD. We show that if both players choose the same point in the $\beta$-core, then the strategy profile is an equilibrium in the repeated game. If the players choose different points in the $\beta$-core, then the sequence of average payoffs tends to a point in $\mathfrak{S}$. The obtained limit can be treated as a payoff in a new game, in which the set of players' actions is the set of points in $\mathfrak{S}$ corresponding to the $\beta$-core payoffs.


Introduction
Robert Aumann in papers [2], [3], [4], [5] showed that if a payoff p of a normal form game corresponds to a strategy profile belonging to the β-core, then there exists a strong equilibrium in the repeated game providing the payoff p. The construction of the strong equilibrium profile in the repeated game is rather complex and is based on the assumption that all players know the full history of the game. Our aim is also to consider the situation when players choose strategies in the repeated game corresponding to different points in the β-core. Then, in general, the course of the game seems hard to forecast.
We consider the two-player Prisoner's Dilemma (PD) with payoffs given by

$u(C, C) = (2, 2), \quad u(C, D) = (0, 3), \quad u(D, C) = (3, 0), \quad u(D, D) = (1, 1),$  (1)

where C means to cooperate and D to defect. The set of β-core payoffs for PD is presented in Figure 1 (bold segments with ends (1, 2.5), (2, 2), (2.5, 1)). The β-core consists of the payoffs which are Pareto optimal and individually rational. We assume that in the repeated game players know only both players' average payoffs from the previous stages. So, a player's strategy is a function from the convex hull S of the set of vector payoffs into the set of his actions. The vector payoff function u given by (1), a strategy profile s : S → {C, D}² and an initial point $\bar x_1$ determine a sequence of average payoffs $\bar x_t$ by

$\bar x_{t+1} = \frac{t \bar x_t + u(s(\bar x_t))}{t + 1}.$  (2)

The strategies of player 1 and player 2 corresponding to a point v = (v_1, v_2) in the β-core are presented in Figure 1. These strategies are called semi-cooperative strategies and are determined by the point v and a positive constant ε. We show that for an arbitrary initial point $\bar x_1$ the sequence of average payoffs $\bar x_t$ converges to the point v when the strategy profile s = s_v consists of the semi-cooperative strategies corresponding to the point v. The profile s_v is a Nash equilibrium. The case v = (2, 2) was considered by S. Smale in [16]. The idea of semi-cooperative strategies is motivated by Smale's idea of good strategies.
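For the reader who wishes to experiment, the payoff function (1) and the averaging recurrence (2) are easy to simulate; the following Python sketch is illustrative only (all names are ours, not the paper's):

```python
# Stage payoffs of the PD: (C,C)->(2,2), (C,D)->(0,3), (D,C)->(3,0), (D,D)->(1,1).
PAYOFF = {
    ("C", "C"): (2.0, 2.0),
    ("C", "D"): (0.0, 3.0),
    ("D", "C"): (3.0, 0.0),
    ("D", "D"): (1.0, 1.0),
}

def step(t, xbar, profile):
    """One application of the recurrence: xbar_{t+1} = (t*xbar_t + u(s(xbar_t)))/(t+1)."""
    u1, u2 = PAYOFF[(profile[0](xbar), profile[1](xbar))]
    return ((t * xbar[0] + u1) / (t + 1), (t * xbar[1] + u2) / (t + 1))

def run(profile, x1, steps):
    xbar = x1
    for t in range(1, steps + 1):
        xbar = step(t, xbar, profile)
    return xbar

# Example: both players always defect; the averages drift to the Nash payoff (1, 1).
always_D = lambda xbar: "D"
limit = run((always_D, always_D), (3.0, 0.0), 10_000)
```

Under the profile in which both players always defect, the averages drift to the Nash payoff (1, 1), in accordance with the discussion above.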
The main problem that we consider in the paper is to study the limit properties of the dynamical system given by (2) in the case when players 1 and 2 choose different points in the β-core: player 1 chooses a point a and player 2 chooses a point b. Our main result, formulated in Theorem 4.2, states that for an arbitrary initial point the sequence of average payoffs is convergent, and it gives the limit explicitly in terms of a and b. Smale's approach to the repeated PD has recently been applied in [1]. E. Akin showed that if player 1's strategy s_1 is simple, i.e. $s_1(x) = D$ when $L(x) > 0$ and $s_1(x) = C$ when $L(x) \le 0$, where $L(x_1, x_2) = ax_1 + bx_2 + c$ is an affine map such that $L(1, 1), L(3, 0) \le 0 \le L(2, 2), L(0, 3)$, then every sequence of average payoffs is attracted by the interval {x ∈ S : L(x) = 0}, for an arbitrary strategy of player 2. If both players adopt simple strategies, then every sequence of average payoffs tends to the intersection point of the separating lines. In [1], evolutionary dynamics is used to analyze competition among certain simple strategies. The game with the payoff given by Theorem 4.2 has a continuous strategy set. In a subsequent paper we intend to analyze its replicator dynamics using the methods presented in [15].
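Akin's attraction property is easy to observe numerically. The sketch below assumes the threshold reading of a simple strategy (player 1 defects exactly when L at the current average is positive; this reading is ours) and plays it against a randomizing opponent; the averages are attracted by the line {x ∈ S : L(x) = 0}:

```python
import random

# Stage payoffs of the PD: (C,C)->(2,2), (C,D)->(0,3), (D,C)->(3,0), (D,D)->(1,1).
PAYOFF = {("C", "C"): (2.0, 2.0), ("C", "D"): (0.0, 3.0),
          ("D", "C"): (3.0, 0.0), ("D", "D"): (1.0, 1.0)}

def L(x):
    # An affine map with L(1,1), L(3,0) <= 0 <= L(2,2), L(0,3).
    return -x[0] + 2.0 * x[1] - 1.0

def simple(xbar):
    # Our threshold reading: defect when the average lies on the side of
    # {L = 0} from which player 1's defection payoffs pull it back.
    return "D" if L(xbar) > 0 else "C"

random.seed(0)
xbar = (1.0, 1.0)
for t in range(1, 20_001):
    a2 = random.choice(["C", "D"])   # arbitrary behaviour of player 2
    u1, u2 = PAYOFF[(simple(xbar), a2)]
    xbar = ((t * xbar[0] + u1) / (t + 1), (t * xbar[1] + u2) / (t + 1))
residual = abs(L(xbar))
```

The residual |L(x̄_t)| shrinks regardless of how player 2 behaves, because each payoff reachable after player 1's move has the opposite sign of L.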

Smale's good strategies in the repeated prisoner's dilemma
In this section we provide a brief presentation of Smale's approach to the repeated prisoner's dilemma, which was presented in [16]. Smale considers PD with payoffs given by (1). The players' actions are interpreted as follows: C means to cooperate and D to defect¹. The game is symmetric and the action D dominates the action C for each player (3 > 2 and 1 > 0). The Nash equilibrium is the pair of actions (D, D) and the Nash payoff is (1, 1). The Nash payoff is not Pareto optimal. The Pareto frontier consists of two segments: the first one joins (0, 3) to (2, 2), the second one joins (2, 2) to (3, 0). Smale distinguishes one Pareto optimal payoff, (2, 2).
He constructs a strategy profile in the repeated PD that is a Nash equilibrium with payoff equal to (2, 2). This kind of result can be treated as a special case of the Folk Theorem. What makes Smale's approach atypical is the way actions are chosen in each repetition. At each stage, the players make their decisions based on the average vector payoffs from the previous repetitions. This means that the domain of strategies is no longer the set of histories but the convex hull of the payoffs. Such strategies are called memory strategies. Each player chooses his memory strategy before the iterated game starts. Players' strategies are fixed during the iteration.
Let the function u : {C, D}² → S be given by (1), where S denotes the convex hull of all possible payoffs, i.e. S = conv{(2, 2), (0, 3), (3, 0), (1, 1)}. A memory strategy of player i is a map s_i : S → {C, D}. A strategy profile is a pair s = (s_1, s_2) : S → {C, D}². The strategy profile s and an initial point x_1 ∈ S determine the course of the repeated game in the following way: $x_{t+1} = u(s(\bar x_t))$, where $\bar x_t = \frac{1}{t}\sum_{k=1}^{t} x_k$ and $\bar x_1 = x_1$. The sequence $(x_t)_{t \ge 1}$ is the sequence of payoffs and the sequence $(\bar x_t)_{t \ge 1}$ is the sequence of average payoffs in the repeated game. Fix ε > 0. A good strategy of player 1 is a map $s^*_1 : S \to \{C, D\}$ under which, roughly speaking, player 1 cooperates only if the opponent's average payoff does not exceed his own by more than ε.

¹ Originally, in the paper [16], Smale interprets the game (1) in terms of the arms race. In his paper the action C is marked with E (easy), which means to disarm, and the action D is marked with T (tough), which means to arm.
A good strategy of player 2 is a map $s^*_2 : S \to \{C, D\}$ defined symmetrically. The good strategies are illustrated in Figure 3.

Theorem 2.1.
1. If player 1 plays the good strategy $s^*_1$, then $\liminf_{t\to\infty} \bar x^1_t \ge 1$ for every x_1 ∈ S.
2. If both players play good strategies $s^*_1$, $s^*_2$, then $\lim_{t\to\infty} \bar x_t = (2, 2)$ for every x_1 ∈ S.
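Convergence of the average payoffs to (2, 2) can be checked numerically. The sketch below uses a simplified stand-in for the good strategies (player i cooperates iff the opponent's average payoff exceeds his own by at most ε); Smale's exact formulas are not reproduced here, so this rule is our assumption:

```python
EPS = 0.1

# Stage payoffs of the PD: (C,C)->(2,2), (C,D)->(0,3), (D,C)->(3,0), (D,D)->(1,1).
PAYOFF = {("C", "C"): (2.0, 2.0), ("C", "D"): (0.0, 3.0),
          ("D", "C"): (3.0, 0.0), ("D", "D"): (1.0, 1.0)}

def good1(xbar):
    # Cooperate unless the opponent's average exceeds ours by more than EPS.
    return "C" if xbar[1] <= xbar[0] + EPS else "D"

def good2(xbar):
    return "C" if xbar[0] <= xbar[1] + EPS else "D"

def run(x1, steps):
    xbar = x1
    for t in range(1, steps + 1):
        u1, u2 = PAYOFF[(good1(xbar), good2(xbar))]
        xbar = ((t * xbar[0] + u1) / (t + 1), (t * xbar[1] + u2) / (t + 1))
    return xbar

limits = [run(x1, 20_000) for x1 in [(3.0, 0.0), (0.0, 3.0), (1.0, 1.0), (0.5, 2.0)]]
```

From every tested initial point the averages first oscillate onto the diagonal and then lock into mutual cooperation, so they converge to (2, 2).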
If the payoff in the repeated game is defined as the upper limit of the average payoffs, then the strategy profile s* = (s*_1, s*_2) is a Nash equilibrium in the set of memory strategies. The Banach limit Lim is a continuous linear functional defined on the space $l^\infty$ of bounded scalar sequences² that extends the functional associating to any convergent sequence its limit. If the payoff is defined as a Banach limit of the sequence of the average payoffs, then the Nash equilibrium s* has an additional interesting property: the construction of good strategies guarantees that the deviating player's payoff will not exceed the good-strategy player's payoff by more than ε. We define the payoff of player i in the repeated game by $P_i(s) = \operatorname{Lim}(\bar x^i_t)_{t \ge 1}$.

² The definition and properties of the Banach limit can be found in [9].

Proposition 2.2.
Suppose player 1 plays a good strategy s*_1. If s = (s*_1, s_2), where s_2 is an arbitrary memory strategy of player 2, then player 1's payoff in the repeated game is not smaller than his opponent's payoff minus ε. The constant ε is controlled by player 1, so he can choose it as small as he wishes. In this sense, good strategies are not only Nash equilibria in the set of memory strategies, but also safe Nash equilibria.
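The safety property can also be observed numerically, again with the simplified stand-in rule for the good strategy (our assumption, not Smale's exact formula), now against an always-defecting opponent: the long-run payoff gap settles near ε.

```python
EPS = 0.1

# Stage payoffs of the PD: (C,C)->(2,2), (C,D)->(0,3), (D,C)->(3,0), (D,D)->(1,1).
PAYOFF = {("C", "C"): (2.0, 2.0), ("C", "D"): (0.0, 3.0),
          ("D", "C"): (3.0, 0.0), ("D", "D"): (1.0, 1.0)}

def good1(xbar):
    # Simplified good strategy: punish once the opponent is ahead by more than EPS.
    return "C" if xbar[1] <= xbar[0] + EPS else "D"

def always_defect(xbar):
    return "D"

xbar = (2.0, 2.0)
for t in range(1, 20_001):
    u1, u2 = PAYOFF[(good1(xbar), always_defect(xbar))]
    xbar = ((t * xbar[0] + u1) / (t + 1), (t * xbar[1] + u2) / (t + 1))
gap = xbar[1] - xbar[0]
```

In the simulated run player 1's average payoff stays close to 1, while player 2 gains at most about ε over him, which is the content of Proposition 2.2.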

Some properties of the dynamical systems generated by memory strategies
We consider a normal form game $G = (N, (A_i)_{i \in N}, (u_i)_{i \in N})$ with N players; S is the convex hull of the set of vector payoffs of the payoff function u = (u_1, u_2, ..., u_N). A profile s = (s_1, s_2, ..., s_N) of memory strategies determines a map f_s : S → S by $f_s(x) = u(s(x))$ and the dynamical system $\beta_s = (\beta^s_t)_{t \ge 1}$, where $\beta^s_t(x) = \frac{t x + f_s(x)}{t+1}$. We say that a sequence $(\bar x_t)_{t \ge t_0} \subset S$ is a trajectory of the dynamical system β_s if

$\bar x_{t+1} = \frac{t \bar x_t + f_s(\bar x_t)}{t + 1}, \quad t \ge t_0.$  (5)

Observe that if $\bar x_{t_0}$ is the given average payoff after stage t_0, then the trajectory $(\bar x_t)_{t \ge t_0} \subset S$ of the dynamical system β_s given by (5) is the sequence of average payoffs in the repeated game. Since S is bounded, consecutive points of a trajectory are close to each other for large t. So, for every ε > 0, there exists T such that for an arbitrary trajectory $(\bar x_t)_{t \ge t_0}$ of the dynamical system β_s it holds

$\|\bar x_{t+1} - \bar x_t\| \le \varepsilon \quad \text{for } t \ge T.$  (7)

The following proposition is a deterministic version of the Blackwell approachability result (see [8]). An elementary proof is presented in [14].

Proposition 3.1. Suppose that a set W ⊂ S is closed and a trajectory $(\bar x_t)_{t \ge t_0}$ of the dynamical system β_s satisfies

$\langle f_s(\bar x_t) - y_t,\ \bar x_t - y_t \rangle \le 0 \quad \text{for } t \ge t_0.$  (8)

Then $\lim_{t\to\infty} \operatorname{dist}(\bar x_t, W) = 0$.

The point $y_t$ in (8) is a proximal point in the set W to the point $\bar x_t$. If the set W is convex and closed and $f_s(\bar x_t) \in W$ for $t \ge t_0$, then (8) holds true. As a corollary of Proposition 3.1 we obtain:

Corollary 3.2. If the set W ⊂ S is closed and convex and a trajectory $(\bar x_t)_{t \ge t_0}$ of the dynamical system β_s satisfies $f_s(\bar x_t) \in W$ for $t \ge t_0$, then $\lim_{t\to\infty} \operatorname{dist}(\bar x_t, W) = 0$.

Taking W = (−∞, c] in Proposition 3.1 we obtain the following property of real sequences.
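Corollary 3.2 admits a direct numerical illustration: if every update f_s(x̄_t) is taken from a closed convex set W, the averages approach W. In the sketch below W is a segment and the updates alternate between its endpoints (the particular choices are ours):

```python
import math

def proj_segment(p, a, b):
    """Closest point to p on the segment [a, b]."""
    ax, ay = a
    bx, by = b
    ux, uy = bx - ax, by - ay
    s = ((p[0] - ax) * ux + (p[1] - ay) * uy) / (ux * ux + uy * uy)
    s = max(0.0, min(1.0, s))
    return (ax + s * ux, ay + s * uy)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

A, B = (0.0, 3.0), (2.0, 2.0)    # W is the segment [A, B]
xbar = (3.0, 0.0)
for t in range(1, 5001):
    w = A if t % 2 else B         # any update taken from W
    xbar = ((t * xbar[0] + w[0]) / (t + 1), (t * xbar[1] + w[1]) / (t + 1))
d = dist(xbar, proj_segment(xbar, A, B))
```

Indeed, dist(x̄_{t+1}, W) ≤ t/(t+1) · dist(x̄_t, W) by convexity, so the distance decays like 1/t.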

Corollary 3.3.
Suppose that $(a_n)_{n=1}^{\infty}$ is a bounded sequence in R and $(\bar a_n)_{n=1}^{\infty}$ is the sequence of arithmetic means, i.e. $\bar a_n = \frac{1}{n}\sum_{k=1}^{n} a_k$. If $a_n \le c$ for almost all n and a fixed constant c ∈ R, then $\limsup_{n\to\infty} \bar a_n \le c$.

Definition 3.4. Let s be a memory strategy profile. We say that a set Z ⊂ S is:
1. an invariant set for the dynamical system β_s iff every trajectory $(\bar x_t)_{t \ge t_0}$ of the dynamical system β_s with $\bar x_{t_0} \in Z$ satisfies $\bar x_t \in Z$ for all $t \ge t_0$;
2. an escape set for the dynamical system β_s iff every trajectory $(\bar x_t)_{t \ge t_0}$ of the dynamical system β_s leaves Z, i.e. $\bar x_t \notin Z$ for some $t \ge t_0$;
3. an absorbing set for the dynamical system β_s iff every trajectory $(\bar x_t)_{t \ge t_0}$ of the dynamical system β_s satisfies $\bar x_t \in Z$ for almost all t.

In the next section we study limit properties of some dynamical systems generated by memory strategies. To show that a trajectory converges to a point, we construct a family of absorbing and invariant neighbourhoods of the limit point. Invariance is usually easy to check. To show that a neighbourhood is absorbing we shall use the following lemmas.
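Corollary 3.3 can be sanity-checked on a concrete sequence: a finite prefix above c does not affect the upper limit of the arithmetic means.

```python
# A bounded sequence with a_n <= c for all n beyond a finite prefix.
c = 2.0
a = [5.0] * 50 + [2.0 - 1.0 / (n + 1) for n in range(10_000)]

means, total = [], 0.0
for n, a_n in enumerate(a, start=1):
    total += a_n
    means.append(total / n)
```

The early means exceed c, but the tail of the mean sequence drops to c, in line with the corollary.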
Hereafter, to the end of the section, we fix a memory strategy profile s and consider the dynamical system β_s.

Proof of Lemma 3.5. Suppose, contrary to our claim, that the set Z is not absorbing. Then there exists a trajectory $(\bar x_t)_{t \ge t_0}$ with $\bar x_t \notin Z$ for infinitely many t. Since A is an escape set, the trajectory leaves A, so $\bar x_{t_3} \in B \cup C$ for some $t_3$, and by the invariance of the set B ∪ C ∪ Z we obtain that $\bar x_t \in B \cup C \cup Z$ for $t \ge t_3$. Since C is an escape set, there exists $\tilde t > t_\delta$ such that $\bar x_{\tilde t} \notin C$. So $\bar x_{\tilde t} \in (B \setminus Z) \cap W_\delta$, which contradicts the assumption that $(B \setminus Z) \cap W_\delta = \emptyset$.

Proof of Lemma 3.6. Suppose, contrary to our claim, that the set Z is not absorbing. Then there exists a trajectory $(\bar x_t)_{t \ge t_0}$ with $\bar x_t \notin Z$ for infinitely many t. Since the set D is invariant and absorbing, there exists $t_2 > t_1$ such that $\bar x_t \in D$ for $t > t_2$. Thus $f_s(\bar x_t) \in W$ for $t > t_2$, and by Corollary 3.2 there exists $t_\varepsilon > t_2$ such that $\bar x_t \in W_\varepsilon$ for $t > t_\varepsilon$. This contradicts the assumption $W_\varepsilon \cap (B \setminus Z) = \emptyset$.
Lemma 3.7. Suppose that Z ⊂ S and $f_s(Z) \subset W$, where W is a closed convex subset of S. If there exists ε > 0 such that $W_\varepsilon \cap Z = \emptyset$, then Z is an escape set.
Proof. Suppose, contrary to our claim, that Z is not an escape set. Then there exists a trajectory $(\bar x_t)_{t \ge t_0}$ and τ ≥ t_0 such that

$\bar x_t \in Z$ for all $t \ge \tau$.  (10)

So $f_s(\bar x_t) \in W$ for $t > \tau$. By Corollary 3.2, there exists $t' > \tau$ such that $\bar x_t \in W_\varepsilon$ for all $t > t'$, which contradicts (10) and the assumption $W_\varepsilon \cap Z = \emptyset$.

Semi-cooperative strategies
In this section we introduce semi-cooperative strategies in the repeated PD, which are a generalisation of Smale's good strategies. The semi-cooperative strategy of a player is determined by the choice of a point v in the β-core of PD and a positive constant. If both players choose the same point v, then the obtained strategy profile is a Nash equilibrium and the vector payoff in the repeated game equals v. This can be regarded as a very special case of Robert Aumann's results presented in [2], [3], [4] and [5], where it was shown that each payoff from the β-core of the stage game can be obtained as a strong Nash equilibrium in the repeated game. Much more interesting is the situation when players choose semi-cooperative strategies corresponding to different points in the β-core. The limits of trajectories of the dynamical system determined by a semi-cooperative profile are described in Theorem 4.2, which is the main result of the paper.
To recall the definition of the β-core, assume that G is a normal form game as in Section 3. A correlated strategy $c_K$ of the coalition K ⊂ N is a probability distribution over the (finite) set $A_K = \prod_{i \in K} A_i$. The set of correlated strategies of the coalition K is denoted by $C_K$. The correlated strategy $c_K$ of the coalition K and the correlated strategy $c_{N\setminus K}$ of the anticoalition N \ K determine a correlated strategy $c = (c_K, c_{N\setminus K}) \in C_N$ of the full coalition. In the usual way we extend the payoff functions $u_i$ onto the set of correlated strategies $C_N$. A correlated strategy $\bar c \in C_N$ belongs to the β-core ($\bar c \in C_\beta(G)$) iff for every coalition K ⊂ N there exists $c_{N\setminus K} \in C_{N\setminus K}$ such that for every $c_K \in C_K$,

$u_i(c_K, c_{N\setminus K}) \le u_i(\bar c) \quad \text{for some } i \in K.$  (11)

Taking K = N in (11) we obtain that for every $c \in C_N$ there is a player i with $u_i(c) \le u_i(\bar c)$, which is the weak Pareto optimality condition. Taking a coalition K = {i} in (11) we obtain that the payoff $u(\bar c)$ is individually rational, i.e. $u_i(\bar c)$ is not smaller than the punishment (minmax) level of player i. Hereafter G denotes the considered PD with payoffs given by (1). The set of Pareto optimal and individually rational payoffs for G is illustrated in Figure 1. The β-core for PD (in fact, its image by u) is the union of the segments joining (1, 2.5) to (2, 2) and (2, 2) to (2.5, 1). Let us fix a point v in the β-core different from the end points, i.e. $v \in u(C_\beta(G)) \setminus \{(1, 2.5), (2.5, 1)\}$.
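The punishment levels entering the individual rationality condition can be computed directly for the PD; a short check (an illustrative sketch, names ours) confirms that each player's minmax level is 1, so the β-core endpoint (1, 2.5) is exactly the point of the Pareto segment where player 1 is pushed down to his punishment level:

```python
# Stage payoffs of the PD: (C,C)->(2,2), (C,D)->(0,3), (D,C)->(3,0), (D,D)->(1,1).
PAYOFF = {("C", "C"): (2.0, 2.0), ("C", "D"): (0.0, 3.0),
          ("D", "C"): (3.0, 0.0), ("D", "D"): (1.0, 1.0)}
ACTIONS = ("C", "D")

# Punishment (minmax) level of player 1: player 2 minimizes player 1's best reply payoff.
minmax1 = min(max(PAYOFF[(a1, a2)][0] for a1 in ACTIONS) for a2 in ACTIONS)
# Symmetrically for player 2.
minmax2 = min(max(PAYOFF[(a1, a2)][1] for a2 in ACTIONS) for a1 in ACTIONS)

# The point of the Pareto segment [(0, 3), (2, 2)] where player 1's payoff
# equals his punishment level.
s = (minmax1 - 0.0) / (2.0 - 0.0)
endpoint = (0.0 + s * 2.0, 3.0 + s * (2.0 - 3.0))
```
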
Roughly speaking, player 1 cooperates if the average payoff is located below the line going through the points (1, 1) and v. A semi-cooperative strategy of player 1, determined by the point v and a constant ε > 0, is a map $s^{v,\varepsilon}_1 : S \to \{C, D\}$. A semi-cooperative strategy of player 2, determined by the point v and ε > 0, is a map $s^{v,\varepsilon}_2 : S \to \{C, D\}$ defined symmetrically: roughly speaking, player 2 cooperates if the average payoff is located above this line. Semi-cooperative strategies are illustrated in Figure 1.
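A numerical sketch of this construction, under our reading of the strategies (player 1 cooperates when the average lies on or below the line through (1, 1) and v up to a slack ε measured by an affine functional g; player 2 symmetrically above it; both g and the slack convention are our assumptions, not the paper's exact formulas):

```python
EPS = 0.05

# Stage payoffs of the PD: (C,C)->(2,2), (C,D)->(0,3), (D,C)->(3,0), (D,D)->(1,1).
PAYOFF = {("C", "C"): (2.0, 2.0), ("C", "D"): (0.0, 3.0),
          ("D", "C"): (3.0, 0.0), ("D", "D"): (1.0, 1.0)}

def g(x, v):
    # Affine functional vanishing on the line through (1, 1) and v;
    # g > 0 below the line, g < 0 above it.
    return (v[1] - 1.0) * (x[0] - 1.0) - (v[0] - 1.0) * (x[1] - 1.0)

def semi1(xbar, v):
    # Player 1 cooperates on or below the line (up to the slack EPS).
    return "C" if g(xbar, v) >= -EPS else "D"

def semi2(xbar, v):
    return "C" if g(xbar, v) <= EPS else "D"

def run(v, x1, steps):
    xbar = x1
    for t in range(1, steps + 1):
        u1, u2 = PAYOFF[(semi1(xbar, v), semi2(xbar, v))]
        xbar = ((t * xbar[0] + u1) / (t + 1), (t * xbar[1] + u2) / (t + 1))
    return xbar

v = (1.5, 2.25)   # a β-core point on the segment joining (1, 2.5) and (2, 2)
limits = [run(v, x1, 50_000) for x1 in [(1.0, 1.0), (3.0, 0.0)]]
```

For v on the segment joining (1, 2.5) and (2, 2), the simulated averages settle near v (up to a drift of order ε), in line with Theorem 4.1 below.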
We say that player 1 is an egoist or an altruist according to the position of his chosen β-core point relative to (2, 2); this is illustrated in Figure 4. If both players choose the same point $v = (v_1, v_2) \in u(C_\beta(G))$ to determine their semi-cooperative strategies, then the strategy profile is a Nash equilibrium. We obtain the following result, which is similar to Smale's one.

Theorem 4.1. Suppose that both players play semi-cooperative strategies $s^{v,\varepsilon_1}_1$, $s^{v,\varepsilon_2}_2$ determined by the same point $v \in u(C_\beta(G)) \setminus \{(1, 2.5), (2.5, 1)\}$ and positive constants $\varepsilon_1, \varepsilon_2 > 0$. If $(\bar x_t)_{t \ge 1}$ is an arbitrary trajectory of the dynamical system determined by the strategy profile $(s^{v,\varepsilon_1}_1, s^{v,\varepsilon_2}_2)$, then $\lim_{t\to\infty} \bar x_t = v$. If player 1 plays the semi-cooperative strategy $s^{v,\varepsilon_1}_1$ and player 2 plays an arbitrary memory strategy $s_2$, then an arbitrary trajectory $(\bar x_t)_{t \ge 1}$ of the dynamical system determined by the strategy profile $(s^{v,\varepsilon_1}_1, s_2)$ satisfies $\limsup_{t\to\infty} \bar x^2_t \le v_2$ and $\liminf_{t\to\infty} \bar x^1_t \ge 1$.

Proof. Fix $v \in u(C_\beta(G)) \setminus \{(1, 2.5), (2.5, 1)\}$ and $\varepsilon_1, \varepsilon_2 > 0$. Let $(\bar x_t)_{t \ge 1}$ be a trajectory of the dynamical system determined by the profile $(s^{v,\varepsilon_1}_1, s^{v,\varepsilon_2}_2)$. The set ∆ is convex; the sets ∆, Ω_1, Ω_2 are pairwise disjoint and S = ∆ ∪ Ω_1 ∪ Ω_2. This situation is illustrated in Figure 5.
To show that the trajectory $(\bar x_t)_{t \ge 1}$ converges to v, we construct an invariant and absorbing neighbourhood $O_\delta(v)$ of the point v. Fix δ ∈ (0, min{ε_1, ε_2}). We denote by l(a, b) the line going through the points a, b ∈ R². The lines l_1, l_2, l_3 are determined by the point $P_\delta = (v_1 - \delta, v_2 - \delta)$ and the point $R_\delta$, the intersection of l_2 with the line {x_1 = v_1}. The half-plane over (under) a line l ⊂ R² will be denoted by e(l) (h(l)). The neighbourhood $O_\delta(v)$, illustrated in Figure 6, is the intersection of S with the corresponding half-planes. To show that the neighbourhood $O_\delta(v)$ is invariant, we divide it into three parts. By (7) there exists t_1 > 1 such that consecutive points of any trajectory are closer than δ for t > t_1. If $\bar x \in Z_1$ then $f_v(\bar x) = (3, 0)$, so $\beta^{s_v}_t(\bar x) \in e(l_1)$ for t > t_1. To show that the set $O_\delta(v)$ is absorbing, we set $V = \operatorname{conv} f_v(S) = \operatorname{conv}\{(2, 2), (0, 3), (3, 0)\}$. The set B ∪ C ∪ Z is invariant, the sets A and C are escape sets, and $f_v(B \cup C) \subset W$. There exists a constant θ > 0 such that $W_\theta \cap (B \setminus Z) = \emptyset$ (see Figure 7). By Lemma 3.5, the set $O_\delta(v)$ is absorbing. Since the diameters of the neighbourhoods $O_\delta(v)$ tend to zero as δ → 0, we obtain the convergence of the trajectory $(\bar x_t)$ to the point v.
Now we consider the dynamics of the system when player 2 chooses an arbitrary memory strategy and player 1 plays the semi-cooperative strategy $s^{v,\varepsilon_1}_1$. If $\bar x^2_t > v_2$, then player 1 defects at the next stage, and so player 2's payoff belongs to {0, 1}. By Corollary 3.3, we obtain that $\limsup_{t\to\infty} \bar x^2_t \le v_2$. The proof of the inequality $\liminf_{t\to\infty} \bar x^1_t \ge 1$ is similar.
If the players have no opportunity to agree on the choice of the point v, then we should not expect that they will choose the same point. We can treat the choice of the point v as an action of a player in a new game. To define payoffs in this new game we have to know the payoffs in the repeated PD when players 1 and 2 play semi-cooperative strategies corresponding to points a and b, respectively, in the β-core. The main result of the paper concerns this situation.

Theorem 4.2. Suppose that players 1 and 2 play semi-cooperative strategies $s^{a,\varepsilon_1}_1$, $s^{b,\varepsilon_2}_2$ determined by points $a = (a_1, a_2), b = (b_1, b_2) \in u(C_\beta(G)) \setminus \{(1, 2.5), (2.5, 1)\}$ and constants $\varepsilon_1, \varepsilon_2 > 0$. If $(\bar x_t)$ is an arbitrary trajectory of the dynamical system determined by the strategy profile $(s^{a,\varepsilon_1}_1, s^{b,\varepsilon_2}_2)$, then the sequence $(\bar x_t)$ is convergent.

The case a = b was considered in Theorem 4.1.
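The situation of Theorem 4.2 can be explored with the same kind of simulation, each player applying the line rule for his own point (again our reading of the semi-cooperative strategies, not the paper's exact formulas). As a consistency check, with a = b the empirical limit agrees with Theorem 4.1:

```python
EPS = 0.05

# Stage payoffs of the PD: (C,C)->(2,2), (C,D)->(0,3), (D,C)->(3,0), (D,D)->(1,1).
PAYOFF = {("C", "C"): (2.0, 2.0), ("C", "D"): (0.0, 3.0),
          ("D", "C"): (3.0, 0.0), ("D", "D"): (1.0, 1.0)}

def g(x, v):
    # Affine functional vanishing on the line through (1, 1) and v.
    return (v[1] - 1.0) * (x[0] - 1.0) - (v[0] - 1.0) * (x[1] - 1.0)

def run(a, b, x1, steps):
    # Player 1 cooperates (up to the slack EPS) below his line through (1,1) and a;
    # player 2 cooperates above his line through (1,1) and b.
    xbar = x1
    for t in range(1, steps + 1):
        a1 = "C" if g(xbar, a) >= -EPS else "D"
        a2 = "C" if g(xbar, b) <= EPS else "D"
        u1, u2 = PAYOFF[(a1, a2)]
        xbar = ((t * xbar[0] + u1) / (t + 1), (t * xbar[1] + u2) / (t + 1))
    return xbar

v = (1.5, 2.25)
same = run(v, v, (3.0, 0.0), 50_000)                        # a = b: limit near v
mixed = run((1.5, 2.25), (2.25, 1.5), (3.0, 0.0), 50_000)   # a != b: some limit in S
```

For a ≠ b the trajectory still settles down, and the empirical limit can be compared with the value given by Theorem 4.2.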
Proof. We construct two neighbourhoods of the point b: $U_\delta(b)$ and $O_\delta(b)$, where the lines l_1, l_2, l_3 are given by (15) for $P_\delta = (a_1 - \delta, a_2 - \delta)$ and $R_\delta$ being the intersection point of l_2 and the line {x_1 = b_1}. Invariance of the sets $U_\delta(b)$, $O_\delta(b)$ follows from their construction (see Figure 9). By Lemma 3.5, the set $O_\delta(b)$ is absorbing, with the same notation as in (18). To obtain that $U_\delta(b)$ is absorbing we apply Lemma 3.6. The case 2 < a_1 < b_1 is symmetric to the case considered above. The last situation is b_1 < a_1. Two possible choices of a and b are presented in Figure 10 (on the left, the partition of S when both players are egoists; on the right, the partition of S when b_1 < a_1 and player 1 is the altruist and player 2 is the egoist). The limit of an arbitrary trajectory is the point $y = y_{\varepsilon_1 \varepsilon_2}$ which is the intersection $\operatorname{cl}(\Delta) \cap \operatorname{cl}(\Omega_3)$. Let $P_\delta \in \Omega_3$ be the unique point satisfying $\operatorname{dist}(P_\delta, \Omega_1) = \operatorname{dist}(P_\delta, \Omega_2) = \delta$. Set

$l_1 = l(P_\delta, (0, 3)), \quad l_2 = l(P_\delta, (3, 0)), \quad l_3 = l((1, 1), b), \quad l_4 = l((1, 1), a), \quad l_5 = l(P_\delta, y_{\varepsilon_1 \varepsilon_2}).$

We construct the neighbourhood of the set ∆:

$O_\delta(\Delta) := \Delta_\delta \cup (h(l_1) \cap h(l_2) \cap h(l_3) \cap e(l_4)),$

which is illustrated in Figure 11.