A complete folk theorem for finitely repeated games

This paper analyzes the set of pure strategy subgame perfect Nash equilibria of any finitely repeated game with complete information and perfect monitoring. The main result is a complete characterization of the limit set, as the time horizon increases, of the set of pure strategy subgame perfect Nash equilibrium payoff vectors of the finitely repeated game. This model includes the special case of observable mixed strategies.

payoff defined by Wen (1994). The main theorem nests earlier results of Benoit and Krishna (1985), Smith (1995), and Demeze-Jouatsa and Wilson (2019).
Whether non-Nash outcomes of the stage-game can be sustained via subgame perfect Nash equilibria of the finitely repeated game depends on whether players can be incentivized to abandon their short-term interests and follow collusive paths that have greater long-run average payoffs. There are two extreme cases. On the one hand, in any finite repetition of a stage-game that has a unique Nash equilibrium payoff vector, such as the prisoners' dilemma, only the stage-game Nash equilibrium payoff vector is sustainable by subgame perfect Nash equilibria. On the other hand, for stage-games in which all players receive different Nash equilibrium payoffs, such as the battle of the sexes, the limit perfect folk theorem holds: any feasible and individually rational payoff vector of the stage-game is achievable as the limit payoff vector of a sequence of subgame perfect Nash equilibria of the finitely repeated game as the time horizon goes to infinity. Benoit and Krishna (1985) established that for the limit perfect folk theorem to hold, it is sufficient that the dimension of the set of feasible payoff vectors of the stage-game equals the number of players and that each player receives distinct payoffs at Nash equilibria of the stage-game. [1] Smith (1995) provided a weaker condition that is both necessary and sufficient for the limit perfect folk theorem to hold: the Nash decomposition of the stage-game must be complete. [2]
The distinct Nash payoffs condition and the full dimensionality of the set of feasible payoff vectors as in Benoit and Krishna (1985), or the complete Nash decomposition of Smith (1995), allow us to construct credible punishment schemes and to (recursively) leverage the behavior of any player near the end of the finitely repeated game. These are essential to generate a limit perfect folk theorem. In the case that the stage-game admits a unique Nash equilibrium payoff vector, Benoit and Krishna (1985) demonstrated that the set of subgame perfect Nash equilibrium payoff vectors of the finitely repeated game is reduced to the unique stage-game Nash equilibrium payoff vector. A part of the puzzle remains unresolved. Namely, for stage-games that do not admit a complete Nash decomposition, what is the exact range of payoff vectors that are achievable as the limit payoff vector of a sequence of subgame perfect Nash equilibria of finite repetitions of that stage-game? [3]
[1] Fudenberg and Maskin (1986) introduced the notion of full dimensionality of the set of feasible payoff vectors and used it to provide a sufficient condition for the perfect folk theorem for infinitely repeated games.
[2] The Nash decomposition of a normal form game is a strictly increasing sequence of non-empty groups of players. Players of the first group are those who receive at least two distinct Nash equilibrium payoffs in the stage-game. The second group of players of the Nash decomposition, if any, contains each player of the first group as well as some new players. New players are those who receive at least two distinct Nash equilibrium payoffs in the new game that is obtained from the stage-game by setting the utility function of each player of the first group equal to a constant. This idea can be iterated. After a finite number of iterations, the player set no longer changes. The Nash decomposition is complete if its last element equals the whole set of players.
If the stage-game has an incomplete Nash decomposition, then the set of players naturally breaks up into two blocks, where the first block contains all the players whose behavior can recursively be leveraged near the end of the finitely repeated game (see Footnote 2 for details). In contrast, it is not possible to control the short-run incentives of players of the second block. Therefore, each player of the second block has to play a stage-game pure best response at any profile that occurs on a pure strategy subgame perfect Nash equilibrium play path. Stage-game action profiles eligible for pure strategy subgame perfect Nash equilibrium play paths of the finitely repeated game are therefore exactly the stage-game pure Nash equilibria of what one could call the effective one shot game, the game obtained from the initial stage-game by setting the utility function of each player of the first block equal to a constant. This restriction of the set of eligible actions for pure strategy subgame perfect Nash equilibrium play paths has two main implications. Firstly, for a feasible payoff vector to be approachable via pure strategy subgame perfect Nash equilibria of the finitely repeated game, it has to be in the convex hull of the set of payoffs to profiles of actions that are Nash equilibria of the effective one shot game. I call a payoff vector enforceable if it belongs to this convex hull. Secondly, as subgame perfect Nash equilibria are protected against unilateral deviations even off the equilibrium path, any player of the second block has to be at her best response at any action profile occurring on a credible punishment path. Therefore, only pure Nash equilibria of the effective one shot game are eligible for credible punishment paths in any finite repetition of the original stage-game.
Consequently, a player of the first block can guarantee herself a payoff that is strictly greater than her effective minimax payoff. I call this new reservation payoff the enforceable minimax payoff. The main finding of this paper says that, as the time horizon increases, the set of payoff vectors to pure strategy subgame perfect Nash equilibria of the finitely repeated game converges to the set of enforceable payoff vectors that dominate the enforceable minimax payoff vector.
The paper proceeds as follows. In Sect. 2 I introduce the model and the definitions. Section 3 states the main finding of the paper and sketches the proof, and Sect. 4 concludes the paper. Proofs are provided in the Appendix.
[3] This game admits two pure Nash equilibrium profiles with respective payoff vectors (2, 2, 0, 0) and (1, 1, 0, 0), which are both strictly Pareto-dominated, for instance by the feasible payoff (17/8, 19/8, 1, 1). One might wonder if in the finite repetitions of this game players could cooperate and achieve efficiency via equilibrium strategies of the repeated game. As only players 1 and 2 receive distinct payoffs at pure Nash equilibria of the stage-game, the distinct Nash payoff condition of Benoit and Krishna (1985) does not hold. Furthermore, given any fixed profile of actions of players 1 and 2, the 2-player induced game (played by players 3 and 4) does not admit a pure Nash equilibrium that has a payoff different from (0, 0). This means that the complete Nash decomposition condition of Smith (1995), which is necessary and sufficient for the finite horizon perfect folk theorem, does not hold. As a consequence, the finite horizon perfect folk theorem does not hold for this game. While it is immediate that players can cooperate and achieve some payoffs that are weakly Pareto-superior to any stage-game Nash equilibrium payoff, for instance (2.7, 2.7, 0, 0) in 10 repetitions, it is not clear what the exact set of payoff vectors achievable via subgame perfect equilibria of the finitely repeated game is.

The stage-game
Let $G = (N, A, u)$, with $A = \times_{i \in N} A_i$ and $u = (u_i)_{i \in N}$, be a stage-game where the set of players $N = \{1, \dots, n\}$ is finite and where, for every player $i \in N$, the set $A_i$ of actions of player $i$ is compact. Given a player $i \in N$ and an action profile $a = (a_1, \dots, a_n) \in A$, let $u_i(a)$ denote the stage-game utility of player $i$ given the action profile $a$. Given an action profile $a \in A$, a player $i \in N$, and an action $a_i' \in A_i$ of player $i$, let $(a_i', a_{-i})$ denote the action profile in which all players except player $i$ choose the same action as in $a$, while player $i$ chooses $a_i'$. A stage-game pure best response of player $i$ to the action profile $a$ is an action $b_i(a) \in A_i$ that maximizes the stage-game payoff of player $i$ given that the choice of the other players is given by $a_{-i}$. An action profile $a \in A$ is a pure Nash equilibrium of the stage-game $G$ (denoted by $a \in \mathrm{Nash}(G)$) if $u_i(a_i', a_{-i}) \le u_i(a)$ for every player $i \in N$ and every action $a_i' \in A_i$. Each stage-game considered in this paper is compact in the sense that each $A_i$ is compact and $u$ is continuous. A stage-game could for instance be finite, the mixed extension of another finite stage-game, or a game with a continuum of actions for some players.
Let $\gamma$ be a real number that is strictly greater than any payoff a player might receive in the stage-game $G$. A player is said to have distinct pure Nash payoffs in the stage-game if there exist two pure Nash equilibria of the stage-game in which this player receives different payoffs. Let $\tau(G) = (N, A, (u_i')_{i \in N})$ be the normal form game where the utility function of player $i$ is defined by
$$u_i'(a) = \begin{cases} \gamma & \text{if player } i \text{ has distinct pure Nash payoffs in } G, \\ u_i(a) & \text{otherwise.} \end{cases}$$
Let $G_0 := G$ and $G_{l+1} := \tau(G_l)$ for all $l \ge 0$. For all $l \ge 0$, let $N_l$ be the set of players whose utility function is constant and equal to $\gamma$ in the game $G_l$. As $N$ is finite, there is an $h \in [0, +\infty)$ such that $N_{l+1} = N_l$ for all $l \ge h$. Let $\tilde{A} = \mathrm{Nash}(G_h)$ be the set of pure Nash equilibria of the game $G_h$. We call $\tilde{A}$ the enforceable action set. The set of enforceable payoff vectors of the game $G$ is defined as the convex hull $\mathrm{Conv}[u(\tilde{A})]$ of the set $u(\tilde{A}) = \{u(a) \mid a \in \tilde{A}\}$. The sequence $\emptyset \subsetneq N_1 \subsetneq \cdots \subsetneq N_h$ is the Nash decomposition of the game $G$, and the Nash decomposition is complete if $N_h = N$. [5]
Let $\sim$ be the equivalence relation defined on the set of players as follows: player $i$ is equivalent to $j$ (denoted by $i \sim j$) if there exist $\alpha_{ij} > 0$ and $\beta_{ij} \in \mathbb{R}$ such that for all $a \in A$, we have $u_i(a) = \alpha_{ij} \cdot u_j(a) + \beta_{ij}$. For all $i \in N$, let $J(i)$ be the equivalence class of player $i$ and let
$$\mu_i = \min_{a \in \tilde{A}} \; \max_{j \in J(i)} \; \max_{a_j' \in A_j} u_i(a_j', a_{-j}).$$
The payoff $\mu_i$ is the enforceable minimax of player $i$ in the stage-game $G$. Call a payoff vector e-rational if it dominates the enforceable minimax payoff vector $\mu$.
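The iteration $G_0, G_1, \dots, G_h$ described above can be illustrated for finite stage-games. The following is a minimal sketch, not the paper's own procedure: it assumes a finite game given as a payoff table, and the helper names `utils_from_table`, `pure_nash` and `nash_decomposition` are hypothetical. It enumerates pure Nash equilibria by brute force, replaces the utility of every player with distinct pure Nash payoffs by the constant `gamma`, and repeats until no new player is added.

```python
from itertools import product


def utils_from_table(table, n_players):
    """Build one utility function per player from a dict: profile -> payoff tuple."""
    return [lambda a, i=i: table[a][i] for i in range(n_players)]


def pure_nash(actions, utils):
    """Return all pure Nash equilibria of a finite game (brute-force enumeration)."""
    equilibria = []
    for a in product(*actions):
        stable = all(
            utils[i](a[:i] + (dev,) + a[i + 1:]) <= utils[i](a)
            for i, acts in enumerate(actions)
            for dev in acts
        )
        if stable:
            equilibria.append(a)
    return equilibria


def nash_decomposition(actions, utils, gamma):
    """Iterate tau until the set of players with constant utility stops growing.

    Returns (blocks, enforceable), where blocks = [N_1, ..., N_h] and
    enforceable = Nash(G_h) is the enforceable action set.
    """
    utils = list(utils)
    leveraged = set()  # N_l, players whose utility is already the constant gamma
    blocks = []
    while True:
        eq = pure_nash(actions, utils)
        # Players who receive at least two distinct payoffs at pure Nash equilibria.
        new = {
            i for i in range(len(actions))
            if i not in leveraged and len({utils[i](a) for a in eq}) > 1
        }
        if not new:
            return blocks, eq
        leveraged |= new
        blocks.append(set(leveraged))
        for i in new:
            utils[i] = lambda a, g=gamma: g  # tau: constant utility gamma
```

For the battle of the sexes, the sketch returns a single block containing both players, so the decomposition is complete and every action profile is enforceable. For the prisoners' dilemma, it returns no block and the unique stage-game equilibrium as the only enforceable profile.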
The name "enforceable action" comes from Fudenberg et al.'s (2009) notion of enforceability, which requires an action to be incentive compatible given some set of continuation payoffs. The concepts of enforceable payoff and enforceable minimax are respective generalizations of the classic concepts of feasible payoff and effective minimax, viewed as indicators to derive the perfect folk theorem for finitely repeated games. If each player receives (recursively) distinct payoffs at Nash equilibria of the stage-game, then the behavior of each player can be leveraged if the game is finitely repeated. In that case, the enforceable action set $\tilde{A}$ equals the whole set $A$ of action profiles, the enforceable minimax equals the classic effective minimax, and the set of enforceable payoff vectors equals the set of feasible payoff vectors. In the other case, the set of enforceable actions $\tilde{A}$ is a proper subset of the whole set of profiles of pure actions, the set of enforceable payoff vectors is a proper subset of the classic set of feasible payoff vectors, and the enforceable minimax of a player can be strictly greater than her effective minimax. Figure 1 uses the example of Footnote 3 to illustrate the differences between our newly introduced concepts and the classic ones. Only payoffs of players 1 and 2 are displayed. In that game, pure action profiles where player 1 chooses OM and player 2 chooses C are not enforceable. The effective minimax payoff vector equals (1, 0, 0, 0) and the enforceable minimax payoff vector equals (1, 1, 0, 0).
[5] While being equivalent to Smith's (1995) definition of the Nash decomposition, ours is simpler and requires analysing no more than $n$ simple transformations of the stage-game, whereas Smith's definition requires, in many cases, analysing at least $2^{n-1}$ subgames, $n$ being the number of players.
Smith (1995) proved that having a complete Nash decomposition is a necessary and sufficient condition for the limit perfect folk theorem to hold. Under a complete Nash decomposition, the set of enforceable payoff vectors equals the classic set of feasible payoff vectors and the enforceable minimax payoff vector equals the classic effective minimax payoff vector. In that case, the main result (see Theorem 1) says that any feasible payoff vector that dominates the effective minimax payoff vector is approachable via pure strategy subgame perfect Nash equilibria of the finitely repeated game. That is the message of the limit perfect folk theorem.
Benoit and Krishna (1985) showed that, if the dimension of the set of feasible payoff vectors of the stage-game equals the number of players and each player receives at least two distinct payoffs at pure Nash equilibria of the stage-game, then the limit perfect folk theorem holds. This result is a particular case of the main result of this paper, Theorem 1. Indeed, under the distinct stage-game Nash equilibrium payoffs condition of Benoit and Krishna (1985), the Nash decomposition of the stage-game equals $\emptyset \subsetneq N_h = N$, which is complete, and therefore the set of enforceable payoff vectors equals the classic set of feasible payoff vectors and the enforceable minimax payoff vector equals the classic effective minimax payoff vector. Furthermore, under the full dimensionality condition, the effective minimax payoff vector equals the minimax payoff vector.

The finitely repeated game
Let $G$ be the stage-game. Given $T > 0$, let $G(T)$ denote the $T$-fold repeated game obtained by repeating the stage-game $T$ times. A pure strategy of player $i$ in the repeated game $G(T)$ is a contingent plan that specifies, for each history, the action chosen by player $i$ given this history. That is, a strategy is a function $\sigma_i : \bigcup_{t=1}^{T} A^{t-1} \to A_i$, where $A^0$ contains only the empty history. The strategy profile $\sigma = (\sigma_1, \dots, \sigma_n)$ of $G(T)$ generates a play path $\pi(\sigma) = [\pi_1(\sigma), \dots, \pi_T(\sigma)] \in A^T$, and player $i \in N$ receives a sequence $(u_i(\pi_t(\sigma)))_{1 \le t \le T}$ of payoffs. The preferences of player $i \in N$ among strategy profiles are represented by the average payoff
$$u_i^T(\sigma) = \frac{1}{T} \sum_{t=1}^{T} u_i(\pi_t(\sigma)).$$
A strategy profile $\sigma$ is a pure strategy Nash equilibrium of $G(T)$ if $u_i^T(\sigma_i', \sigma_{-i}) \le u_i^T(\sigma)$ for all $i \in N$ and for all pure strategies $\sigma_i'$ of player $i$. A strategy profile $\sigma = (\sigma_1, \dots, \sigma_n)$ is a pure strategy subgame perfect Nash equilibrium of $G(T)$ if, given any $t \in \{1, \dots, T\}$ and any history $h^t \in A^{t-1}$, the restriction $\sigma_{|h^t}$ of $\sigma$ to the history $h^t$ is a Nash equilibrium of the finitely repeated game $G(T - t + 1)$.
For any $T > 0$, let $E(T)$ be the set of pure strategy subgame perfect Nash equilibrium payoff vectors of $G(T)$. Let $E$ be such that the Hausdorff distance between $E(T)$ and $E$ goes to 0 as $T$ goes to infinity. The set $E$ is the Hausdorff limit of the set of pure strategy subgame perfect Nash equilibrium payoff vectors of the finitely repeated game. As I show later in Appendix A, the limit set $E$ exists and is nonempty, convex, and compact.
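To make the notion of convergence concrete, the Hausdorff distance can be computed directly when both sets are finite. The sketch below is only an illustration under assumptions that are not in the paper: the sets are finite samples of payoff vectors, the metric is Euclidean, and the function name `hausdorff` is hypothetical.

```python
import math


def hausdorff(X, Y):
    """Hausdorff distance between two finite, nonempty sets of points in R^n.

    Assumes the Euclidean metric; X and Y are iterables of equal-length tuples.
    """
    X, Y = list(X), list(Y)

    def directed(P, Q):
        # Largest distance from a point of P to its nearest point in Q.
        return max(min(math.dist(p, q) for q in Q) for p in P)

    return max(directed(X, Y), directed(Y, X))
```

For example, the distance between the singleton {(0, 0)} and the set {(0, 0), (3, 4)} is 5, because the point (3, 4) lies at distance 5 from the only point of the first set.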

Main result
Theorem 1 Let G be a compact stage-game. As the time horizon increases, the set of pure strategy subgame perfect Nash equilibrium payoff (average payoff) vectors of the finitely repeated game converges (in the Hausdorff sense) to the set of enforceable and e-rational payoff vectors.
A constructive proof of Theorem 1 is provided in the appendix. It uses four main lemmata. Lemma 3 states that as the time horizon increases, the set of pure strategy subgame perfect Nash equilibrium payoffs of the finitely repeated game converges to a well defined set, ASPNE(G), which is the set of payoffs that are approachable via pure strategy subgame perfect Nash equilibria. Lemmata 4 and 5 together say that the limit set of the set of pure strategy subgame perfect Nash equilibrium payoff vectors, which equals the set ASPNE(G), is included in the set of enforceable and e-rational payoff vectors. Lemma 6 states that every enforceable and e-rational payoff vector belongs to the set ASPNE(G). Enforceability and e-rationality can therefore be viewed as necessary and sufficient conditions for a feasible payoff vector to be approachable via pure strategy subgame perfect Nash equilibria of the finitely repeated game.
Theorem 1 assumes no discounting. This assumption is without loss of generality. One can indeed check that as the discount factor goes to 1, the discounted average payoff converges to the average payoff. Therefore, if the average payoff of a player to a path $\pi$ is strictly greater than her average payoff to another path $\pi'$, then the discounted average payoff of that player to the path $\pi$ is strictly greater than her discounted average payoff to $\pi'$, provided that the discount factor is high enough. One can also make use of the payoff continuation lemma for finitely repeated games and prove a stronger result. With a fixed discount factor, one can show that the limit set of the set of pure strategy subgame perfect Nash equilibrium payoffs of the discounted finitely repeated game equals the set of enforceable and e-rational payoffs, given that the discount factor exceeds a threshold $\delta$.
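The convergence of the discounted average to the plain average can be checked numerically. The sketch below rests on an assumption: the discounted average of a finite payoff stream is normalized by the sum of the discount weights, which is one common convention and is not spelled out in the text. The function names are hypothetical.

```python
def average(payoffs):
    """Undiscounted average payoff of a finite payoff stream."""
    return sum(payoffs) / len(payoffs)


def discounted_average(payoffs, delta):
    """Discounted average with weights delta**t, normalized to sum to one.

    The normalization by the sum of weights is an assumption of this sketch.
    """
    weights = [delta ** t for t in range(len(payoffs))]
    return sum(w * u for w, u in zip(weights, payoffs)) / sum(weights)


# Two hypothetical payoff streams of one player along two paths.
path_a = [0, 0, 3]    # higher undiscounted average, payoff arrives late
path_b = [2, 0, 0.5]  # lower undiscounted average, payoff arrives early
```

With these two streams, the ranking given by the undiscounted average is preserved for a discount factor of 0.99 but reversed for a discount factor of 0.5, which illustrates why the discount factor must be high enough.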

Conclusion
This paper analysed the set of pure strategy subgame perfect Nash equilibrium payoff vectors of finitely repeated games with complete information. The main finding is an effective folk theorem. It is a complete characterization of the limit set, as the time horizon increases, of the set of pure strategy subgame perfect Nash equilibrium payoff vectors of the finitely repeated game. The limiting set always exists, is compact and convex, and can lie strictly between the convex hull of the set of stage-game Nash equilibrium payoff vectors and the classic set of feasible and individually rational payoff vectors. This finding exhibits the exact range of cooperative payoffs that players can achieve via subgame perfect Nash equilibria of the finitely repeated game.
One contribution of this work is that it provides a full characterization of the optimal punishment payoff of finitely repeated games with complete information and perfect monitoring (Benoit and Krishna 1985, Gossner and Hörner 2010).
The method of this paper applies to the Nash equilibrium case. In this particular case, to leverage the behaviour of a player near the end of the finitely repeated game, it is necessary and sufficient that the latter player either has a pure Nash equilibrium payoff that is strictly greater than her pure minimax payoff, or that there exists a recursive Nash equilibrium in which the latter player receives a payoff that is different from her pure minimax payoff. Pure actions eligible for pure strategy Nash equilibrium play paths of the finitely repeated game are therefore the pure Nash equilibrium profiles of a new stage-game obtained from the original stage-game by setting to a constant the utility functions of players whose behaviour can be leveraged near the end of the finitely repeated game. I refer to the convex hull of the set of original payoffs to eligible profiles as the set of Nash-feasible payoffs. As the time horizon increases, the set of pure strategy Nash equilibrium payoff vectors of the finitely repeated game converges to the set of Nash-feasible payoff vectors that dominate the pure minimax payoff vector. This characterization of the limit set of the set of Nash equilibrium payoff vectors of the finitely repeated game nests early results of Benoit and Krishna (1987) and González-Díaz (2006).
One might wonder whether a similar method applies in the case that players can employ unobservable mixed strategies (Gossner 1995), in the case that the monitoring technology is imperfect (see Fudenberg et al. 2007 for the infinite horizon case with public monitoring), or in the case that equilibrium strategies are protected against renegotiation (Benoit and Krishna 1993).

Funding Open Access funding provided by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
In this section, I show that the limit set of the set of pure strategy subgame perfect Nash equilibrium payoff vectors of any finitely repeated game is well defined. Precisely, I prove that for any compact stage-game, the set of feasible payoff vectors that are approachable via pure strategy subgame perfect Nash equilibria of the finitely repeated game equals the limit set $E$. As a corollary, I obtain that the limit set $E$ is a compact and convex subset of the set of feasible payoff vectors of the stage-game. The main ingredient of this proof is the conjunction lemma established by Benoit and Krishna (1985), henceforth Lemma CBK. Lemma CBK states that the conjunction of two subgame perfect Nash equilibrium play paths is a subgame perfect Nash equilibrium play path of the corresponding finitely repeated game.
Let G be a compact stage-game and let ASPNE(G) be the set of all feasible payoff vectors of G that are approachable via pure strategy subgame perfect Nash equilibrium payoff vectors of the finitely repeated game.

Proof of Lemma 1
The reader can check that ASPNE(G) is a closed subset of the set of feasible payoff vectors, which is compact. The set ASPNE(G) is therefore compact. Since ASPNE(G) is closed, its convexity holds if $z = \tfrac{1}{2}(x + y) \in \mathrm{ASPNE}(G)$ for all $x, y \in \mathrm{ASPNE}(G)$, which follows from Lemma CBK.
Lemma 3 As the time horizon increases, the set of pure strategy subgame perfect Nash equilibrium payoff vectors of the finitely repeated game converges (in the Hausdorff sense) to the set ASPNE(G).

Proof of Lemma 4 If
Let us proceed by induction on the time horizon $T$. For $T = 1$, the pure strategy subgame perfect Nash equilibrium $\sigma$ is a pure Nash equilibrium of the stage-game $G$, and $\mathrm{Nash}(G) = \mathrm{Nash}(G_0) \subseteq \mathrm{Nash}(G_h)$. Suppose that $T > 1$ and that the support of any pure strategy subgame perfect Nash equilibrium play path of the finitely repeated game $G(t)$ with $t \in \{1, \dots, T-1\}$ is included in the set $\mathrm{Nash}(G_h)$, and let us show that $\{\pi_1(\sigma), \dots, \pi_T(\sigma)\} \subseteq \mathrm{Nash}(G_h)$. The restriction $\sigma_{|\pi_1(\sigma)}$ of $\sigma$ to the history $\pi_1(\sigma)$ is a pure strategy subgame perfect Nash equilibrium of the game $G(T-1)$, and the induction hypothesis implies that the support $\{\pi_2(\sigma), \dots, \pi_T(\sigma)\}$ of the play path $\pi(\sigma_{|\pi_1(\sigma)})$ generated by the strategy profile $\sigma_{|\pi_1(\sigma)}$ is included in $\mathrm{Nash}(G_h)$. It remains to show that $\pi_1(\sigma) \in \mathrm{Nash}(G_h)$.
At this point I proceed by contradiction. Assume that $\pi_1(\sigma) \notin \mathrm{Nash}(G_h)$. Then, in the game $G_h$, there exists a player $i \in N$ who has a strict incentive to deviate from the pure action profile $\pi_1(\sigma)$. This player has to be in the block $N \setminus N_h$, since any player of the block $N_h$ has a constant utility function in the game $G_h$. Let $\sigma_i'$ be a pure strategy one shot deviation of player $i$ from $\sigma$ that consists in playing a stage-game pure best response $b_i[\pi_1(\sigma)]$ to $\pi_1(\sigma)$ in the first round of the finitely repeated game $G(T)$, and conforming to $\sigma_i$ from the second round on. At the pure strategy profile $(\sigma_i', \sigma_{-i})$, player $i$ receives $u_i(\pi_1(\sigma)) + e$ (with $e > 0$) in the first round. Let $h^1 = (b_i(\pi_1(\sigma)), \pi_1(\sigma)_{-i})$ be the observed history after this first round and $\sigma_{|h^1}$ be the restriction of $\sigma$ to the history $h^1$. We have $(\sigma_i', \sigma_{-i})_{|h^1} = \sigma_{|h^1}$, and $\sigma_{|h^1}$ is a pure strategy subgame perfect Nash equilibrium of $G(T-1)$. By the induction hypothesis, the support of the play path generated by $\sigma_{|h^1}$ is included in $\mathrm{Nash}(G_h)$. Therefore, at the profile $(\sigma_i', \sigma_{-i})$ player $i$ receives the sequence of stage-game payoffs $\{u_i(\pi_1(\sigma)) + e, n_i, \dots, n_i\}$, where $n_i$ is her unique stage-game pure Nash equilibrium payoff. Since player $i$ receives $\{u_i(\pi_1(\sigma)), n_i, \dots, n_i\}$ at the strategy profile $\sigma$, we have $u_i^T(\sigma_i', \sigma_{-i}) > u_i^T(\sigma)$. This contradicts the fact that $\sigma$ is a pure strategy subgame perfect Nash equilibrium of $G(T)$ and concludes the proof.
Let F be the set of enforceable payoff vectors. We have the following corollary.
Corollary 1 Let $G$ be a compact normal form game, let $T > 0$, and let $\sigma$ be a pure strategy subgame perfect Nash equilibrium of $G(T)$. Then the average payoff vector $u^T(\sigma)$ belongs to the set $F$.
Wen (1994) shows that any subgame perfect Nash equilibrium payoff vector of the infinitely repeated game weakly dominates the effective minimax payoff vector. This domination also holds for finitely repeated games. The following lemma provides a sharper lower bound of the set of equilibrium payoffs of the finitely repeated game. The lemma says that any pure strategy subgame perfect Nash equilibrium payoff vector of the finitely repeated game weakly dominates the enforceable minimax payoff vector.

C. Necessity of the e-rationality
Lemma 5 Let $G$ be a compact normal form game, let $T \ge 1$, and let $\sigma$ be a pure strategy subgame perfect Nash equilibrium of the finitely repeated game $G(T)$. Then the average payoff vector $u^T(\sigma)$ dominates the enforceable minimax payoff vector of the stage-game.
Proof of Lemma 5 I proceed by induction on the time horizon $T$. At $T = 1$, pure strategy subgame perfect Nash equilibria of the game $G(T)$ are pure Nash equilibria of the stage-game $G$ and $u^T(\sigma)$ dominates $\mu$. Assume that $T > 1$ and that the average payoff vector to any pure strategy subgame perfect Nash equilibrium of the finitely repeated game $G(t)$ with $0 < t < T$ dominates the enforceable minimax payoff vector $\mu$. Let us show that the payoff vector $u^T(\sigma)$ dominates $\mu$. Let $\pi_1(\sigma)$ be the action profile played in the first round of the game $G(T)$ according to $\sigma$. The restriction $\sigma_{|\pi_1(\sigma)}$ of the strategy $\sigma$ to the history $\pi_1(\sigma)$ is a pure strategy subgame perfect Nash equilibrium of the finitely repeated game $G(T-1)$ and, by the induction hypothesis, the payoff vector $u^{T-1}(\sigma_{|\pi_1(\sigma)})$ dominates $\mu$. Suppose now that $u^T(\sigma)$ does not dominate $\mu$. Then there exists a player $i \in N$ such that $u_i^T(\sigma) < \mu_i$. It follows that $u_i[\pi_1(\sigma)] < \mu_i$, since $u_i^T(\sigma)$ is a convex combination of $u_i[\pi_1(\sigma)]$ and $u_i^{T-1}(\sigma_{|\pi_1(\sigma)})$. Moreover, as $\pi_1(\sigma) \in \mathrm{Nash}(G_h)$, we have $u_j[\pi_1(\sigma)] < \mu_j$ for all $j \in J(i)$. From the definition of $\mu$, there exist a player $i_0 \in J(i)$ and a pure action $a_{i_0} \in A_{i_0}$ of player $i_0$ such that $u_{i_0}[a_{i_0}, \pi_1(\sigma)_{-i_0}] \ge \mu_{i_0}$. Consider the pure strategy one shot deviation $\sigma_{i_0}'$ of player $i_0$ from $\sigma$ in which she
For each $g \in \{1, \dots, h\}$, let $e_{N_{g-1}}, f_{N_{g-1}} \in \times_{i \in N_{g-1}} A_i$ be two profiles of actions of players of the block $N_{g-1}$ such that there exist two Nash equilibria $z(e_{N_{g-1}})$ and $z(f_{N_{g-1}})$, respectively of the games $G(e_{N_{g-1}})$ and $G(f_{N_{g-1}})$, with distinct payoffs for each player of the block $N_g \setminus N_{g-1}$, where $G(e_{N_{g-1}})$ (respectively $G(f_{N_{g-1}})$) is a stage-game with players $N \setminus N_{g-1}$ obtained from $G$ by fixing the actions of players of the block $N_{g-1}$ to $e_{N_{g-1}}$ (respectively $f_{N_{g-1}}$). Define $c_g = \min_{i \in N_g \setminus N_{g-1}} |u_i(z(e_{N_{g-1}})) - u_i(z(f_{N_{g-1}}))|$. Let $y^g$ denote alternating between the action profiles $z(e_{N_{g-1}})$ (in even periods) and $z(f_{N_{g-1}})$ (in odd periods). Let $z^{g,i}$ be the Nash equilibrium profile among $z(e_{N_{g-1}})$ and $z(f_{N_{g-1}})$ which is the worst for player $i \in N_g$.
The 5-phase strategy profile is adjusted as follows. The phase length variables, namely $q$ (Phase 3), $r$ (Phase 4), and $t_g(qp + rp)$ ($g = 1, 2, \dots, h$, Phases 2 and 5), will be chosen at the end of the construction, along with the reward vectors $x^j$ (for all $j \in N_h$) used in Phase 4. Early (late) deviations are those occurring up to (after) period $T - t_h(qp + rp) - (qp + rp)$, i.e. a deviation is "early" if there is still time to run Phases 3 and 4 before period $T - t_h(qp + rp) + 1$.
Strategy profiles.
1. (Main Path) Play $a_l$ at period $t = l\,[\mathrm{mod}\ p] + t_h(qp + rp)$ until period $T - t_h(qp + rp)$. [After an early deviation by $i \in N_h$, go to Phase 3; after a late deviation by $i \in N_g$, go to Phase 5.] Go to Phase 2.
2. (Good Recursive Nash) For $g = h, \dots, 1$: play $y^g$ in periods $T - t_g(qp + rp) + 1, \dots, T - t_{g-1}(qp + rp)$. [After a deviation by $i \in N_{g'}$ with $g' < g$, start Phase 5.]
3. (Minmax Phase for $i$) Play $w^i$ for $qp$ periods. [If any $j \in N \setminus J(i)$ deviates early, start Phase 4; if any $j \in N_g$ deviates late, start Phase 5 with $i \leftarrow j$. If any $j \in J(i)$ deviates early, set $i \leftarrow j$ and restart Phase 3.] Then set $j \leftarrow i$ and start Phase 4.
4. (Reward Phase) Repeat the path $\pi^{p,j}$ for $r$ rounds. [If any $i \in N_h$ deviates early, restart Phase 3; if any $i \in N_g$ deviates late, start Phase 5.] Then return to Phase 1.
5. (Bad Recursive Nash) Play $z^{g',i}$ until period $T - t_{g'}(qp + rp)$. [If $j \in N_g$ deviates, where $g < g'$, set $g' \leftarrow g$ and $i \leftarrow j$ and restart Phase 5.] Then go to Phase 2.
So along the equilibrium path, the sequence of action profiles is $a_1, \dots, a_p;\ a_1, \dots, a_p;\ \cdots;\ a_1, \dots, a_p$.
Length of phases: Let $\rho$ be the largest gap between best and worst payoffs across all players in $G$. For all $i \in N_h$, let $\pi^{p',i}$ be a sequence of length $p'$ of pure Nash equilibria of the effective game such that the average payoff vectors $x^i = u(\pi^{p',i})$, $i \in N_h$, satisfy $x^i \gg 0$ for all $i \in N_h$; $x^i_i < x^j_i$ for all $j \notin (N \setminus N_h) \cup J(i)$; $x^i = x^j$ for all $j \in J(i)$; and $x^i_i < y_i$ for all $i \in N_h$. (Such vectors exist following Abreu et al. (1994).) There is no loss of generality in assuming that $p' = p$. Choose $q$ such that $\rho < qp \cdot x^i_i$ and $r$ such that $\rho + \max\{0,\ qp \cdot y_j - u_j(w^i)\} < rp \cdot (x^i_j - x^j_j)$ for all $i \in N_h$ and $j \in N_h \setminus J(i)$.
For any number $k$, let $\psi_g(k)$ be the least even number above $2k\rho/c_g$, so that a player $i \in N_g$ is willing to play $k$ periods of any action followed by $\psi_g(k)$ periods of $y^g$, if deviations switch each $y^g$ to $z^{g,i}$. Recursively define $s_h(m) = \psi_h(m)$ and, for all $g = 1, 2, \dots, h-1$, $s_g(m) = \psi_g(m + s_{g+1}(m) + \cdots + s_h(m))$. Then set $t_0(m) = 0$ and $t_g(m) = s_1(m) + \cdots + s_g(m)$ for $g = 1, 2, \dots, h$.
Subgame perfect verification: By construction, no one-shot deviation by a player of the block $N_h$ is profitable (see Smith (1995), Demeze-Jouatsa and Wilson (2019)). Observe that only Nash equilibria of the effective game appear on equilibrium paths of each subgame. Therefore, no player of the block $N \setminus N_h$ can profitably deviate from the constructed strategy.
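The recursive definition of the phase lengths $s_g(m)$ and $t_g(m)$ can be traced numerically. The following is a minimal sketch under stated assumptions: "above" is read as "strictly greater than", the constants $c_1, \dots, c_h$ are supplied as a list, and the function names `psi` and `phase_lengths` are hypothetical.

```python
import math


def psi(k, rho, c_g):
    """Least even integer strictly above 2*k*rho/c_g."""
    n = math.floor(2 * k * rho / c_g) + 1  # least integer strictly above the bound
    return n if n % 2 == 0 else n + 1


def phase_lengths(m, rho, c):
    """Compute s_g(m) and t_g(m) for g = 1, ..., h, where c = [c_1, ..., c_h].

    Returns (s, t) with s = [s_1, ..., s_h] and t = [t_0, t_1, ..., t_h].
    """
    h = len(c)
    s = [0] * (h + 2)  # s[g] for g = 1..h; index 0 and h+1 are padding
    for g in range(h, 0, -1):
        tail = sum(s[g + 1:h + 1])  # s_{g+1}(m) + ... + s_h(m), zero when g = h
        s[g] = psi(m + tail, rho, c[g - 1])
    t = [0] * (h + 1)  # t_0(m) = 0
    for g in range(1, h + 1):
        t[g] = t[g - 1] + s[g]
    return s[1:h + 1], t
```

In the construction, the argument `m` plays the role of $qp + rp$. With `rho = 1`, `c = [1, 1]` and `m = 3`, the sketch gives $s_2 = 8$, $s_1 = 24$ and $t = (0, 24, 32)$.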