An epistemic approach to explaining cooperation in the finitely repeated Prisoner’s Dilemma

Cao, Vi

doi:10.1007/s00182-021-00785-x

Original Paper
Open access
Published: 04 July 2021

Volume 51, pages 53–85, (2022)

International Journal of Game Theory

Vi Cao ORCID: orcid.org/0000-0001-7082-0360

Abstract

We use epistemic game theory to explore rationales behind cooperative behaviors in the finitely repeated Prisoner’s Dilemma. For a class of type structures that are sufficiently rich, the set of outcomes that can arise when each player i is rational and satisfies $(m_i-1)$th order strong belief of rationality is the set of paths on which each player i defects in the last $m_i$ rounds. We construct one sufficiently rich type structure to elaborate on how different patterns of cooperative behaviors arise under sufficiently weak epistemic conditions. In this type structure, the optimality of forgiving the opponent’s past defection and the belief that one’s defection will be forgiven account for the richness of the set of behavior outcomes.

1 Introduction

The finitely repeated Prisoner’s Dilemma has a unique Nash equilibrium outcome, in which both players defect in every round. This equilibrium outcome is inconsistent with experimental evidence: experimental subjects typically cooperate to some extent [see Oskamp and Perlman (1965), Morehous (1966), Selten and Stoecker (1986), Andreoni and Miller (1993), Cooper et al. (1996), Dal Bo and Frechette (2011), Normann and Wallace (2012), Kagel and McGee (2014), and Embrey et al. (2017)]. Various modifications to the finitely repeated Prisoner’s Dilemma have been made to obtain an equilibrium path on which players cooperate. For instance, Kreps et al. (1982) assume there is a small probability that one player is irrational and plays a Tit for Tat strategy (one-sided incomplete information). Under this assumption, in any sequential equilibrium that is not Pareto-dominated by any other sequential equilibrium, the path of play has bilateral defection only in the last few rounds and bilateral cooperation in all other rounds. Fudenberg and Maskin (1986) also assume each player is irrational with a small probability (two-sided incomplete information). When the number of rounds is sufficiently large, by varying the form of irrationality, any individually rational payoffs of the stage game can be approximated by sequential equilibrium payoffs of the finitely repeated game. Instead of introducing irrational types, Neyman (1999) assumes an exponentially small departure from the common knowledge assumption on the number of rounds. He shows that when the number of rounds is sufficiently large, any individually rational payoffs of the stage game can be approximated by subgame perfect equilibrium payoffs of the finitely repeated game. For other works that use the equilibrium approach to rationalize cooperation, see Radner (1980), Radner (1986), Neyman (1985), Neyman (1998), Samuelson (1987), Hirshleifer and Rasmusen (1989), and Dijkstra and van Assen (2017).

This paper uses epistemic theory instead of equilibrium theory to explain why rational players cooperate. We attach to the finitely repeated Prisoner’s Dilemma a type structure that captures players’ interactive beliefs as in Battigalli and Siniscalchi (1999). Our objective is to characterize the set of all behavior outcomes that can arise when players are rational and satisfy some orders of strong belief of rationality (henceforth SB). As defined in Battigalli and Siniscalchi (2002), a player satisfies first order SB if, whenever possible, she believes that her opponent is rational; a player satisfies second order SB if (a) she satisfies first order SB, and (b) whenever possible, she believes that her opponent is rational and satisfies first order SB; and so on. Our first result (Theorem 1) claims that there exists a type structure that satisfies the following: for each pair $(m_1, m_2) \in {\mathbb {N}}_+^2$, the set of outcomes that can arise when each player i is rational and satisfies $(m_i-1)$th order SB is the set of paths on which each player i defects in the last $m_i$ rounds. This result is consistent with Battigalli and Friedenberg (2012), who show that for any type structure, if players satisfy rationality and common SB^{Footnote 1}, then they defect in every round on path. In this paper, we focus on behavioral implications of epistemic conditions weaker than rationality and common SB. In particular, for each pair $(m_1, m_2) \in \{1, \ldots , M - 1 \}^2$ (where M is the number of rounds), we elaborate on how different patterns of cooperative behaviors arise when each player i is rational and satisfies $(m_i-1)$th order SB.

In the following, we give a sketch on how to construct the type structure that supports Theorem 1. In this type structure, for each pair $(m_1, m_2) \in {\mathbb {N}}_+^2$ and each path $\varvec{a}$ on which each player i defects in the last $m_i$ rounds, there must be a pair of beliefs^{Footnote 2} (one for each player) such that player i’s belief satisfies $(m_i-1)$th order SB and players’ rationality under these beliefs leads to $\varvec{a}$. To ensure the type structure contains all necessary beliefs, for each player i, we construct a mapping that maps each pair $(m_i, \varvec{a})$ (where $m_i \in {\mathbb {N}}$ and $\varvec{a}$ is a path on which player i defects in the last $m_i$ rounds) to a pair of strategy and epistemic type for player i. At each conditioning event, this epistemic type assigns probability one to some $(m^\prime _{-i}, \varvec{a^\prime })$ (where $m^\prime _{-i} \in {\mathbb {N}}$ and $\varvec{a^\prime }$ is a path on which player $-i$ defects in the last $m^\prime _{-i}$ rounds), which is mapped to a pair of strategy and epistemic type for player $-i$. For each pair $(m_1, m_2) \in {\mathbb {N}}_+^2$ and each path $\varvec{a}$ on which each player i defects in the last $m_i$ rounds, the strategy-epistemic type pairs associated with $(m_1, \varvec{a})$ and $(m_2, \varvec{a})$ satisfy the following desired properties: player i’s epistemic type satisfies $(m_i-1)$th order SB, player i’s strategy is rational given her epistemic type, and the strategy profile induces $\varvec{a}$.

With this type structure, for each pair $(m_1, m_2) \in \{1, \ldots , M \}^2$ (where M is the number of rounds), when each player i is rational and satisfies $(m_i-1)$th order SB, any path on which each player i defects in the last $m_i$ rounds (indexed $M - m_i + 1, \ldots , M$) is possible. Such richness of the set of possible outcomes is due to the optimality of forgiving the opponent’s past defection and the belief that one’s defection will be forgiven. For illustration, suppose player i satisfies $(m_i-1)$th order SB and is at some round $n < M - m_i + 1$. At the beginning of round n, player i believes that her opponent will cooperate in round n and will keep cooperating until round $M - m_i + 1$ if no one has defected since round n. If player i believes that her defection in round n will be forgiven (i.e., her opponent will still cooperate in round $n+1$ and keep cooperating until round $M - m_i + 1$ if no one has defected since round $n+1$), then it is optimal to defect in round n. If player i believes that her defection in round n will not be forgiven (i.e., her opponent will respond by defecting from round $n+1$ onwards), then it is optimal to cooperate in round n. Thus, any action by player i in each round $n < M - m_i + 1$ can be rationalized. Indeed, there are beliefs under which forgiving the opponent’s past defection is optimal: if a player believes that forgiving will lead to a cooperative phase whereas retaliating will lead to bilateral defection in all future rounds, then it is optimal to forgive.

With this type structure, for each pair $(m_1, m_2) \in \{1, \ldots , M \}^2$, when each player i is rational and satisfies $(m_i-1)$th order SB, any path on which some player i cooperates in some round $n \ge M - m_i + 1$ is impossible. We sketch out the proof below. Suppose player i satisfies first order SB and is at round $M - 1$. As discussed in the preceding paragraph, any past play is consistent with the hypothesis that player $-i$ is rational. Thus, player i assigns probability one to the event that player $-i$ is rational and will defect in round M regardless of what player i does in round $M - 1$. Under this belief, defecting in round $M - 1$ is optimal for player i. Next, suppose player i satisfies second order SB and is at round $M - 2$. As discussed in the preceding paragraph, any past play is consistent with the hypothesis that player $-i$ is rational and satisfies first order SB. Thus, player i assigns probability one to the event that player $-i$ is rational and satisfies first order SB, and will defect in round $M-1$ regardless of what player i does in round $M - 2$. Under this belief, defecting in round $M - 2$ is optimal for player i. And so on. By induction, if player i is rational and satisfies $(m_i-1)$th order SB, then she defects in each round $n \ge M - m_i + 1$ regardless of what player $-i$ has done by round n, as she assigns the highest possible order of SB to player $-i$ and expects that player $-i$ will defect from round $n + 1$ onwards.

We then use the type structure constructed for Theorem 1 to obtain the second result (Theorem 2): for any sufficiently rich type structure and each pair $(m_1, m_2) \in {\mathbb {N}}_+^2$, the set of outcomes that can arise when each player i is rational and satisfies $(m_i-1)$th order SB is the set of paths on which each player i defects in the last $m_i$ rounds. The notion of sufficiently rich type structure is introduced in Perea (2012).^{Footnote 3} Informally, a type structure ${\mathcal {T}}$ is sufficiently rich if: for each player i, each order $m_i \in {\mathbb {N}}_+$, and each history of past actions h, if there exists a type structure ${\mathcal {T}}^\prime $ such that (a) for each $j \in \{1, 2\}$ and each $m^\prime < m_i$, the behavioral implication of $(m^\prime -1)$th order SB for player j in the game with ${\mathcal {T}}^\prime $ is identical to that in the game with ${\mathcal {T}}$,^{Footnote 4} and (b) with ${\mathcal {T}}^\prime $, the history h is consistent with the hypothesis that player i is rational and satisfies $(m_i-1)$th order SB, then the type structure ${\mathcal {T}}$ must contain some belief for player i such that the belief satisfies $(m_i-1)$th order SB and rationality under this belief leads to some behavior consistent with history h. We show that the set of outcomes that can arise when each player i is rational and satisfies $(m_i-1)$th order SB for a sufficiently rich type structure coincides with the set of outcomes that can arise when each player i is rational and satisfies $(m_i-1)$th order SB for a complete type structure.^{Footnote 5} Thus, the notion of sufficiently rich type structure might be useful if one aims to study behavioral implications of an epistemic condition for a complete type structure but finds it more convenient to work with incomplete type structures. 
We also show that the type structure that supports Theorem 1 and its extensions (which are formed by adding new beliefs to the original type structure) are sufficiently rich. A complete type structure is one of these extensions and obviously sufficiently rich.

In the following, we show how to use the type structure constructed for Theorem 1 to obtain Theorem 2, and highlight the importance of the assumption that the type structure is sufficiently rich. Let ${\mathcal {T}}$ be a sufficiently rich type structure and ${\mathcal {T}}^*$ be the type structure constructed for Theorem 1. First, recall that with ${\mathcal {T}}^*$, when players are rational, any outcome with bilateral defection in the last round is possible. Consequently, as ${\mathcal {T}}$ is sufficiently rich, in the game with ${\mathcal {T}}$, any non-terminal history is consistent with the behavior of a rational player. It follows that in the game with ${\mathcal {T}}$, the set of outcomes that can arise when players are rational is the set of paths on which players defect in the last round. Next, recall that with ${\mathcal {T}}^*$, when players are rational and satisfy first order SB, any outcome with bilateral defections in the last two rounds is possible. Consequently, as ${\mathcal {T}}$ is sufficiently rich, in the game with ${\mathcal {T}}$, any non-terminal history (except those on which player i cooperates in round $M - 1$) is consistent with the behavior of player i who is rational and satisfies first order SB. In addition, if a player is rational and satisfies first order SB, then she defects in round $M - 1$ as she assigns probability one to the event that her opponent is rational and will defect in round M. It follows that in the game with ${\mathcal {T}}$, the set of outcomes that can arise when players are rational and satisfy first order SB is the set of paths on which players defect in the last two rounds. And so on.
If a type structure is not sufficiently rich, when each player i is rational and satisfies $(m_i - 1)$th order SB, the following might happen: (a) some path on which each player i defects in the last $m_i$ rounds is impossible, as some player i does not have any belief that satisfies $(m_i - 1)$th order SB and rationalizes this path, and (b) some path on which some player i cooperates in some round $n \ge M - m_i + 1$ is possible (we show (b) by means of an example).

Throughout the paper, we mostly work with sufficiently rich type structures. In Sect. 7, we give a conjecture for type structures that are not sufficiently rich. In particular, we conjecture that: for each pair $(m_1, m_2) \in \{ 1, \ldots , M - 1 \}^2$ with $m_1 > m_2$, and each path $\varvec{a}$ on which player 1 defects in the last $m_2 + 1$ rounds and player 2 defects in the last $m_2$ rounds, there exists an insufficiently rich type structure with which $\varvec{a}$ is possible when each player i is rational and satisfies $(m_i - 1)$th order SB. If this conjecture is correct, then we can characterize the set of outcomes that can arise when each player i satisfies rationality and $(m_i - 1)$th order SB across all type structures. The importance of studying different type structures has been discussed in Battigalli and Friedenberg (2012). The authors interpret a type structure as a context, in which the set of possible beliefs is a product of a history or social conventions. If an analyst does not know the context, she might need to study behavioral implications of an epistemic condition across all type structures.

Our work is related to some papers that provide procedures for deriving the set of possible outcomes under rationality and mth order strong belief of rationality (henceforth RmSBR) for each $m \in {\mathbb {N}}$. Battigalli and Siniscalchi (2002) provide a procedure for deriving such sets for a complete type structure: an outcome is possible under RmSBR if, with a complete type structure, it can arise under RmSBR. Brandenburger et al. (2019) provide a procedure for deriving such sets across all type structures: an outcome is possible under RmSBR if there exists a type structure with which it can arise under RmSBR. Our paper derives these sets for each sufficiently rich type structure. First, we construct a sufficiently rich type structure with which the set of outcomes that can arise under RmSBR is the set of paths on which players defect in the last $m + 1$ rounds. Then, we use this result to show that for any sufficiently rich type structure, the set of outcomes that can arise under RmSBR is the set of paths on which players defect in the last $m + 1$ rounds. For our work as well as the two aforementioned procedures, the most challenging task is to construct beliefs that satisfy mth order strong belief of rationality and rationalize all paths that have bilateral defections in the last $m + 1$ rounds. Our main contribution is our construction of the type structure that contains all such beliefs.

The remainder of the paper is organized as follows. Section 2 introduces basic epistemic concepts. Section 3 shows that for any type structure and any $m \in {\mathbb {N}}_+$, if an outcome arises under R$(m-1)$SBR, then it has bilateral defections in the last m rounds. Section 4 gives an illustrative example. Section 5 presents a type structure with which the set of outcomes that can arise when each player i is rational and satisfies $(m_i-1)$th order SB is the set of paths on which each player i defects in the last $m_i$ rounds. Section 6 defines a sufficiently rich type structure and studies behavioral implications of SB for these type structures. Section 7 discusses insufficiently rich type structures.

2 Epistemic game and epistemic conditions

In this section, we introduce the Finitely Repeated Prisoner’s Dilemma, its epistemic game, and epistemic conditions. Our epistemic framework follows Battigalli and Siniscalchi (2002), Battigalli and Friedenberg (2012), Friedenberg (2019), and Brandenburger et al. (2019).

2.1 Finitely repeated Prisoner’s Dilemma

Let the stage game be $P = (A_i, u_i)_{i=1,2}$, where for each player i, the action set $A_i = \{C, D\}$ and the payoff function $u_i: A_1 \times A_2 \rightarrow {\mathbb {R}}$ is described by the following payoff matrix:

	C (Cooperate)	D (Defect)
C (Cooperate)	$b_3, b_3$	$b_1, b_4$
D (Defect)	$b_4, b_1$	$b_2, b_2$

where $b_4> b_3> b_2 > b_1$. There are M rounds ($1< M < \infty $). In each round, the stage game is played and the outcome is observed. Let $a^m \in A \equiv A_1 \times A_2$ be the joint action played in round m and let $a^m_i$ be the action of player i that constitutes $a^m$. Let ${\mathcal {P}}$ denote the supergame.

Histories. A history h is a sequence of joint actions. Let $h^1 \equiv (a^0)$ denote the root of the game tree. For each $m \in \{ 2, \ldots , M \}$, let $h^m \equiv (a^1, \ldots , a^{m-1}) \in A^{m-1}$ be an mth-round history. Let H be the set of non-terminal histories and Z be the set of terminal histories.

Strategies. A strategy $s_i: H \rightarrow A_i$ maps each history to an action available at that history. Let $S_i$ be the set of strategies of player i. A strategy $s_i$ allows history $h \in H \cup Z$ if there is some strategy $s_{-i}$ such that the path induced by $(s_i, s_{-i})$ passes through h. For each history $h \in H \cup Z$, let $S_i(h)$ be the set of player i’s strategies that allow h. For each strategy $s_i \in S_i$, let $H(s_i) = \big \{ h \in H \cup Z ~|~ s_i \in S_i(h) \big \}$ be the set of histories allowed by $s_i$.

Payoffs. Let $\xi : S_1 \times S_2 \rightarrow A^M$ map each strategy profile $(s_1, s_2)$ to a path of play. For each $m \in \{1, \ldots , M \}$, let $\xi ^m: S_1 \times S_2 \rightarrow A$ map each strategy profile $(s_1, s_2)$ to a joint action at round m. The overall payoff is the sum of stage game payoffs: $\pi _i (s_i, s_{-i}) = \sum _{m = 1}^M u_i [ \xi ^m(s_i, s_{-i}) ]$.
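The objects just defined can be made concrete in a few lines of code. The following is a minimal sketch, not part of the paper: the numeric payoffs are assumed values that satisfy $b_4> b_3> b_2 > b_1$, the root history is encoded as the empty tuple, and the function names (`path`, `payoffs`, `grim`, `always_defect`) are hypothetical.

```python
# Assumed numeric payoffs with b4 > b3 > b2 > b1 (illustrative values only).
B1, B2, B3, B4 = 0, 1, 3, 5

# Stage payoff to the player whose own action is listed first.
U = {("C", "C"): B3, ("C", "D"): B1, ("D", "C"): B4, ("D", "D"): B2}


def path(s1, s2, rounds):
    """Sketch of xi: the path of play induced by a strategy profile.

    A history is a tuple of joint actions (a1, a2); the root is ().
    Each strategy is a function from histories to "C" or "D".
    """
    h = ()
    for _ in range(rounds):
        h += ((s1(h), s2(h)),)
    return h


def payoffs(s1, s2, rounds):
    """Sketch of pi: the sum of stage payoffs along the path, per player."""
    p = path(s1, s2, rounds)
    return (sum(U[(a1, a2)] for a1, a2 in p),
            sum(U[(a2, a1)] for a1, a2 in p))


def always_defect(h):
    return "D"


def grim(h):
    # Grim trigger: cooperate if and only if no one has defected so far.
    return "C" if all(a == ("C", "C") for a in h) else "D"


if __name__ == "__main__":
    print(path(grim, always_defect, 3))
    print(payoffs(grim, always_defect, 3))
```

Representing a strategy as a function on histories mirrors $s_i: H \rightarrow A_i$; the grim trigger condition is symmetric in the two players, so the same function can serve either side.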

Remark 1

Generally, we write a joint action $a \in A$ in order of $a_1a_2$. When focusing on a generic player i, we write a in order of $a_i a_{-i}$; for instance, CD means that player i chooses C and her opponent chooses D. We use ‘path of play’ and ‘outcome’ interchangeably, and write a generic outcome as $(a^1, \ldots , a^M)$ and $\varvec{a}$ interchangeably.

2.2 Type structure and epistemic game

As the game unfolds, each player might need to revise her belief conditional on her opponent’s past actions. Hence, specifying players’ beliefs at every history is necessary. At each history, each player forms a first-order conditional belief (a probabilistic assessment about the strategies of her opponent), a second-order conditional belief (a probabilistic assessment about the strategies and first-order conditional beliefs of her opponent), and so on. Players’ possible beliefs are captured by a type structure.

Definition 1

A type structure is a tuple

$$\begin{aligned} {\mathcal {T}} \equiv ({\mathcal {P}}; T_1, T_2; {\mathcal {E}}_1, {\mathcal {E}}_2; \beta _1, \beta _2), \end{aligned}$$

where for each i,

(a) $T_i$ is a compact metric epistemic type set,
(b) ${\mathcal {E}}_i \otimes T_{-i} \equiv \{ S_{-i}(h) \times T_{-i} : h \in H \}$ is the set of conditioning events,
(c) $\beta _i : T_i \rightarrow {\mathcal {C}}(S_{-i} \times T_{-i}, {\mathcal {E}}_i \otimes T_{-i})$ is a continuous belief map.

In this definition, ${\mathcal {C}}(S_{-i} \times T_{-i}, {\mathcal {E}}_i \otimes T_{-i})$ is a set of conditional probability systems, each of which specifies player i’s conditional beliefs about the strategies and epistemic types of her opponent at all histories.^{Footnote 6} For each epistemic type $t_i \in T_i$ and each history $h \in H$, let $\beta _{i, h} (t_i) (\cdot ) \equiv \beta _i(t_i)(\cdot |S_{-i}(h) \times T_{-i})$ be the conditional belief of type $t_i$ at h. Marginalizing $\beta _{i, h} (t_i)$ onto $S_{-i}$ gives the first-order conditional belief of type $t_i$. Since each type $t_{-i}$ also has a first-order conditional belief, $\beta _{i, h} (t_i)$ gives the second-order conditional belief of type $t_i$. And so on.

An epistemic game is a pair $({\mathcal {P}}, {\mathcal {T}})$, where ${\mathcal {T}}$ is a type structure. Each $({\mathcal {P}}, {\mathcal {T}})$ induces a set of states, viz. $S_1 \times T_1 \times S_2 \times T_2$. A state $(s_1, t_1, s_2, t_2)$ describes the strategies $(s_1, s_2)$ and beliefs $[\beta _1(t_1), \beta _2(t_2)]$ of players.

2.3 Rationality

Rationality requires that the player maximize her expected utility subject to her conditional belief at every history allowed by her strategy. Let player i’s expected utility from choosing $s_i$ under conditional belief $\beta _{i,h}(t_i)$ be

$$\begin{aligned} U_{i,h}(s_i, t_i) \equiv \displaystyle \sum _{s_{-i} \in S_{-i}} \pi _i(s_i, s_{-i}) \times \mathrm{marg}_{S_{-i}} \beta _{i,h}(t_i)(s_{-i}). \end{aligned}$$

Definition 2

A strategy-type pair $(s_i, t_i)$ is rational if for each $h \in H(s_i)$ and each $r_i \in S_i(h)$, we have $U_{i,h}(s_i, t_i) \ge U_{i,h}(r_i, t_i)$.

If $(s_i, t_i)$ is rational, then we say $s_i$ is a sequential best response for $t_i$. Let $R_i$ be the set of rational strategy-type pairs for player i and let $R \equiv R_1 \times R_2$ be the set of states at which each player is rational.

2.4 Strong belief of rationality

This section presents conditions on beliefs. First-order strong belief of rationality requires that the player believe her opponent is rational until she observes otherwise. Second-order strong belief of rationality requires that the player (a) satisfy first-order strong belief of rationality, and (b) believe her opponent is rational and satisfies first-order strong belief of rationality until she observes otherwise. And so on. The following definition of strong belief follows Battigalli and Siniscalchi (2002).

Definition 3

A type $t_i$ strongly believes an event $E \subseteq S_{-i} \times T_{-i}$ if, for each $h \in H$,

$$\begin{aligned} E \cap [S_{-i} (h) \times T_{-i}] \not = \emptyset \ \hbox {implies} \ \beta _{i, h} (t_i) (E) = 1. \end{aligned}$$
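On a finite toy model, the condition in Definition 3 can be checked mechanically. The sketch below is an illustration under simplifying assumptions, not the paper's construction: beliefs are finite dictionaries, the event is a finite set of opponent strategy-type pairs, and `allows`, `histories`, and the labels `"cooperator"`, `"defector"`, `"h1"`, and `"after_D"` are hypothetical.

```python
def strongly_believes(belief, event, histories, allows):
    """Check strong belief of `event` on a finite toy model.

    belief[h] maps opponent (strategy, type) pairs to probabilities,
    event is a set of such pairs, and allows(strategy, h) says whether
    the opponent's strategy allows history h.
    """
    for h in histories:
        # E meets S_{-i}(h) x T_{-i} iff some pair in E has a strategy allowing h.
        if any(allows(s, h) for s, _ in event):
            mass = sum(p for pair, p in belief[h].items() if pair in event)
            if abs(mass - 1.0) > 1e-12:
                return False
    return True


# Toy data: the root "h1" and a history "after_D" at which the opponent has defected.
histories = ["h1", "after_D"]


def allows(strategy, h):
    # Every strategy allows the root; only the defector allows "after_D".
    return h == "h1" or strategy == "defector"


event = {("cooperator", "t")}

# Believes the event wherever it is still possible, revises once it is refuted.
good = {"h1": {("cooperator", "t"): 1.0}, "after_D": {("defector", "t"): 1.0}}
# Gives up the event at the root, where it is still possible.
bad = {"h1": {("defector", "t"): 1.0}, "after_D": {("defector", "t"): 1.0}}

if __name__ == "__main__":
    print(strongly_believes(good, event, histories, allows))
    print(strongly_believes(bad, event, histories, allows))
```

The check skips any history at which the event has been contradicted by observed play, which is what distinguishes strong belief from requiring probability one at every history.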

For each $m \in {\mathbb {N}}$, the condition of mth-order strong belief of rationality is defined as follows. Let $SB_i (E) \equiv S_i \times \{ t_i ~|~ t_i$ strongly believes $E \}$. Set $R_i^0 = S_i \times T_i$ and $R_i^1 = R_i$. Define inductively $R_i^{m+1} = R_i^m \cap SB_i(R_{-i}^m)$ for each $m \in {\mathbb {N}}_+$. Set $R_i^\infty = \bigcap _{m \in {\mathbb {N}}_+} R_i^m$. We say that $R_i^m$ is the set of player i’s strategy-type pairs that satisfy rationality and $(m-1)$th-order strong belief of rationality. Let $R^m = R_1^m \times R_2^m$ be the set of states at which there is rationality and $(m-1)$th-order strong belief of rationality. Let $R^\infty = R_1^\infty \times R_2^\infty $ be the set of states at which there is rationality and common strong belief of rationality. We say that an outcome $\varvec{a}$ is consistent with $R^m$ if there exists a state $(s_1, t_1, s_2, t_2) \in R^m$ such that $\xi (s_1, s_2) = \varvec{a}$.

3 A necessary condition for cooperation

In this section, we show that cooperation occurs only if at least one player fails rationality and $(M-1)$th-order strong belief of rationality. In other words, if both players satisfy rationality and $(M-1)$th-order strong belief of rationality, then they defect in every round on the path of play. Battigalli and Friedenberg (2012) show that, for any epistemic game $({\mathcal {P}}, {\mathcal {T}})$, rationality and common strong belief of rationality imply bilateral defection in every round. We show that, for any epistemic game $({\mathcal {P}}, {\mathcal {T}})$, rationality and $(m-1)$th-order strong belief of rationality (with $m \in \{ 1, \ldots , M \}$) imply bilateral defection in each round $n \ge M - m + 1$; equivalently, at any state in $R^m$, players defect in the last m rounds on path.

Proposition 1

For each epistemic game $({\mathcal {P}}, {\mathcal {T}})$ and each state $(s_1, t_1, s_2, t_2)$ thereof, if $(s_1, t_1, s_2, t_2) \in R^m$ for some $m \in \{ 1, \ldots , M \}$, then $\xi ^n(s_1, s_2) = DD$ for each $n \ge M - m + 1$.

The proof for Proposition 1 (see Appendix A) resembles that of Battigalli and Friedenberg (2012). For illustration, we assume the stage game is played for three rounds. For each $m \in \{1, 2, 3\}$, we will show that $R^m$ implies bilateral defection in the last m rounds. It is easy to show that players who satisfy $R^1$ defect at every third-round history; hence, $R^1$ implies bilateral defection in round 3. Fix a state $(s_i, t_i, s_{-i}, t_{-i}) \in R^2$ and a second-round history $h^2$ on the path induced by $(s_i, s_{-i})$. The history $h^2$ is consistent with rationality; hence, at $h^2$, player i is certain that her opponent is rational and will defect at every third-round history. Consequently, player i defects at $h^2$. Finally, fix a state $(s_i, t_i, s_{-i}, t_{-i}) \in R^3$. Ex ante, player i is certain that her opponent satisfies $R^2$ and that bilateral defection will ensue in the last two rounds. If $s_i(h^1) = C$, then the strategy $s_i$ is strictly worse than some strategy $r_i$ that defects at every history. It follows that $s_i(h^1) = D$.
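The last step of this three-round illustration can be written out as a payoff comparison. As a sketch, assume that player i is certain her opponent defects in rounds two and three whatever happens in round one, and write $a_{-i}^1$ for the opponent's first-round action under a strategy $s_{-i}$ in the support of her belief (this symbol is used here only for illustration). Since player i earns at most $b_2$ in any round in which her opponent defects,

$$\begin{aligned} \pi _i(s_i, s_{-i})&\le u_i(C, a_{-i}^1) + 2 b_2 \quad \hbox {if} \ s_i(h^1) = C, \\ \pi _i(r_i, s_{-i})&= u_i(D, a_{-i}^1) + 2 b_2, \\ \pi _i(r_i, s_{-i}) - \pi _i(s_i, s_{-i})&\ge u_i(D, a_{-i}^1) - u_i(C, a_{-i}^1) \in \{ b_4 - b_3, \ b_2 - b_1 \}. \end{aligned}$$

Both possible differences are strictly positive because $b_4> b_3> b_2 > b_1$, so under this belief any strategy that cooperates at $h^1$ does strictly worse than $r_i$.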

Proposition 1 implies the following:

Corollary 1

For each epistemic game $({\mathcal {P}}, {\mathcal {T}})$ and each state $(s_1, t_1, s_2, t_2)$ thereof, if $(s_1, t_1, s_2, t_2) \in R^m$ for some $m \ge M$, then $\xi ^n(s_1, s_2) = DD$ for each $n \in \{ 1, \ldots , M \}$.

Corollary 1 states that if both players satisfy rationality and $(M-1)$th-order strong belief of rationality, then they defect in every round on the path of play. In the next sections, we study how cooperation arises when rationality and $(M-1)$th-order strong belief of rationality do not both hold.

4 An illustrative example

In Sect. 5, for an M-round Prisoner’s Dilemma, we construct an epistemic game $({\mathcal {P}}, {\mathcal {T}}^*)$ in which: for each $m \in \{ 1, \ldots , M - 1 \}$, any outcome that has bilateral defection in the last m rounds is consistent with $R^m$. In this section, we assume there are three rounds. A three-round Prisoner’s Dilemma has a relatively small set of non-terminal histories, which allows us to construct a relatively simple epistemic game $({\mathcal {P}}, {\mathcal {T}}^\prime )$ in which: for each $m \in \{ 1, 2 \}$, any outcome that has bilateral defection in the last m rounds is consistent with $R^m$. Although the epistemic game $({\mathcal {P}}, {\mathcal {T}}^\prime )$ presented in this section is simpler than the three-round Prisoner’s Dilemma epistemic game $({\mathcal {P}}, {\mathcal {T}}^*)$ presented in Sect. 5, it preserves key features of $({\mathcal {P}}, {\mathcal {T}}^*)$. We highlight these key features in Remark 2 at the end of this section.

The type structure ${\mathcal {T}}^\prime $ is constructed as follows. The epistemic type set for each player i is

$$\begin{aligned} T_i = \{ t_i^1, t_i^2, t_i^3, t_i^4, t_i^5 \}. \end{aligned}$$

Ex ante, type $t_i^1$ believes that her opponent has type $t_{-i}^1$ and plays grim trigger strategy $s_{-i}^G$ (which cooperates if and only if no one has defected). At any history h that is allowed by $s_{-i}^G$, type $t_i^1$ continues to believe that she is facing the strategy-type pair $(s_{-i}^G, t_{-i}^1)$. At any history h that is not allowed by $s_{-i}^G$, type $t_i^1$ believes that she is facing strategy-type pair $(s_{-i}^h, t_{-i}^1)$, where $s_{-i}^h$ allows h and defects at every history that does not precede h. We can show that a sequential best response for $t_i^1$ is a strategy $s_i^1$ that cooperates in round one, cooperates at history (CC), and defects at any other history.

Ex ante, type $t_i^2$ believes that her opponent has type $t_{-i}^1$ and plays strategy $s_{-i}^F$ that cooperates in round one, cooperates at every second-round history, and plays Tit-for-Tat in round three (formally, for each third-round history h, $s^F_{-i} (h) = D$ if and only if $h \in \{ (a^1, a^2) \in A^2 ~|~ a_{i}^2 = D \}$, that is, player $-i$ defects in round three exactly when player i has defected in round two). The superscript F in $s^F_{-i}$ is a mnemonic for ‘forgiving’: type $t_i^2$ believes that her defection in round one will be ‘forgiven’ by her opponent. At any history $h \not = h^1$, type $t_i^2$ believes that she is facing strategy-type pair $(\tilde{s}_{-i}^h, t_{-i}^1)$, where $\tilde{s}_{-i}^h$ allows h and has $\tilde{s}_{-i}^h (h^\prime ) =s^F_{-i} (h^\prime )$ for each $h^\prime $ that does not precede h. We can show that a sequential best response for $t_i^2$ is a strategy $s_i^2$ that defects in round one, cooperates at every second-round history, and defects at every third-round history.
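The ex ante part of this claim can be checked by brute force. The sketch below enumerates player i's eight action plans against an opponent who cooperates in rounds one and two and plays Tit-for-Tat in round three. The numeric payoffs are assumed values satisfying $b_4> b_3> b_2 > b_1$, and the helper names are hypothetical. It compares on-path plans at the root only, so it does not verify optimality at every history, which is what a sequential best response requires.

```python
from itertools import product

# Assumed numeric payoffs with b4 > b3 > b2 > b1 (illustrative values only).
B1, B2, B3, B4 = 0, 1, 3, 5

# Stage payoff to player i, with her own action listed first.
U = {("C", "C"): B3, ("C", "D"): B1, ("D", "C"): B4, ("D", "D"): B2}


def forgiving_opponent(own_actions):
    """Opponent's next action, given player i's earlier actions.

    Rounds one and two: cooperate regardless of what has happened.
    Round three: Tit-for-Tat, i.e. copy player i's round-two action.
    """
    if len(own_actions) < 2:
        return "C"
    return own_actions[1]


def plan_payoff(plan):
    """Total payoff to player i from a three-round action plan."""
    return sum(U[(a, forgiving_opponent(plan[:k]))] for k, a in enumerate(plan))


plans = list(product("CD", repeat=3))
best = max(plans, key=plan_payoff)

if __name__ == "__main__":
    for plan in plans:
        print("".join(plan), plan_payoff(plan))
```

With these assumed payoffs the maximizing plan is to defect, cooperate, defect, which matches the on-path behavior of $s_i^2$.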

At any history h that is allowed by grim trigger strategy $s_{-i}^G$, type $t_i^3$ believes that she is facing strategy-type pair $(s_{-i}^G, t_{-i}^1)$. At any history h that is not allowed by $s_{-i}^G$, type $t_i^3$ has the same belief as type $t_i^2$ does: $\beta _{i,h} (t_i^3) = \beta _{i,h} (t_i^2)$. We can show that a sequential best response for $t_i^3$ is a strategy $s_i^3$ that cooperates at each history $h \in \{ h^1, CC, CD, DD \}$ and defects at any other history.

At any history h that is allowed by strategy $s^3_{-i}$, type $t_i^4$ believes that she is facing strategy-type pair $(s_{-i}^3, t_{-i}^3)$. At the beginning of round two, if player $-i$ has defected in round one, type $t_i^4$ believes that she is facing $(s_{-i}^4, t_{-i}^4)$, where $s_{-i}^4$ defects at every history. At the beginning of round three, if player $-i$ has defected in the first two rounds, type $t_i^4$ believes she is facing $(s_{-i}^4, t_{-i}^4)$; if player $-i$ has cooperated in the first two rounds, type $t_i^4$ believes she is facing $(s_{-i}^3, t_{-i}^3)$; if player $-i$ has defected in round one and cooperated in round two, type $t_i^4$ believes she is facing $(s_{-i}^2, t_{-i}^2)$; if player $-i$ has cooperated in round one and defected in round two, type $t_i^4$ believes that she is facing $(s_{-i}^5, t_{-i}^5)$, where $s_{-i}^5$ cooperates in round one and defects at every other history. We can show that a sequential best response for $t_i^4$ is strategy $s_i^4$ that defects at every history.

At any history h that is allowed by strategy $s_{-i}^1$, type $t_i^5$ believes that she is facing strategy-type pair $(s_{-i}^1, t_{-i}^1)$. At any history h that is not allowed by $s_{-i}^1$, type $t_i^5$ has the same belief as type $t_i^4$ does: $\beta _{i,h} (t_i^5) = \beta _{i,h} (t_i^4)$. We can show that a sequential best response for $t_i^5$ is strategy $s_i^5$ that cooperates in round one and defects at every other history.

In Appendix B, we show that $\beta _i (t_i^{k_i})$ is a conditional probability system and $(s_i^{k_i}, t_i^{k_i})$ is a rational strategy-type pair for each $k_i \in \{1, 2, 3, 4, 5 \}$. For each outcome $\varvec{a}$ that has bilateral defection in round three, there is a state $(s_i^{k_i}, t_i^{k_i}, s_{-i}^{k_{-i}}, t_{-i}^{k_{-i}}) \in R$ such that the strategy profile $(s_i^{k_i}, s_{-i}^{k_{-i}})$ induces $\varvec{a}$. Specifically,

$$\begin{aligned} \begin{array}{ll} \xi (s^4_i, s^4_{-i}) = (DD, DD, DD) &{} \qquad \xi (s^5_i, s^5_{-i}) = (CC, DD, DD) \\ \xi (s^2_i, s^2_{-i}) = (DD, CC, DD) &{} \qquad \xi (s^1_i, s^1_{-i}) = (CC, CC, DD) \\ \xi (s^2_i, s^4_{-i}) = (DD, CD, DD) &{} \qquad \xi (s^1_i, s^5_{-i}) = (CC, CD, DD) \\ \xi (s^5_i, s^4_{-i}) = (CD, DD, DD) &{} \qquad \xi (s^3_i, s^4_{-i}) = (CD, CD, DD) \\ \xi (s^3_i, s^2_{-i}) = (CD, CC, DD) &{} \qquad \xi (s^5_i, s^2_{-i}) = (CD, DC, DD) \\ \end{array} \end{aligned}$$
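
The ten outcomes listed above can be recomputed mechanically. The following Python sketch encodes the five strategies as functions of the history, written from each player's own perspective, and derives the induced paths. The definition of $s^1$ is not restated in this section, so the sketch assumes that $s^1$ cooperates in rounds one and two as long as no one has defected and defects in round three. All function names are illustrative.

```python
from itertools import product

C, D = "C", "D"

# A history is a tuple of (own action, opponent action) pairs, one per past round.
def s1(h):  # ASSUMPTION: cooperate in rounds one and two until a defection, defect in round three
    return C if len(h) < 2 and all(a == (C, C) for a in h) else D

def s2(h):  # defect in round one, cooperate at every second-round history, defect in round three
    return C if len(h) == 1 else D

def s3(h):  # cooperate at h^1, CC, CD, DD; defect at any other history
    return C if h in ((), ((C, C),), ((C, D),), ((D, D),)) else D

def s4(h):  # defect at every history
    return D

def s5(h):  # cooperate in round one, defect at every other history
    return C if len(h) == 0 else D

STRATEGIES = {1: s1, 2: s2, 3: s3, 4: s4, 5: s5}

def path(s_i, s_j, rounds=3):
    """Path induced by (s_i, s_j); each entry lists player i's action first."""
    h_i, h_j, out = (), (), []
    for _ in range(rounds):
        a_i, a_j = s_i(h_i), s_j(h_j)
        out.append(a_i + a_j)
        h_i += ((a_i, a_j),)   # player i's view of the history
        h_j += ((a_j, a_i),)   # player -i's view of the same history
    return tuple(out)

# Every outcome with bilateral defection in round three, and the outcomes
# actually reached by the 25 strategy profiles.
targets = {(x, y, "DD") for x, y in product(["CC", "CD", "DC", "DD"], repeat=2)}
covered = {path(STRATEGIES[k], STRATEGIES[l]) for k, l in product(range(1, 6), repeat=2)}
```

Under the stated assumption on $s^1$, `covered` coincides with `targets`: the ten profiles in the display, together with their mirror images, reach all sixteen outcomes that end in bilateral defection.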

By construction, at every history $h \in H$, epistemic types $t_i^4$ and $t_i^5$ assign probability one to a rational strategy-type pair. Hence, strategy-type pairs $(s_i^4, t_i^4)$ and $(s_i^5, t_i^5)$ satisfy rationality and first order strong belief of rationality. As seen above, for each outcome $\varvec{a}$ that has bilateral defection in the last two rounds, there is a state in $\{ (s_i^4, t_i^4), (s_i^5, t_i^5) \} \times \{ (s_{-i}^4, t_{-i}^4), (s_{-i}^5, t_{-i}^5) \} \subseteq R^2$ at which $\varvec{a}$ occurs. We note that $(s_i^{k_i}, t_i^{k_i}) \in R_i^1 {\setminus } R_i^2$ for $k_i \in \{1, 2, 3 \}$ (as $t_i^1$, $t_i^2$, and $t_i^3$ do not assign probability one to $R^1_{-i}$ ex ante) and $(s_i^{k_i}, t_i^{k_i}) \in R_i^2 {\setminus } R_i^3$ for $k_i \in \{4, 5 \}$ (as $t_i^4$ and $t_i^5$ do not assign probability one to $R^2_{-i}$ ex ante).

Remark 2

The epistemic game $({\mathcal {P}}, {\mathcal {T}}^\prime )$ has four key features. First, if $(s_i, t_i) \in R_i^m$, then $t_i$ always believes that her opponent will defect at every nth-round history with $n \ge M - m + 2$. Specifically, types $t_i^4$ and $t_i^5$ always believe that their opponent will defect at every third-round history. Second, if $(s_i, t_i) \in R_i^m {\setminus } R_i^{m+1}$, then ex ante type $t_i$ believes that her opponent will cooperate in each round $n \in \{ 1, \ldots , M - m + 1 \}$ if no defection has occurred. Third, there exists some strategy-type pair $(s_i, t_i) \in R_i^1$ that occasionally ‘forgives’ her opponent’s past defection. Specifically, on observing that player $-i$ has defected in round one, type $t_i^3$ believes that player $-i$ will cooperate in round two and play Tit-for-Tat in round three; thus, forgiving player $-i$’s defection and cooperating in round two will lead to outcome (CD, CC, DC) whereas defecting in round two will lead to outcome (CD, DC, DD), which implies that forgiving is optimal. Fourth, there exists some strategy-type pair $(s_i, t_i) \in R_i^1$ that occasionally defects due to the belief that her defection will be ‘forgiven’. For instance, strategy-type pair $(s^4_i, t^4_i)$ defects in round one as ex ante she believes she is facing $(s^3_{-i}, t^3_{-i})$ that forgives a first-round defection. The epistemic game $({\mathcal {P}}, {\mathcal {T}}^*)$ in Sect. 5 also has these features. In the next section, we will show how the optimality of forgiving and the belief that one’s defection will be forgiven generate a rich set of behavior outcomes at $R_1^{m_1} \times R_2^{m_2}$ for any pair $(m_1, m_2) \in \{ 1, \ldots , M \}^2$.

5 The richness of behaviors

In this section, we construct an epistemic game $({\mathcal {P}}, {\mathcal {T}}^*)$ such that, for each pair $(m_1, m_2) \in \{ 1, \ldots , M \}^2$, the set of outcomes consistent with $R_1^{m_1} \times R_2^{m_2}$ is the set of paths on which each player i defects in the last $m_i$ rounds. The following definition is useful for stating this result. For each pair $(m_1, m_2) \in {\mathbb {N}}_+^2$, define

$$\begin{aligned} A(m_1, m_2) \equiv \{ (a^1, \ldots , a^M) ~|~ a_i^n = D\ \text {for each} \ i\ \text {and each}\ n > M-m_i \}. \end{aligned}$$

When $(m_1, m_2) \in \{1, \ldots , M\}^2$, the set $A(m_1, m_2)$ is the set of paths on which each player i defects in the last $m_i$ rounds.
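
To make the definition concrete, the following Python sketch enumerates $A(m_1, m_2)$ by brute force for a small number of rounds. The function name `A` and the string encoding of action profiles (player 1's action first) are illustrative choices, not notation from the paper.

```python
from itertools import product

ACTIONS = ["CC", "CD", "DC", "DD"]  # first letter: player 1, second letter: player 2

def A(m1, m2, M):
    """Enumerate A(m1, m2): paths on which player i plays D in every round n > M - m_i."""
    result = []
    for a in product(ACTIONS, repeat=M):
        ok = all(
            a[n - 1][i] == "D"
            for i, m in enumerate((m1, m2))
            for n in range(1, M + 1)
            if n > M - m
        )
        if ok:
            result.append(a)
    return result
```

With three rounds, `A(1, 1, 3)` contains the sixteen paths that end in bilateral defection, while `A(5, 5, 3)` contains only the path of bilateral defection in every round, since the constraint $n > M - m_i$ then binds in all rounds.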

Theorem 1

There exists an epistemic game $({\mathcal {P}}, {\mathcal {T}}^*)$ that satisfies the following properties:

(i)
for each pair $(m_1, m_2) \in {\mathbb {N}}_+^2$ and each $\varvec{a} \in A(m_1, m_2)$, there is a state $(s_1, t_1, s_2, t_2) \in R_1^{m_1} \times R_2^{m_2}$ such that $\xi (s_1,s_2) = \varvec{a}$,
(ii)
for each pair $(m_1, m_2) \in {\mathbb {N}}_+^2$ and each $(s_1, t_1, s_2, t_2) \in R_1^{m_1} \times R_2^{m_2}$, we have $\xi (s_1,s_2) \in A(m_1, m_2)$.

Although Theorem 1 characterizes the set of outcomes consistent with $R_1^{m_1} \times R_2^{m_2}$ for any pair $(m_1, m_2) \in {\mathbb {N}}_+^2$, we are particularly interested in how cooperative behaviors arise at $R_1^{m_1} \times R_2^{m_2}$ for $(m_1, m_2) \in \{1, \ldots , M\}^2$. We devote most of this section to constructing the epistemic game $({\mathcal {P}}, {\mathcal {T}}^*)$ and addressing this question. Before doing that, we present a corollary that follows from Proposition 1 and Theorem 1.

Corollary 2

(a)
The epistemic game $({\mathcal {P}}, {\mathcal {T}}^*)$ has a state $(s^*_1, t^*_1, s^*_2, t^*_2)$ such that $(s^*_1, t^*_1, s^*_2, t^*_2) \in R^{M-1}$ and $\xi ^1(s^*_1, s^*_2) = CC$.
(b)
Fix an epistemic game $({\mathcal {P}}, {\mathcal {T}})$ and a state $(s_1, t_1, s_2, t_2)$ thereof. If $(s_1, t_1, s_2, t_2) \in R^{M-1}$ and $\xi ^n(s_1, s_2) = CC$ for some n, then $n = 1$ and $\xi ^l(s_1, s_2) = DD$ for each $l = 2, \ldots , M$.

Part (a) follows from Theorem 1. The path $\varvec{a^*}$ that has bilateral cooperation in round one and bilateral defections in all other rounds belongs to the set $A(M - 1, M - 1)$; hence, there is a state $(s^*_1, t^*_1, s^*_2, t^*_2)$ of $({\mathcal {P}}, {\mathcal {T}}^*)$ such that $(s^*_1, t^*_1, s^*_2, t^*_2) \in R^{M-1}$ and the strategy profile $(s^*_1, s^*_2)$ induces $\varvec{a^*}$. Part (b) follows from Proposition 1. In any epistemic game, at $R^{M-1}$, players defect in the last $M-1$ rounds on path. Hence, if bilateral cooperation occurs at $R^{M-1}$, then it must occur in round one. Informally, Corollary 2 states that bilateral cooperation might occur at $R^{M-1}$; in addition, if bilateral cooperation occurs at $R^{M-1}$, then it must occur in round one and be followed by bilateral defections in all subsequent rounds.
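
As a worked instance, take $M = 3$ (a value chosen here only for illustration). The set $A(M-1, M-1) = A(2, 2)$ consists of the paths on which both players defect in rounds two and three:

$$\begin{aligned} A(2, 2) = \{ (CC, DD, DD),\ (CD, DD, DD),\ (DC, DD, DD),\ (DD, DD, DD) \}. \end{aligned}$$

Only the first of these paths contains bilateral cooperation; it occurs in round one and is followed by bilateral defection in rounds two and three, in line with both parts of Corollary 2.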

We construct the epistemic game $({\mathcal {P}}, {\mathcal {T}}^*)$ as follows. For each player i, let $\tilde{A}_i$ be a collection of pairs $(m_i, \varvec{a}) \in {\mathbb {N}} \times A^M$ such that $a_i^n = D$ for each $n > M - m_i$. We shall define a pair of mappings $(f_i, g_i)$ that maps each point in $\tilde{A}_i$ to a pair of strategy and epistemic type for player i. At each history, each epistemic type of player i assigns probability one to some point in $\tilde{A}_{-i}$, which is mapped to a strategy-type pair by $(f_{-i}, g_{-i})$. Our construction ensures that for each pair $(m_1, m_2) \in {\mathbb {N}}_+^2$ and each path $\varvec{a} \in A(m_1, m_2)$, the state $[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})]_{i=1,2}$ belongs to $R_1^{m_1} \times R_2^{m_2}$ and the strategy profile $[f_i(m_i, \varvec{a})]_{i=1,2}$ induces $\varvec{a}$.

5.1 Strategies

Fix $(m_i, \varvec{a}) \in \tilde{A}_i$. We define strategy $f_i (m_i, \varvec{a})$ below. At each history $h^n = (a^0, \ldots , a^{n-1})$ (equivalently, $h^n$ is consistent with $\varvec{a}$), let $f_i(m_i, \varvec{a})(h^n)= a_i^n$. At each history $h^n$ that is not consistent with $\varvec{a}$, let $f_i(m_i, \varvec{a})(h^n)= C$ if and only if the following hold: (a) $n \le M - m_i$, (b) at round $n^\prime $ when the first deviation from $\varvec{a}$ occurs, player $-i$ cooperates whereas $a_{-i}^{n^\prime }$ specifies ‘defect’, and both players cooperate from round $n^\prime + 1$ onwards (formally, if $h^n = (\bar{a}^0, \ldots , \bar{a}^{n-1})$, then $n^\prime = \min \{ n^{\prime \prime } ~|~ \bar{a}^{n^{\prime \prime }} \not = a^{n^{\prime \prime }} \}$, $\bar{a}_{-i}^{n^\prime } = C$, and $\bar{a}^{n^{\prime \prime }} = CC$ for each $n^{\prime \prime } \ge n^\prime + 1$).

Remark 3

We note that the strategy $f_i (m_i, \varvec{a})$ specifies ‘defect’ at every nth-round history with $n > M - m_i$. This ‘defection’ phase starts earlier if $m_i$ increases. In the next section, if player i is characterized by $(m_i, \varvec{a})$, we shall say player i has level $m_i$.
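
The definition of $f_i (m_i, \varvec{a})$ in Sect. 5.1 can be sketched in Python as follows. The encoding is an assumption of the sketch: a path is a list of (own action, opponent action) pairs written from player i's perspective, a history lists the rounds already played in the same format, and rounds are numbered from 1. The helper `check_remark_3` verifies the observation in Remark 3 by enumerating all late histories.

```python
from itertools import product

C, D = "C", "D"

def f(m_i, a, history):
    """Action prescribed by f_i(m_i, a) at `history`."""
    M = len(a)
    n = len(history) + 1                      # round about to be played
    if list(history) == list(a[: n - 1]):     # history is consistent with a
        return a[n - 1][0]
    if n > M - m_i:                           # condition (a) fails
        return D
    # index of the first round at which the history departs from a
    k = next(j for j in range(n - 1) if history[j] != a[j])
    # condition (b): the opponent cooperated where a specified 'defect' ...
    opponent_cooperated_instead = history[k][1] == C and a[k][1] == D
    # ... and both players have cooperated in every round since
    cooperation_since = all(x == (C, C) for x in history[k + 1:])
    return C if opponent_cooperated_instead and cooperation_since else D

def check_remark_3(m_i, a):
    """True if f(m_i, a, .) defects at every nth-round history with n > M - m_i."""
    M = len(a)
    pairs = [(C, C), (C, D), (D, C), (D, D)]
    return all(
        f(m_i, a, list(h)) == D
        for length in range(max(M - m_i, 0), M)
        for h in product(pairs, repeat=length)
    )
```

For example, with `a = [(D, C), (C, D), (C, C), (D, D), (D, D)]` and `m_i = 2`, the strategy defects at the initial history, cooperates after `[(D, C)]`, and also cooperates after `[(D, C), (C, C)]`, where the opponent has cooperated although the path specified a defection. The check in `check_remark_3` presumes that $(m_i, \varvec{a}) \in \tilde{A}_i$.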

5.2 Beliefs

In this section, for each $(m_i, \varvec{a}) \in \tilde{A}_i$, we describe the beliefs of its corresponding epistemic type $g_i (m_i, \varvec{a})$. At each history $h \in H$, the epistemic type $g_i (m_i, \varvec{a})$ assigns probability one to a point $\eta _{-i} (m_i, \varvec{a}, h) \in \tilde{A}_{-i}$, which is mapped to a strategy-type pair by $(f_{-i}, g_{-i})$. For convenience, we decompose $\eta _{-i} (\cdot )$ into $\eta ^1_{-i} (\cdot ) \in {\mathbb {N}}$ and $\eta ^2_{-i} (\cdot ) \in A^M$.

Suppose $h = (a^0, \ldots , a^n)$ for some $n \le M - m_i$. A level-0 player believes her opponent has level 0, whereas a player with level $m_i \ge 1$ believes her opponent has level $m_i - 1$. Formally, $\eta ^1_{-i}(m_i,\varvec{a}, h) = \mathrm{max} \{0, m_i - 1\}$. To define $\eta ^2_{-i}(m_i,\varvec{a}, h)$, the following definition of $\tilde{m}_i$ is necessary: if $a^{n^\prime }_{-i} = D$ for some $n^\prime \in \{ n + 1, \ldots , M - m_i + 1 \}$, let $\tilde{m}_i \equiv \mathrm{min} \{ n^\prime \in \{ n + 1, \ldots , M - m_i + 1 \} ~|~ a^{n^\prime }_{-i} = D \}$; otherwise, let $\tilde{m}_i \equiv M - m_i + 1$. Let $\eta ^2_{-i}(m_i,\varvec{a}, h) = (h, \bar{a}^{n+1}, ..., \bar{a}^M)$, where $\bar{a}^l = a^l$ if $l < \tilde{m}_i$, $\bar{a}^{\tilde{m}_i} = a_i^{\tilde{m}_i} C$, $\bar{a}^l = CC$ if $\tilde{m}_i< l < M - m_i + 1$, $\bar{a}^{M - m_i + 1} = DC$, and $\bar{a}^l = DD$ if $l > M - m_i + 1$. As defined in Sect. 5.1, the strategy $s_{-i} \equiv f_{-i} [\eta _{-i}(m_i,\varvec{a}, h)]$ satisfies the following properties. First, $s_{-i}$ defects at every $n^\prime $th-round history with $n^\prime > M - m_i + 1$. Second, $s_{-i}$ cooperates at h (round $n + 1$). Third, at any round $n^\prime \in \{ n + 2, \ldots , M - m_i + 1 \}$, if both players have cooperated continually since round $n+1$, then $s_{-i}$ cooperates in round $n^\prime $. Fourth, depending on $\varvec{a}$, there might be some round $n^\prime \in \{ n + 2, \ldots , M - m_i + 1 \}$ in which $s_{-i}$ cooperates after player i has just defected; in this case, we say that player $-i$ ‘forgives’ her opponent’s past defection. For instance, by construction, if $a_i^l = D$ for some $l \in [n, \tilde{m}_i)$, then $s_{-i} (a^1, \ldots , a^l) = C$.
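
The construction of $\eta ^2_{-i}(m_i,\varvec{a}, h)$ for this case can be traced with a short Python sketch. It assumes that h consists of the first n rounds of $\varvec{a}$, that rounds are numbered from 1, that each round is stored as a pair (player i's action, player $-i$'s action), and that $(m_i, \varvec{a}) \in \tilde{A}_i$. The function name `believed_path` is illustrative.

```python
C, D = "C", "D"

def believed_path(m_i, a, n):
    """Path eta^2_{-i}(m_i, a, h), where h is the first n rounds of a and n <= M - m_i."""
    M = len(a)
    last = M - m_i + 1                     # round M - m_i + 1
    assert 0 <= n <= M - m_i
    # m_tilde: first round in {n+1, ..., last} in which a has player -i defect,
    # or `last` if there is no such round
    m_tilde = next((r for r in range(n + 1, last + 1) if a[r - 1][1] == D), last)
    path = []
    for l in range(1, M + 1):
        if l < m_tilde:
            path.append(a[l - 1])          # follow a (this covers h itself)
        elif l == m_tilde and l < last:
            path.append((a[l - 1][0], C))  # player -i cooperates instead of defecting
        elif l < last:
            path.append((C, C))            # bilateral cooperation up to round M - m_i
        elif l == last:
            path.append((D, C))            # player i defects unilaterally
        else:
            path.append((D, D))            # bilateral defection afterwards
    return path
```

For instance, with five rounds, $m_i = 2$, and `a = [(D, C), (C, D), (C, C), (D, D), (D, D)]`, the ex ante belief (`n = 0`) is the path `[(D, C), (C, C), (C, C), (D, C), (D, D)]`: the opponent is expected to cooperate in round 2 even though player i defects in round 1.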

Suppose $h = (a^0, \ldots , a^n)$ for some $n > M - m_i$. If $h = (a^0)$, then let $\eta _{-i}(m_i,\varvec{a}, h) = (m_i - 1, DD, \ldots , DD)$. If $a_{-i}^n = C$, then let $\eta _{-i}(m_i,\varvec{a}, h) = (M - n, a^1, \ldots , a^n, DD, \ldots , DD)$. If $a_{-i}^n = D$, then we let $\tilde{n} \equiv \mathrm{min} \{ n^\prime \in \{ 1, \ldots , n \} ~|~ (a^{n^\prime }_{-i}, \ldots , a^n_{-i}) = (D, \ldots , D) \}$ be the round in which player $-i$ starts defecting continually and construct $\eta _{-i}(m_i,\varvec{a}, h)$ as follows. If $\tilde{n} = 1$, the history h is consistent with the behavior of a player $-i$ who has level $\max \{M, m_i - 1\}$ (see Remark 3); hence, let $\eta _{-i}(m_i,\varvec{a}, h) = (\max \{M, m_i - 1\}, DD, \ldots , DD)$. If $\tilde{n} > 1$, the history h is consistent with the behavior of a player $-i$ who has level $M - \tilde{n} + 1$ (see Remark 3); hence, let $\eta _{-i}(m_i,\varvec{a}, h) = (M - \tilde{n} + 1, a^0, \ldots , a^{\tilde{n} - 1}, DD, \ldots , DD)$. In each of these four cases, the strategy $f_{-i}[\eta _{-i}(m_i,\varvec{a}, h)]$ defects at every $n^\prime $th-round history with $n^\prime > n$.

Suppose $h = (\bar{a}^1, \ldots , \bar{a}^n)$ is inconsistent with $\varvec{a}$. We define the longest common predecessor (namely, $h^*$) of h and $\varvec{a}$ as follows. Let $n^\prime = \min \{ n^{\prime \prime } ~|~ \bar{a}^{n^{\prime \prime }} \not = a^{n^{\prime \prime }} \}$ be the point at which h and $\varvec{a}$ start to diverge. If $\bar{a}_{-i}^{n^{\prime }} \not = a_{-i}^{n^{\prime }}$ (player $-i$ deviates in round $n^\prime $), let $h^* \equiv (a^1, \ldots , a^{n^{\prime } - 1})$; otherwise, let $h^* \equiv (a^1, \ldots , a^{n^{\prime }})$. At history $h^*$, epistemic type $g_i (m_i, \varvec{a})$ believes her opponent is $\eta _{-i}(m_i, \varvec{a}, h^*)$ (constructed above). If history h is allowed by strategy $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h^*)]$, then $g_i (m_i, \varvec{a})$ believes her opponent is $\eta _{-i}(m_i, \varvec{a}, h^*)$ conditional on h (as required by Bayes’ rule). Formally, $\eta _{-i}(m_i,\varvec{a}, h) = \eta _{-i}(m_i,\varvec{a}, h^*)$. If history h is not allowed by strategy $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h^*)]$, then we construct $\eta _{-i}(m_i,\varvec{a}, h)$ as follows: if $\bar{a}_{-i}^n = C$, let $\eta _{-i}(m_i,\varvec{a}, h) = (M - n, h, DD, \ldots , DD)$; otherwise, let $\tilde{n}$ be the round in which player $-i$ starts defecting continually and let $\eta _{-i}(m_i,\varvec{a}, h) = (M - \tilde{n} + 1, \bar{a}^0, \ldots , \bar{a}^{\tilde{n} - 1}, DD, \ldots , DD)$.

We have completed our definition of $\eta _{-i}: \tilde{A}_i \times H \rightarrow \tilde{A}_{-i}$. The type structure ${\mathcal {T}}^* \equiv ({\mathcal {P}}; (T_i, {\mathcal {E}}_i, \beta _i)_{i=1,2})$ is constructed as follows. The epistemic type set for each player i is $T_i = \{g_i(m_i, \varvec{a}) ~|~ (m_i, \varvec{a}) \in \tilde{A}_i \}$. The belief of epistemic type $g_i(m_i, \varvec{a})$ at history h is

$$\begin{aligned} \beta _{i,h}[g_i(m_i, \varvec{a})] \Big [ f_{-i}[\eta _{-i} (m_i, \varvec{a},h)], g_{-i}[\eta _{-i}(m_i, \varvec{a},h)] \Big ] = 1. \end{aligned}$$

In Appendix C, we show $\beta _i [g_i(m_i, \varvec{a})]$ is a conditional probability system.

5.3 Rationality

In Appendix C, we show that a strategy-type pair $[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})]$ is rational for each $m_i \in {\mathbb {N}}_+$. We sketch out the proof below.

At every nth-round history with $n \le M - m_i + 1$, type $g_i(m_i, \varvec{a})$ believes that her opponent will defect at every $n^\prime $th-round history with $n^\prime \ge M - m_i + 2$. At every nth-round history with $n > M - m_i + 1$, type $g_i(m_i, \varvec{a})$ believes that her opponent will defect at every $n^\prime $th-round history with $n^\prime \ge n$. Hence, it is optimal for type $g_i(m_i, \varvec{a})$ to defect at every nth-round history with $n \ge M - m_i + 1$ as specified by strategy $f_i(m_i, \varvec{a})$.

Fix a history $h = (a^0, \ldots , a^{n-1})$ such that $n < M - m_i + 1$ and $a_i^n = C$. As h is consistent with $\varvec{a}$, we have $f_i(m_i, \varvec{a})(h) = a_i^n = C$ (by construction). At h, type $g_i(m_i, \varvec{a})$ believes that playing $f_i(m_i, \varvec{a})$ will lead to a continuation path that consists of only CC and DC until round $M - m_i + 1$, whereas playing D at h will induce player $-i$ to defect at every $n^\prime $th-round history with $n^\prime > n$. It is easy to show that rationality requires type $g_i(m_i, \varvec{a})$ to play C at h. If player i plays C after her opponent has just defected ($a_{-i}^{n-1} = D$), we say that player i ‘forgives’ her opponent’s past defection. As shown above, forgiving is optimal under conditional belief $\beta _{i,h}[g_i(m_i, \varvec{a})]$.

Fix a history $h = (a^0, \ldots , a^{n-1})$ such that $n < M - m_i + 1$ and $a_i^n = D$. As h is consistent with $\varvec{a}$, we have $f_i(m_i, \varvec{a})(h) = a_i^n = D$ (by construction). At h, type $g_i(m_i, \varvec{a})$ believes that player $-i$ will cooperate in round $n+1$ no matter what player i does in round n (i.e., player $-i$ will forgive player i’s past defection); in addition, player $-i$ will cooperate until round $M - m_i + 1$ if no one has defected since round $n+1$. It is easy to show that rationality requires type $g_i(m_i, \varvec{a})$ to play D at h.

Fix a history $h = (\bar{a}^1, \ldots , \bar{a}^{n-1})$ that is inconsistent with $\varvec{a}$ and satisfies both conditions (a)–(b) specified in Sect. 5.1. At h, type $g_i(m_i, \varvec{a})$ believes that playing $f_i(m_i, \varvec{a})$ will lead to a continuation path that consists of only CC until round $M - m_i$ and DC in round $M - m_i + 1$, whereas playing D at h will induce player $-i$ to defect at every $n^\prime $th-round history with $n^\prime > n$. It is easy to show that rationality requires type $g_i(m_i, \varvec{a})$ to play C at h.

Fix a history $h = (\bar{a}^1, \ldots , \bar{a}^{n-1})$ that is inconsistent with $\varvec{a}$ and does not satisfy both conditions (a)–(b) specified in Sect. 5.1. At h, type $g_i(m_i, \varvec{a})$ believes that her opponent will defect at every history that follows h. Hence, rationality requires type $g_i(m_i, \varvec{a})$ to play D at h.

Remark 4

In the epistemic game $({\mathcal {P}}, {\mathcal {T}}^*)$, ‘forgiving the opponent’s past defection’ is optimal under some conditional beliefs; in addition, each player i has some epistemic types that, at some histories, assign probability one to the event that ‘player $-i$ will forgive player i’s past defection’. The optimality of forgiving and the belief that one’s defection will be forgiven play important roles in generating the richness of the set of outcomes at $R_1^{m_1} \times R_2^{m_2}$: any path on which each player i defects in the last $m_i$ rounds is possible. When forgiving is optimal, a rational player cooperates after her opponent has just defected. When a player believes that her opponent will forgive her past defection, she might cooperate after she herself has just defected. If a player believes there is a phase during which her opponent plays a grim trigger strategy^{Footnote 7} and does not forgive any past defection, then it is optimal to cooperate throughout this phase except for the last round.

5.4 Strong belief of rationality

In Appendix C, we show $[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] \in R_i^{m_i}$ for $m_i \ge 2$. For notational convenience, let $\phi _i(m_i, \varvec{a}) \equiv [f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})]$. In the following, we assume there are three rounds and show $\phi _i(m_i, \varvec{a}) \in R_i^{m_i}$ for $m_i \in \{ 2, 3 \}$. By construction, a strategy-type pair $\phi _i(m_i, \varvec{a})$ with $m_i \in \{ 2, 3 \}$ always assigns probability one to some $\phi _{-i}(m_{-i}, \varvec{a^\prime })$ with $m_{-i} \in \{ 1, 2 \}$. As discussed in Sect. 5.3, a strategy-type pair $\phi _{-i}(m_{-i}, \varvec{a^\prime })$ with $m_{-i} \in \{ 1, 2 \}$ is rational. Hence, $\phi _i(m_i, \varvec{a}) \in R_i^2$ for $m_i \in \{ 2, 3 \}$. It is left to show type $g_i(3, \varvec{a})$ strongly believes $R_{-i}^2$. By construction, type $g_i(3, \varvec{a})$ assigns probability one to some $\phi _{-i}(1, \varvec{a^\prime })$ if player $-i$ has just cooperated in round two, and assigns probability one to some $\phi _{-i}(2, \varvec{a^\prime }) \in R_{-i}^2$ at every other history. A player who satisfies $R_{-i}^2$ believes that her opponent defects at every third-round history; thus, cooperating in round two is inconsistent with $R_{-i}^2$. It follows that $g_i(3, \varvec{a})$ strongly believes $R_{-i}^2$: it assigns probability one to $R_{-i}^2$ whenever possible.

We conclude this section by giving an example that illustrates how the optimality of forgiving and the belief that one’s defection will be forgiven generate the richness of the set of behavior outcomes.

Example 1

Assume there are five rounds. The epistemic game $({\mathcal {P}}, {\mathcal {T}}^*)$ has a state $(s_1, t_1, s_2, t_2) \in R^2$ such that the strategy profile $(s_1, s_2)$ induces the following path:

$$\begin{aligned} (DC, CD, CC, DD, DD). \end{aligned}$$

Ex ante, player 1 believes that her opponent will cooperate in round 1 and keep cooperating until round 4 if no one has defected since round 1. In addition, player 1 believes that her unilateral defection in round 1 will be forgiven: her opponent will still cooperate in round 2 and keep cooperating until round 4 if no one has defected since round 2. As a best response, player 1 defects in round 1 and cooperates in round 2. However, this prior belief turns out to be incorrect: player 2 in fact defects in round 2 due to the belief that this defection will be forgiven. Player 1 does forgive and cooperate in round 3, believing that player 2 will respond by cooperating in round 4. However, in round 4, both players defect since each strongly believes that her opponent will defect at every fifth-round history.

6 Sufficiently rich type structure

In Sect. 5, we construct an epistemic game $({\mathcal {P}}, {\mathcal {T}}^*)$ in which, for each pair $(m_1, m_2) \in {\mathbb {N}}_+^2$, the set of outcomes consistent with $R_1^{m_1} \times R_2^{m_2}$ is the set of paths on which each player i defects in the last $m_i$ rounds. In this section, we use the type structure ${\mathcal {T}}^*$ to show that for any type structure that satisfies a richness condition introduced by Perea (2012), the set of outcomes consistent with $R_1^{m_1} \times R_2^{m_2}$ is also the set of paths on which each player i defects in the last $m_i$ rounds. The type structure ${\mathcal {T}}^*$ is sufficiently rich. An extension of ${\mathcal {T}}^*$, which is obtained by adding new epistemic types into ${\mathcal {T}}^*$, is also sufficiently rich.^{Footnote 8} We note that a complete type structure, which contains all beliefs, is an extension of ${\mathcal {T}}^*$. As discussed in Battigalli and Friedenberg (2012), a type structure specifies sets of possible beliefs, which might have been formed by social conventions or a history. An analyst who does not know which beliefs are possible might be interested in studying behavioral implications of an epistemic condition across different type structures. Although we focus on sufficiently rich type structures, we comment on other type structures in Sect. 7.

We formalize the richness condition introduced by Perea (2012) below. In Sect. 2.4, we fix a type structure and let $R_i^m$ denote the set of player i’s strategy-type pairs that satisfy rationality and $(m-1)$th-order strong belief of rationality. In this section, we examine different type structures and let $R_i^m ({\mathcal {T}})$ denote the set of player i’s strategy-type pairs that satisfy rationality and $(m-1)$th-order strong belief of rationality for type structure ${\mathcal {T}}$. Fix a type structure ${\mathcal {T}}$, a player i, and an order $m \in {\mathbb {N}}$. We say that a history $h \in H$ is consistent with $R_i^m ({\mathcal {T}})$ if there is some strategy-type pair $(s_i, t_i) \in R_i^m ({\mathcal {T}})$ such that $s_i$ allows h. Let ${\mathcal {H}}[R_i^m ({\mathcal {T}})]$ be the set of histories that are consistent with $R_i^m ({\mathcal {T}})$.

Definition 4

A type structure ${\mathcal {T}}$ is sufficiently rich if for each player i, each order $m \in {\mathbb {N}}_+$, and each type structure ${\mathcal {T}}^\prime $ such that ${\mathcal {H}}[R_j^{m^\prime } ({\mathcal {T}}^\prime )] = {\mathcal {H}}[R_j^{m^\prime } ({\mathcal {T}})]$ for each $m^\prime < m$ and each $j \in \{1, 2\}$, we have ${\mathcal {H}}[R_i^m ({\mathcal {T}}^\prime )] \subseteq {\mathcal {H}}[R_i^m ({\mathcal {T}})]$.

We describe a sufficiently rich type structure informally below. Fix a type structure ${\mathcal {T}}$, a player i, and an order $m \in {\mathbb {N}}_+$. Fix a type structure ${\mathcal {T}}^\prime $ such that, for each $m^\prime < m$ and each $j \in \{1, 2\}$, the set of histories consistent with $R_j^{m^\prime } ({\mathcal {T}}^\prime )$ and the set of histories consistent with $R_j^{m^\prime } ({\mathcal {T}})$ are identical. If ${\mathcal {T}}$ is sufficiently rich, then any history h consistent with $R_i^m ({\mathcal {T}}^\prime )$ is also consistent with $R_i^m ({\mathcal {T}})$: there is some strategy-type pair $(s_i, t_i) \in R_i^m ({\mathcal {T}})$ such that $s_i$ allows h. Equivalently, if ${\mathcal {T}}$ is sufficiently rich, then any history h inconsistent with $R_i^m ({\mathcal {T}})$ is also inconsistent with $R_i^m ({\mathcal {T}}^\prime )$.

The following proposition implies that if a type structure ${\mathcal {T}}$ is incomplete but sufficiently rich, then for each type structure ${\mathcal {T}}^\prime $ that is an extension of ${\mathcal {T}}$, the set of outcomes consistent with $R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$ and the set of outcomes consistent with $R_1^{m_1} ({\mathcal {T}}^\prime ) \times R_2^{m_2} ({\mathcal {T}}^\prime )$ are identical. We note that a complete type structure is an extension of ${\mathcal {T}}$. Thus, the concept of sufficiently rich type structure might be useful if one aims to study behavioral implications of an epistemic condition for a complete type structure but finds it more convenient to work with incomplete type structures.

Proposition 2

Fix a sufficiently rich type structure ${\mathcal {T}}$ and a type structure ${\mathcal {T}}^\prime $ that is an extension of ${\mathcal {T}}$. For each player i and each order $m \in {\mathbb {N}}_+$, we have $R_i^m ({\mathcal {T}}) \subseteq R_i^m ({\mathcal {T}}^\prime )$ and ${\mathcal {H}}[R_i^{m} ({\mathcal {T}})] = {\mathcal {H}}[R_i^{m} ({\mathcal {T}}^\prime )]$.

Fix $(m_1, m_2) \in {\mathbb {N}}_+^2$. Since $R_i^m ({\mathcal {T}}) \subseteq R_i^m ({\mathcal {T}}^\prime )$ for each i and each $m \in {\mathbb {N}}_+$, it is clear that an outcome consistent with $R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$ is also consistent with $R_1^{m_1} ({\mathcal {T}}^\prime ) \times R_2^{m_2} ({\mathcal {T}}^\prime )$. Conversely, let $\varvec{a} \equiv (a^1, \ldots , a^{M-1}, DD)$ be an outcome consistent with $R_1^{m_1} ({\mathcal {T}}^\prime ) \times R_2^{m_2} ({\mathcal {T}}^\prime )$. Denote $h \equiv (a^1, \ldots , a^{M-1})$. Since ${\mathcal {H}}[R_i^{m} ({\mathcal {T}})] = {\mathcal {H}}[R_i^{m} ({\mathcal {T}}^\prime )]$ for each i and each $m \in {\mathbb {N}}_+$, there exists some $(s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$ such that both $s_1$ and $s_2$ allow h. It is obvious that both $s_1$ and $s_2$ defect at h. Thus, $\xi (s_1, s_2) = \varvec{a}$. This implies $\varvec{a}$ is consistent with $R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$.

If a type structure ${\mathcal {T}}$ is not sufficiently rich, then there might exist an extension ${\mathcal {T}}^\prime $ of ${\mathcal {T}}$ and an outcome that is consistent with $R_i^m ({\mathcal {T}})$ but inconsistent with $R_i^m ({\mathcal {T}}^\prime )$ for some i and some $m \in {\mathbb {N}}_+$. To see how this might arise, suppose ${\mathcal {H}}[R_j^{m^\prime } ({\mathcal {T}}^\prime )] = {\mathcal {H}}[R_j^{m^\prime } ({\mathcal {T}})]$ for each $m^\prime < m-1$ and each $j \in \{1, 2 \}$ and there is some history $h^*$ such that $h^*$ is inconsistent with $R_{-i}^{m-1} ({\mathcal {T}})$ but consistent with $R_{-i}^{m-1} ({\mathcal {T}}^\prime )$. In addition, suppose the strategy-type pairs in $R_{-i}^{m-1} ({\mathcal {T}}^\prime ) \cap [S_{-i}(h^*) \times T^\prime _{-i}]$ are not present in the epistemic game $({\mathcal {P}}, {\mathcal {T}})$. Then a strategy-type pair $(s_i, t_i) \in R_i^{m} ({\mathcal {T}})$ must assign probability zero to $R_{-i}^{m-1} ({\mathcal {T}}^\prime )$ at $h^*$, which implies $(s_i, t_i) \not \in R_i^{m} ({\mathcal {T}}^\prime )$. Consequently, there is a behavior outcome that is consistent with $R_i^{m} ({\mathcal {T}})$ but inconsistent with $R_i^{m} ({\mathcal {T}}^\prime )$. For the battle of the sexes with an outside option, Battigalli and Siniscalchi (2002) present a type structure ${\mathcal {T}}$ that is not sufficiently rich and a type structure ${\mathcal {T}}^\prime $ that is an extension of ${\mathcal {T}}$. They show there is a behavior outcome that is consistent with $R^{\infty } ({\mathcal {T}})$ but inconsistent with $R^{\infty } ({\mathcal {T}}^\prime )$. For other examples, see Perea (2012).

The following theorem claims that for any sufficiently rich type structure ${\mathcal {T}}$ and any pair $(m_1, m_2) \in {\mathbb {N}}_+^2$, the set of outcomes consistent with $R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$ is the set of paths on which each player i defects in the last $m_i$ rounds.

Theorem 2

Fix a sufficiently rich type structure ${\mathcal {T}}$ and a pair $(m_1, m_2) \in {\mathbb {N}}_+^2$. For each $\varvec{a} \in A(m_1, m_2)$, there exists a state $(s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$ such that $\xi (s_1, s_2) = \varvec{a}$. Conversely, for each $(s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$, we have $\xi (s_1, s_2) \in A(m_1, m_2)$.

Proof

For each i and each $m_i \in {\mathbb {N}}_+$, define

$$\begin{aligned} {\mathbb {H}}(i, m_i)= & {} \{ (a^0, \ldots , a^n) \in H |\\&\quad \text {if}\ n \ge M - m_i + 1\ \text {then}\ a^{n^\prime }_i = D\ \text {for each} \ n^\prime \ge M - m_i + 1 \}. \end{aligned}$$

First, we show that ${\mathbb {H}}(i, m_i)$ is the set of histories consistent with $R_i^{m_i} ({\mathcal {T}}^*)$, where ${\mathcal {T}}^*$ is the type structure constructed in Sect. 5. By construction, for each $(s_i, t_i) \in R_i^{m_i} ({\mathcal {T}}^*)$, the strategy $s_i$ defects at every nth-round history with $n \ge M - m_i + 1$. Hence, if a history h is consistent with $R_i^{m_i} ({\mathcal {T}}^*)$, then $h \in {\mathbb {H}}(i, m_i)$. In the following, we show that any history $h \in {\mathbb {H}}(i, m_i)$ is consistent with $R_i^{m_i} ({\mathcal {T}}^*)$. Fix any $h \in {\mathbb {H}}(i, m_i)$. Note that there exists a path $\varvec{a} \in A (m_i, 1)$ such that h is an initial segment of $\varvec{a}$. By Theorem 1, there is a state $(s_i, t_i, s_{-i}, t_{-i})$ of $({\mathcal {P}}, {\mathcal {T}}^*)$ such that $(s_i, t_i, s_{-i}, t_{-i}) \in R_i^{m_i} ({\mathcal {T}}^*) \times R_{-i}^1 ({\mathcal {T}}^*)$ and the strategy profile $(s_i,s_{-i})$ induces $\varvec{a}$. This implies $(s_i, t_i) \in R_i^{m_i} ({\mathcal {T}}^*)$ and $s_i$ allows h. Hence, h is consistent with $R_i^{m_i} ({\mathcal {T}}^*)$.
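
The step above relies on the fact that every history in ${\mathbb {H}}(i, m_i)$ can be extended to a path in $A(m_i, 1)$. For small games this can be checked by enumeration. The Python sketch below is only an illustration under assumed conventions: action profiles are two-letter strings with player i's action first, a history with k entries records rounds 1 to k, and the function names are hypothetical.

```python
from itertools import product

ACTIONS = ["CC", "CD", "DC", "DD"]  # first letter: player i, second letter: player -i

def in_H(h, m_i, M):
    """h is in H(i, m_i): player i plays D in every recorded round n' >= M - m_i + 1."""
    return all(h[k][0] == "D" for k in range(max(M - m_i, 0), len(h)))

def in_A(a, m_i, m_j, M):
    """a is in A(m_i, m_j), with player i's action written first."""
    i_ok = all(a[k][0] == "D" for k in range(max(M - m_i, 0), M))
    j_ok = all(a[k][1] == "D" for k in range(max(M - m_j, 0), M))
    return i_ok and j_ok

def prefix_claim_holds(m_i, M):
    """True if a non-terminal history is in H(i, m_i) exactly when some path in A(m_i, 1) starts with it."""
    paths = [a for a in product(ACTIONS, repeat=M) if in_A(a, m_i, 1, M)]
    for n in range(M):                        # non-terminal histories record 0..M-1 rounds
        for h in product(ACTIONS, repeat=n):
            extends = any(a[:n] == h for a in paths)
            if in_H(h, m_i, M) != extends:
                return False
    return True
```

Calling `prefix_claim_holds(m_i, M)` for small values such as `M = 3` and `m_i` from 1 to 4 exercises both directions of the claim, including the case $m_i > M$.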

Next, we show that ${\mathbb {H}}(i, m_i)$ is the set of histories consistent with $R_i^{m_i} ({\mathcal {T}})$. The proof is by induction.

Step 1. For each player i and each strategy-type pair $(s_i, t_i) \in R_i^1({\mathcal {T}})$, it is obvious that $s_i$ defects at every last-round history. Hence, if a history h is consistent with $R_i^1({\mathcal {T}})$, then $h \in {\mathbb {H}}(i, 1)$. For each $j \in \{1, 2\}$, the set of histories consistent with $R^0_j ({\mathcal {T}})$ and the set of histories consistent with $R^0_j ({\mathcal {T}}^*)$ are identical (they both are the set of non-terminal histories). Since ${\mathcal {T}}$ is sufficiently rich, for each player i, any history consistent with $R^1_i ({\mathcal {T}}^*)$ is also consistent with $R^1_i ({\mathcal {T}})$, which implies any $h \in {\mathbb {H}}(i, 1)$ is consistent with $R^1_i ({\mathcal {T}})$. Consequently, ${\mathbb {H}}(i, 1)$ is the set of histories consistent with $R_i^1 ({\mathcal {T}})$.

Step 2. Fix a player i, a strategy-type pair $(s_i, t_i) \in R_i^2({\mathcal {T}})$, and a history $h \in A^{M-2}$. Since h is consistent with $R_{-i}^1({\mathcal {T}})$ and any $(s_{-i}, t_{-i}) \in R_{-i}^1({\mathcal {T}})$ has $s_{-i}$ defect at every last-round history, we have $s_i(h) = D$. Hence, if a history h is consistent with $R_i^2({\mathcal {T}})$, then $h \in {\mathbb {H}}(i, 2)$. It follows from Step 1 that for each $m^\prime < 2$ and each $j \in \{1, 2\}$, the set of histories consistent with $R^{m^\prime }_j ({\mathcal {T}})$ and the set of histories consistent with $R^{m^\prime }_j ({\mathcal {T}}^*)$ are identical. Since ${\mathcal {T}}$ is sufficiently rich, for each player i, any history consistent with $R^2_i ({\mathcal {T}}^*)$ is also consistent with $R^2_i ({\mathcal {T}})$, which implies any $h \in {\mathbb {H}}(i, 2)$ is consistent with $R^2_i ({\mathcal {T}})$. Consequently, ${\mathbb {H}}(i, 2)$ is the set of histories consistent with $R_i^2 ({\mathcal {T}})$.

The same argument applies at each subsequent step of the induction.

Fix a pair $(m_1, m_2) \in {\mathbb {N}}_+^2$. As shown above, for each player i, any strategy-type pair $(s_i, t_i) \in R_i^{m_i}({\mathcal {T}})$ has $s_i$ defect at every nth-round history with $n \ge M - m_i + 1$. Hence any outcome consistent with $R_1^{m_1}({\mathcal {T}}) \times R_2^{m_2}({\mathcal {T}})$ has each player i defect in the last $m_i$ rounds. Conversely, fix a path $\varvec{a} \in A(m_1, m_2)$ on which each player i defects in the last $m_i$ rounds. The history $(a^1, \ldots , a^{M-1})$ is in both ${\mathbb {H}}(1, m_1)$ and ${\mathbb {H}}(2, m_2)$; hence, it is consistent with both $R_1^{m_1}({\mathcal {T}})$ and $R_2^{m_2}({\mathcal {T}})$. This implies there is a state $(s_1, t_1, s_2, t_2) \in R_1^{m_1}({\mathcal {T}}) \times R_2^{m_2}({\mathcal {T}})$ such that both $s_1$ and $s_2$ allow $(a^1, \ldots , a^{M-1})$. We note that $a^M = DD$ and $s_i$ defects at every last-round history for each i. It follows that the strategy profile $(s_1, s_2)$ induces $\varvec{a}$. Hence, $\varvec{a}$ is consistent with $R_1^{m_1}({\mathcal {T}}) \times R_2^{m_2}({\mathcal {T}})$. $\square $
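As a small illustration of this characterization, the following Python sketch enumerates the paths on which each player i defects in the last $m_i$ rounds. The encoding of a round as a two-letter string (player 1's action first) and the function name `paths_with_terminal_defection` are illustrative assumptions, not notation from the paper.

```python
from itertools import product

def paths_with_terminal_defection(M, m1, m2):
    """Enumerate length-M paths on which player i defects in the last m_i rounds.

    Each round is a two-letter string: player 1's action, then player 2's.
    """
    def options(round_no, m):
        # A player is forced to defect in rounds M - m + 1, ..., M.
        return "D" if round_no >= M - m + 1 else "CD"

    per_round = [
        [x + y for x in options(n, m1) for y in options(n, m2)]
        for n in range(1, M + 1)
    ]
    return [tuple(path) for path in product(*per_round)]

paths = paths_with_terminal_defection(M=4, m1=3, m2=1)
print(len(paths))  # 2 ** ((4 - 3) + (4 - 1)) = 16
print(("DC", "CC", "DC", "DD") in paths)  # False: player 1 cooperates in round 2
```

Under this encoding, the number of such paths is $2^{(M - m_1) + (M - m_2)}$, since each player is unconstrained only in the rounds before her forced-defection phase.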

For Theorem 2, the assumption that the type structure is sufficiently rich is important. If ${\mathcal {T}}$ is not sufficiently rich, then there might be some history $h \in {\mathbb {H}}(i,m_i)$ that is inconsistent with $R_i^{m_i}({\mathcal {T}})$; equivalently, for player i, there is no belief that satisfies $(m_i - 1)$th order strong belief of rationality and rationalizes h. Consequently, any path that passes through h is impossible at $R_i^{m_i}({\mathcal {T}}) \times R_{-i}^{m_{-i}}({\mathcal {T}})$. This implies that some path on which each player $j \in \{i, -i\}$ defects in the last $m_j$ rounds is impossible at $R_i^{m_i}({\mathcal {T}}) \times R_{-i}^{m_{-i}}({\mathcal {T}})$. In addition, if ${\mathcal {T}}$ is not sufficiently rich, then some path on which some player i cooperates in some round $n \ge M - m_i + 1$ might be possible at $R_1^{m_1}({\mathcal {T}}) \times R_2^{m_2}({\mathcal {T}})$. For instance, in Example 2 (Appendix E), for a four-round Prisoner’s Dilemma, we construct an insufficiently rich type structure ${\mathcal {T}}$ such that: at some state $(s_1^3, t_1^3, s_2^1, t_2^1) \in R_1^3({\mathcal {T}}) \times R_2^1({\mathcal {T}})$, the path of play is $\xi (s_1^3, s_2^1) = (DC, CC, DC, DD)$, on which player 1 cooperates in round 2. In this epistemic game, for player 2, there is no epistemic type that satisfies first order strong belief of rationality and rationalizes history (DC) [equivalently, there is no strategy-type pair $(s_2, t_2) \in R_2^2({\mathcal {T}})$ such that $s_2 (h^1) = C$]. Ex ante, type $t_1^3$ believes that she is facing some strategy-type pair $(s^2_2, t^2_2) \in R_2^2({\mathcal {T}})$, where $s_2^2 (h^1) = D$. At history (DC) [that is inconsistent with $R_2^2({\mathcal {T}})$], type $t_1^3$ assigns probability one to $(s_2^1, t_2^1) \in R_2^1({\mathcal {T}})$, where $s_2^1$ cooperates at (DC), and cooperates in round 3 only if player 1 cooperates at (DC). 
With this belief, cooperating at (DC) is optimal for type $t_1^3$. By contrast, in an epistemic game with a sufficiently rich type structure ${\mathcal {T}}^\prime $, history (DC) is consistent with $R_2^2({\mathcal {T}}^\prime )$; thus, at (DC), a type $t_1^3$ that strongly believes $R_2^2({\mathcal {T}}^\prime )$ must assign probability one to $R_2^2({\mathcal {T}}^\prime )$; with this belief, defecting at (DC) is optimal for type $t_1^3$.

It is easy to show that the type structure ${\mathcal {T}}^*$ and its extensions are sufficiently rich (the proof is in Appendix D). We note that a complete type structure is an extension of ${\mathcal {T}}^*$. Theorem 2 implies that for all these type structures, the set of outcomes consistent with $R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$ is the set of paths on which each player i defects in the last $m_i$ rounds.

7 Discussion

7.1 Insufficiently rich type structures

In Example 2 (Appendix E), we assume the Finitely Repeated Prisoner’s Dilemma has 4 rounds, and present an insufficiently rich type structure ${\mathcal {T}}$ such that: at some state in $R_1^3({\mathcal {T}}) \times R_2^1({\mathcal {T}})$, player 1 cooperates in round 2 on path. We conjecture that: for each pair $(m_1, m_2) \in \{1, \ldots , M - 1\}^2$ with $m_1 > m_2$, and each path $\varvec{a}$ on which player 1 defects in the last $m_2 + 1$ rounds and player 2 defects in the last $m_2$ rounds, there exists an insufficiently rich type structure ${\mathcal {T}}$ and a state $(s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$ such that $\xi (s_1, s_2) = \varvec{a}$. If this conjecture is correct, then we can characterize the set of outcomes that can arise when each player i satisfies rationality and $(m_i - 1)$th order strong belief of rationality across all type structures [an outcome $\varvec{a}$ belongs to this set if and only if there exists a type structure ${\mathcal {T}}$ and a state $(s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$ such that $\xi (s_1, s_2) = \varvec{a}$]. We elaborate on this below.

For each pair $(m_1, m_2) \in \{1, \ldots , M - 1\}^2$ such that $m_1 = m_2$, it follows from Proposition 1 and Theorem 1 that the set of outcomes that can arise when each player i satisfies rationality and $(m_i - 1)$th order strong belief of rationality across all type structures is the set of paths on which each player i defects in the last $m_i$ rounds.

In the following, we fix some $(m_1, m_2) \in \{1, \ldots , M -1\}^2$ with $m_1 > m_2$. As stated in Remark 5 (Appendix A), for each type structure ${\mathcal {T}}$ and each state $(s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$, player 1 defects in the last $m_2 + 1$ rounds and player 2 defects in the last $m_2$ rounds on the path $\xi (s_1, s_2)$. If the aforementioned conjecture is correct, then the set of outcomes that can arise when each player i satisfies rationality and $(m_i - 1)$th order strong belief of rationality across all type structures is the set of paths on which player 1 defects in the last $m_2 + 1$ rounds and player 2 defects in the last $m_2$ rounds.
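The gap between the two candidate outcome sets can be made concrete with a short Python sketch. It tests membership of a path in the set from Theorem 2 (sufficiently rich type structures) and in the conjectured set above (all type structures, $m_1 > m_2$). The function names and the string encoding of rounds (player 1's action first) are assumptions made for illustration only.

```python
def defects_in_last(path, player, k):
    """True if `player` (0 or 1) plays D in each of the last k rounds of `path`."""
    return all(profile[player] == "D" for profile in path[len(path) - k:])

def in_rich_set(path, m1, m2):
    # Theorem 2: outcome set for sufficiently rich type structures.
    return defects_in_last(path, 0, m1) and defects_in_last(path, 1, m2)

def in_conjectured_set(path, m1, m2):
    # Conjectured outcome set across all type structures, stated for m1 > m2.
    assert m1 > m2
    return defects_in_last(path, 0, m2 + 1) and defects_in_last(path, 1, m2)

example_2_path = ("DC", "CC", "DC", "DD")  # M = 4, (m1, m2) = (3, 1)
print(in_rich_set(example_2_path, 3, 1))         # False
print(in_conjectured_set(example_2_path, 3, 1))  # True
```

The path from Example 2 lies in the conjectured set but not in the set from Theorem 2, which is exactly the wedge that an insufficiently rich type structure can open.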

7.2 Monotonicity

It is well-known that strong belief fails monotonicity; equivalently, $E \subseteq F$ does not imply $SB_i (E) \subseteq SB_i (F)$. In Example 3 (Appendix E), we show how strong belief fails monotonicity for the Finitely Repeated Prisoner’s Dilemma. In particular, we present two type structures ${\mathcal {T}}$ and ${\mathcal {T}}^\prime $ such that $R_{-i}^1 ({\mathcal {T}}) \subseteq R_{-i}^1 ({\mathcal {T}}^\prime )$ but $SB_i[R_{-i}^1 ({\mathcal {T}})] \not \subseteq SB_i [R_{-i}^1 ({\mathcal {T}}^\prime )]$. In epistemic game $({\mathcal {P}}, {\mathcal {T}})$, there is some $(s_i^2, t_i^2) \in SB_i[R_{-i}^1 ({\mathcal {T}})]$, where type $t_i^2$ assigns probability one to an irrational strategy-type pair at history (DC, CD) as this history is inconsistent with $R_{-i}^1 ({\mathcal {T}})$. In epistemic game $({\mathcal {P}}, {\mathcal {T}}^\prime )$, history (DC, CD) is consistent with $R_{-i}^1 ({\mathcal {T}}^\prime )$; as type $t_i^2$ fails to assign probability one to $R_{-i}^1 ({\mathcal {T}}^\prime )$ whenever possible, we have $(s_i^2, t_i^2) \not \in SB_i[R_{-i}^1 ({\mathcal {T}}^\prime )]$.
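The failure of monotonicity can be seen in an abstract toy case that is much smaller than Example 3. The Python sketch below is not the construction from Appendix E; the three states, the two conditioning events, and the function name `strongly_believes` are hypothetical and chosen only to show the mechanism. A belief strongly believes an event if it assigns probability one to that event at every conditioning event consistent with it.

```python
def strongly_believes(cps, event):
    """cps maps each conditioning event (a frozenset) to a dict state -> probability.

    Strong belief of `event`: probability one on `event` at every
    conditioning event that is consistent with it.
    """
    for cond, mu in cps.items():
        if cond & event:  # the conditioning event does not rule out `event`
            mass = sum(p for state, p in mu.items() if state in event)
            if abs(mass - 1.0) > 1e-12:
                return False
    return True

# Hypothetical toy: three states, two nested conditioning events.
cps = {
    frozenset("xyz"): {"x": 1.0},  # initial belief
    frozenset("yz"): {"z": 1.0},   # belief after x has been ruled out
}
E = frozenset("x")
F = frozenset("xy")  # E is a subset of F

print(strongly_believes(cps, E))  # True
print(strongly_believes(cps, F))  # False
```

Enlarging the event from E to F makes the second conditioning event consistent with it, so a belief that was unconstrained there under E now violates strong belief of F. This mirrors how history (DC, CD) becomes consistent with the larger event in ${\mathcal {T}}^\prime $.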

Notes

We say that players satisfy rationality and common SB if they are rational and satisfy mth order SB for each $m \in {\mathbb {N}}$.
Player i’s belief is a conditional probability system that, at each conditioning event, specifies a probability assessment over her opponent’s strategies and beliefs.
More precisely, Perea (2012) discusses the importance of using an epistemic model that contains sufficiently many types for defining some order of strong belief of rationality. The term ‘sufficiently rich type structure’ in our paper is equivalent to ‘epistemic model that contains sufficiently many types’ in Perea (2012).
More formally, for each $j \in \{1, 2\}$ and each $m^\prime < m_i$, the set of histories consistent with the hypothesis that player j is rational and satisfies $(m^\prime -1)$th order SB in the game with ${\mathcal {T}}^\prime $ is identical to that in the game with ${\mathcal {T}}$.
A complete type structure contains all possible beliefs. See Brandenburger (2003) for a definition.
For a definition of conditional probability system, see, for instance, Battigalli and Friedenberg (2012).
The opponent cooperates in the first round of this phase, then cooperates only if no one has defected since the beginning of this phase.
A type structure ${\mathcal {T}}$ is an extension of ${\mathcal {T}}^*$ if the epistemic type set for player i in ${\mathcal {T}}$ is a superset of the epistemic type set for player i in ${\mathcal {T}}^*$.

References

Andreoni J, Miller JH (1993) Rational cooperation in the finitely repeated prisoners dilemma: experimental evidence. Econ J 103(418):570–585
Battigalli P, Friedenberg A (2012) Forward induction reasoning revisited. Theor Econ 7(1):57–98
Battigalli P, Siniscalchi M (1999) Hierarchies of conditional beliefs and interactive epistemology in dynamic games. J Econ Theory 88(1):188–230
Battigalli P, Siniscalchi M (2002) Strong belief and forward induction reasoning. J Econ Theory 106(2):356–391
Brandenburger A (2003) Cognitive Processes and Economic Behavior, Routledge, chap On the Existence of a ’Complete’ Possibility Structure, pp. 30–34
Brandenburger A, Danieli A, Friedenberg A (2019) Finite-order epistemics, working paper
Cooper R, DeJong DV, Forsythe R, Ross TW (1996) Cooperation without reputation: experimental evidence from prisoners dilemma games. Games Econ Behav 12(2):187–218
Dal Bo P, Frechette GR (2011) The evolution of cooperation in infinitely repeated games: experimental evidence. Am Econ Rev 101(1):411–29. https://doi.org/10.1257/aer.101.1.411
Dijkstra J, van Assen MALM (2017) Explaining cooperation in the finitely repeated simultaneous and sequential prisoners dilemma game under incomplete and complete information. J Math Soc 41(1):1–25
Embrey M, Frechette GR, Yuksel S (2017) Cooperation in the finitely repeated Prisoners Dilemma. Q J Econ 133(1):509–551. https://doi.org/10.1093/qje/qjx033
Friedenberg A (2019) Bargaining under strategic uncertainty: the role of second-order optimism. Econometrica 87(6):1835–1865. https://doi.org/10.3982/ECTA14534
Fudenberg D, Maskin E (1986) The folk theorem in repeated games with discounting or with incomplete information. Econometrica 54(3):533–554
Hirshleifer D, Rasmusen E (1989) Cooperation in a repeated prisoners dilemma with ostracism. J Econ Behav Org 12(1):87–106
Kagel J, McGee P (2014) Personality and cooperation in finitely repeated prisoners dilemma games. Econ Lett 124(2):274–277. https://doi.org/10.1016/j.econlet.2014.05.034
Kreps DM, Milgrom P, Roberts J, Wilson R (1982) Rational cooperation in the finitely repeated prisoners dilemma. J Econ Theory 27(2):245–252
Morehous LG (1966) One-play, two-play, five-play, and ten-play runs of prisoners dilemma. J Conflict Resolut 10(3):354–362
Neyman A (1985) Bounded complexity justifies cooperation in the finitely repeated prisoners dilemma. Econ Lett 19(3):227–229. https://doi.org/10.1016/0165-1765(85)90026-6
Neyman A (1998) Finitely repeated games with finite automata. Math Oper Res 23(3):513–552
Neyman A (1999) Cooperation in repeated games when the number of stages is not commonly known. Econometrica 67(1):45–64. https://doi.org/10.1111/1468-0262.00003
Normann H, Wallace B (2012) The impact of the termination rule on cooperation in a prisoners dilemma experiment. Int J Game Theory 41(3):707–718
Oskamp S, Perlman D (1965) Factors affecting cooperation in a prisoners dilemma game. J Conflict Resolut 9(3):359–374
Perea A (2012) Epistemic game theory: reasoning and choice. Cambridge University Press, New York
Radner R (1980) Collusive behavior in noncooperative epsilon-equilibria of oligopolies with long but finite lives. J Econ Theory 22(2):136–154. https://doi.org/10.1016/0022-0531(80)90037-X
Radner R (1986) Essays in Honor of Gerard Debreu, Amsterdam: North-Holland, chap Can Bounded Rationality Resolve the Prisoners’ Dilemma, pp 387–399
Samuelson L (1987) A note on uncertainty and cooperation in a finitely repeated prisoners dilemma. Int J Game Theory 16(3):187–195
Selten R, Stoecker R (1986) End behavior in sequences of finite Prisoners Dilemma supergames: a learning theory approach. J Econ Behav Org 7(1):47–70


Acknowledgements

I thank my advisor, Paulo Barelli, for his guidance and encouragement. I also thank two anonymous referees, Marciano Siniscalchi, Edward Green, Srihari Govindan, Asen Kochov, Chih-Chun Yang, and Yu Awaya for helpful comments.

Author information

Authors and Affiliations

School of Economics, Sichuan University, Chengdu, Sichuan, China
Vi Cao

Authors

Vi Cao

Corresponding author

Correspondence to Vi Cao.

Ethics declarations

Conflict of interest

I have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Proof for Proposition 1

The proof is by induction. It is obvious that if $(s_1, t_1, s_2, t_2) \in R$, then $\xi ^M(s_1, s_2) = DD$. Fix $m \in \{ 1, \ldots , M - 1 \}$. Suppose $(s_1, t_1, s_2, t_2) \in R^m$ implies $\xi ^n(s_1, s_2) = DD$ for each $n \ge M - m + 1$. Since $R^{m+1} \subseteq R^m$, it suffices to show $(s_1, t_1, s_2, t_2) \in R^{m+1}$ implies $\xi ^{M - m}(s_1, s_2) = DD$. Let $\bar{n} \equiv M - m$ and $h^{\bar{n}}$ be the $\bar{n}$th-round history on the path induced by $(s_1, s_2)$. Since type $t_i$ strongly believes $R_{-i}^m$, at history $h^{\bar{n}}$, she assigns probability one to the event that the other player satisfies $R_{-i}^m$, and playing $s_i$ will lead to DD in each round $n \ge M - m + 1$. If $s_i(h^{\bar{n}}) = C$, then $s_i$ is strictly worse than some strategy $r_i \in S_i(h^{\bar{n}})$ that plays D at $h^{\bar{n}}$ and every nth-round history with $n > \bar{n}$. Hence, $(s_i, t_i) \in R_i^{m+1}$ implies $s_i(h^{\bar{n}}) = D$. It follows that $\xi ^{\bar{n}}(s_1, s_2) = DD$. By induction, for each $m \in \{ 1, \ldots , M \}$, if $(s_1, t_1, s_2, t_2) \in R^m$, then $\xi ^n(s_1, s_2) = DD$ for each $n \ge M - m + 1$. $\square $
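The payoff comparison behind the inductive step can be checked numerically. The Python sketch below is one way to read that step: it uses hypothetical stage-game payoffs satisfying $u_i(DC)> u_i(CC)> u_i(DD) > u_i(CD)$ (the specific numbers are an assumption) and compares cooperating now followed by DD in every remaining round against defecting now and in every later round, for every possible sequence of opponent actions.

```python
from itertools import product

# Hypothetical stage-game payoffs for player i, keyed by (own action, opponent's action).
U = {("D", "C"): 5, ("C", "C"): 3, ("D", "D"): 1, ("C", "D"): 0}

def deviation_is_strictly_better(m):
    """Check the inductive step when m rounds remain after the current one."""
    for x in "CD":  # opponent's action in the current round
        # Following s_i: cooperate now, then DD in each of the m remaining rounds.
        follow = U[("C", x)] + m * U[("D", "D")]
        # Deviating to r_i: defect now and in every later round, whatever the opponent does.
        for continuation in product("CD", repeat=m):
            deviate = U[("D", x)] + sum(U[("D", y)] for y in continuation)
            if deviate <= follow:
                return False
    return True

print(all(deviation_is_strictly_better(m) for m in range(1, 6)))  # True
```

The check succeeds because defection is strictly dominant in the stage game and $u_i(DC) > u_i(DD)$, so the deviation gains strictly in the current round and loses nothing afterwards.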

Remark 5

We can use arguments similar to those above to obtain the following result: for each type structure ${\mathcal {T}}$, if $(s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})$ for some $(m_1, m_2) \in \{ 1, \ldots , M - 1 \}^2$ with $m_1 > m_2$, then (a) $\xi ^n(s_1, s_2) = DD$ for each $n \ge M - m_2 + 1$, and (b) $\xi ^{M - m_2}(s_1, s_2) \in \{ DD, DC \}$. Part (a) follows from Proposition 1. To show (b), let $\bar{n} \equiv M - m_2$ and $h^{\bar{n}}$ be the $\bar{n}$th-round history on the path induced by $(s_1, s_2)$. Since type $t_1$ strongly believes $R_2^{m_2}$, at history $h^{\bar{n}}$, she assigns probability one to the event that player 2 satisfies $R_2^{m_2}$, and playing $s_1$ will lead to DD in each round $n \ge M - m_2 + 1$. If $s_1(h^{\bar{n}}) = C$, then $s_1$ is strictly worse than some strategy $r_1 \in S_1(h^{\bar{n}})$ that plays D at $h^{\bar{n}}$ and every nth-round history with $n > \bar{n}$. Hence, $(s_1, t_1) \in R_1^{m_1}$ implies $s_1(h^{\bar{n}}) = D$. It follows that $\xi ^{M - m_2}(s_1, s_2) \in \{ DD, DC \}$.

B Proof for Sect. 4

In this appendix, we show that $\beta _i(t_i^k)$ is a CPS and $(s_i^k, t_i^k) \in R_i$ for each $k \in \{1, \ldots , 5\}$.

Lemma 1

$\beta _i(t_i^k)$ is a CPS.

Proof

It is clear that $\beta _{i,h}(t_i^k)[S_{-i}(h) \times T_{-i}] = 1$ for each $h \in H$. In the following, we show that for each pair $\{ h, h^\prime \} \subset H$ such that $S_{-i}(h^\prime ) \times T_{-i} \subseteq S_{-i}(h) \times T_{-i}$ and each $E \subseteq S_{-i}(h^\prime ) \times T_{-i}$,

$$\begin{aligned} \beta _{i,h}(t_i^k)(E) = \beta _{i,h^\prime }(t_i^k)(E) \times \beta _{i,h}(t_i^k)[S_{-i}(h^\prime ) \times T_{-i}]. \end{aligned}$$

(1)

We show $\beta _i(t_i^1)$ satisfies Condition (1). If $\{h, h^\prime \} \subset H(s^G_{-i})$, then $\beta _{i,h}(t_i^1)(s^G_{-i}, t^1_{-i}) = \beta _{i,h^\prime }(t_i^1)(s^G_{-i}, t^1_{-i}) =\beta _{i,h}(t_i^1)[S_{-i}(h^\prime ) \times T_{-i}] = 1$, which implies (1). If $h \in H(s^G_{-i})$ and $h^\prime \not \in H(s^G_{-i})$, then $\beta _{i,h}(t_i^1)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (1). Suppose $\{ h, h^\prime \} \subset H {\setminus } H(s^G_{-i})$. Then $\beta _{i,h}(t_i^1)(s^h_{-i}, t^1_{-i}) = 1$ (where $s^h_{-i}$ allows h and defects at every history that does not precede h) and $\beta _{i,h^\prime }(t_i^1)(s^{h^\prime }_{-i}, t^1_{-i}) = 1$ (where $s^{h^\prime }_{-i}$ allows $h^\prime $ and defects at every history that does not precede $h^\prime $). If $h^\prime \in H(s^h_{-i})$, then $s^h_{-i}$ and $s^{h^\prime }_{-i}$ are identical, which implies (1). If $h^\prime \not \in H(s^h_{-i})$, then $\beta _{i,h}(t_i^1)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (1).

We show $\beta _i(t_i^2)$ satisfies Condition (1). By construction, $\beta _{i,h}(t_i^2)(\tilde{s}^h_{-i}, t^1_{-i}) = 1$ (where $\tilde{s}^h_{-i}$ allows h and has $\tilde{s}^h_{-i}(h^{\prime \prime }) = s_{-i}^F (h^{\prime \prime })$ for each history $h^{\prime \prime }$ that does not precede h) and $\beta _{i,h^\prime }(t_i^2)(\tilde{s}^{h^\prime }_{-i}, t^1_{-i}) = 1$ (where $\tilde{s}^{h^\prime }_{-i}$ allows $h^\prime $ and has $\tilde{s}^{h^\prime }_{-i}(h^{\prime \prime }) = s_{-i}^F (h^{\prime \prime })$ for each history $h^{\prime \prime }$ that does not precede $h^\prime $). If $h^\prime \in H(\tilde{s}^h_{-i})$, then $\tilde{s}^h_{-i}$ and $\tilde{s}^{h^\prime }_{-i}$ are identical, which implies (1). If $h^\prime \not \in H(\tilde{s}^h_{-i})$, then $\beta _{i,h}(t_i^2)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (1).

We show $\beta _i(t_i^3)$ satisfies Condition (1). If $\{ h, h^\prime \} \subset H(s^G_{-i})$, then $\beta _{i,h}(t_i^3)(s^G_{-i}, t^1_{-i}) =\beta _{i,h^\prime }(t_i^3)(s^G_{-i}, t^1_{-i}) =\beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 1$, which implies (1). If $h \in H(s^G_{-i})$ and $h^\prime \not \in H(s^G_{-i})$, then $\beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (1). Suppose $\{ h, h^\prime \} \subset H {\setminus } H(s^G_{-i})$. Then $\beta _{i,h}(t_i^3)(\tilde{s}^h_{-i}, t^1_{-i}) = 1$ and $\beta _{i,h^\prime }(t_i^3)(\tilde{s}^{h^\prime }_{-i}, t^1_{-i}) = 1$. If $h^\prime \in H(\tilde{s}^h_{-i})$, then $\tilde{s}^h_{-i}$ and $\tilde{s}^{h^\prime }_{-i}$ are identical, which implies (1); otherwise, $\beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (1).

We show $\beta _i(t_i^4)$ satisfies (1). Let $h \in H(s^3_{-i})$. If $h^\prime \in H(s^3_{-i})$, then $\beta _{i,h}(t_i^4)(s^3_{-i}, t^3_{-i}) = \beta _{i,h^\prime }(t_i^4)(s^3_{-i}, t^3_{-i}) = \beta _{i,h}(t_i^4)[S_{-i}(h^\prime ) \times T_{-i}] = 1$, which implies (1). If $h^\prime \not \in H(s^3_{-i})$, then $\beta _{i,h}(t_i^4)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (1). Let $h = (a_iD)$. If $h^\prime = (a_i^\prime D, a_i^{\prime \prime } D)$, then $\beta _{i,h}(t_i^4)(s^4_{-i}, t^4_{-i}) = \beta _{i,h^\prime }(t_i^4)(s^4_{-i}, t^4_{-i}) = \beta _{i,h}(t_i^4)[S_{-i}(h^\prime ) \times T_{-i}] = 1$, which implies (1). If $h^\prime = (a_i^\prime D, a_i^{\prime \prime } C)$, then $\beta _{i,h}(t_i^4)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (1).

We show $\beta _i(t_i^5)$ satisfies (1). Let $h \in H(s^1_{-i})$. If $h^\prime \in H(s^1_{-i})$, then $\beta _{i,h}(t_i^5)(s^1_{-i}, t^1_{-i}) = \beta _{i,h^\prime }(t_i^5)(s^1_{-i}, t^1_{-i}) = \beta _{i,h}(t_i^5)[S_{-i}(h^\prime ) \times T_{-i}] = 1$, which implies (1). If $h^\prime \not \in H(s^1_{-i})$, then $\beta _{i,h}(t_i^5)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (1). Let $h = (a_iD)$. If $h^\prime = (a_i^\prime D, a_i^{\prime \prime } D)$, then $\beta _{i,h}(t_i^5)(s^4_{-i}, t^4_{-i}) = \beta _{i,h^\prime }(t_i^5)(s^4_{-i}, t^4_{-i}) = \beta _{i,h}(t_i^5)[S_{-i}(h^\prime ) \times T_{-i}] = 1$, which implies (1). If $h^\prime = (a_i^\prime D, a_i^{\prime \prime } C)$, then $\beta _{i,h}(t_i^5)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (1). $\square $
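For finite belief systems, Condition (1) can also be verified mechanically. The following Python sketch is a generic checker, not the construction from Sect. 4: the representation of a belief system as a dictionary from conditioning events to probability dictionaries and the name `is_cps` are illustrative assumptions. It checks that each belief is a probability measure on its conditioning event and that the chain rule holds for every nested pair of conditioning events.

```python
def is_cps(cps, tol=1e-12):
    """cps maps each conditioning event (a frozenset of states) to a dict state -> probability."""
    for cond, mu in cps.items():
        # Each belief must be a probability measure concentrated on its conditioning event.
        if any(p < 0 or (p > 0 and state not in cond) for state, p in mu.items()):
            return False
        if abs(sum(mu.values()) - 1.0) > tol:
            return False
    for big, mu_big in cps.items():
        for small, mu_small in cps.items():
            if not small <= big:
                continue
            mass_small = sum(mu_big.get(s, 0.0) for s in small)
            # With finitely many states, checking singletons suffices by additivity.
            for s in small:
                if abs(mu_big.get(s, 0.0) - mu_small.get(s, 0.0) * mass_small) > tol:
                    return False
    return True

good = {frozenset("xyz"): {"x": 1.0}, frozenset("yz"): {"z": 1.0}}
bad = {frozenset("xyz"): {"x": 0.5, "y": 0.5}, frozenset("yz"): {"z": 1.0}}
print(is_cps(good))  # True
print(is_cps(bad))   # False
```

In `good`, the smaller conditioning event has prior probability zero, so the chain rule places no restriction on the belief held there; this is the same "probability zero implies (1)" pattern used repeatedly in the proof above. In `bad`, the smaller event has prior probability one half, so the conditional belief would have to be the Bayesian update of the prior.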

Lemma 2

$(s_i^k, t_i^k) \in R_i$.

Proof

It suffices to show that for each $h \in H(s_i^k) {\setminus } A^2$, type $t_i^k$’s expected payoff from playing $s_i^k$ is weakly higher than her expected payoff from playing some $s_i \in S_i(h)$ that defects at every third-round history. Fix a row in some Table k below. Conditional on the history in column 1, type $t_i^k$’s expected outcome and payoff from playing $s_i^k$ are given in column 2, type $t_i^k$’s expected outcomes and payoffs from playing strategies in $\{s_i \in S_i(h) ~|~ s_i(h^\prime ) = D$ if $h^\prime \in A^2 \} {\setminus } \{s_i^k\}$ are given in column 3. It is clear that the expected payoff in column 2 is weakly higher than the expected payoffs in column 3. $\square $

Table 1 Expected payoffs for type $t_i^1$


Table 2 Expected payoffs for type $t_i^2$


Table 3 Expected payoffs for type $t_i^3$


Table 4 Expected payoffs for type $t_i^4$


Table 5 Expected payoffs for type $t_i^5$


C Proof for Theorem 1

Part (i) of Theorem 1 follows from Lemmas 3, 4, and 5. Part (ii) of Theorem 1 follows from part (b) in the proof for Lemma 5.

Lemma 3

$\beta _i[g_i(m_i, \varvec{a})]$ is a CPS.

Proof

To show that $\beta _{i,h}[g_i(m_i, \varvec{a})] [S_{-i}(h) \times T_{-i}] = 1$, it suffices to show $f_{-i} [\eta _{-i}(m_i,\varvec{a}, h)] \in S_{-i}(h)$. If $h = (a^0, \ldots , a^n)$ for some $n \le M - m_i$, then h is the initial subsequence of $\eta ^2_{-i}(m_i, \varvec{a}, h)$, which implies $f_{-i} [\eta _{-i}(m_i,\varvec{a}, h)] \in S_{-i}(h)$. If h is inconsistent with $\varvec{a}$ and is allowed by $f_{-i} [\eta _{-i}(m_i, \varvec{a}, h^*)]$ (where $h^*$ is the longest common predecessor of h and $\varvec{a}$), then $f_{-i} [\eta _{-i}(m_i,\varvec{a}, h)] = f_{-i} [\eta _{-i}(m_i,\varvec{a}, h^*)] \in S_{-i}(h)$. Suppose either (a) $h = (a^0, \ldots , a^n)$ for some $n > M - m_i$ or (b) h is inconsistent with $\varvec{a}$ and is not allowed by $f_{-i} [\eta _{-i}(m_i, \varvec{a}, h^*)]$. If $h = (a^0)$, then it is obvious that $f_{-i} [\eta _{-i}(m_i,\varvec{a}, h)] \in S_{-i}(h)$. If $a_{-i}^n = C$, then h is the initial subsequence of $\eta ^2_{-i}(m_i, \varvec{a}, h)$, which implies $f_{-i} [\eta _{-i}(m_i,\varvec{a}, h)] \in S_{-i}(h)$. If $a_{-i}^n = D$, then $(a^0, \ldots , a^{\tilde{n}-1})$ is the initial subsequence of $\eta ^2_{-i}(m_i, \varvec{a}, h)$ (where $\tilde{n} \le n$ is the round in which player $-i$ starts defecting continually) and $f_{-i} [\eta _{-i}(m_i,\varvec{a}, h)](a^0, \ldots , a^{l-1}) = D = a_{-i}^l$ for each $l \in \{ \tilde{n}, \ldots , n \}$, which implies $f_{-i} [\eta _{-i}(m_i,\varvec{a}, h)] \in S_{-i}(h)$.

It remains to show that for each pair $\{\bar{h}, \hat{h}\} \subset H$ such that $S_{-i}(\hat{h}) \times T_{-i} \subseteq S_{-i}(\bar{h}) \times T_{-i}$ and each $E \subseteq S_{-i}(\hat{h}) \times T_{-i}$,

$$\begin{aligned} \beta _{i, \bar{h}}[g_i(m_i, \varvec{a})](E) = \beta _{i, \hat{h}}[g_i(m_i, \varvec{a})](E) \times \beta _{i, \bar{h}}[g_i(m_i, \varvec{a})][S_{-i}(\hat{h}) \times T_{-i}]. \end{aligned}$$

(2)

Denote $\bar{h} \equiv (\bar{a}^0, \bar{a}^1, \ldots , \bar{a}^{\bar{n}})$ and $\hat{h} \equiv (\hat{a}^0, \hat{a}^1, \ldots , \hat{a}^{\hat{n}})$. If $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h})] \not \in S_{-i}(\hat{h})$, then $\beta _{i, \bar{h}}[g_i(m_i, \varvec{a})][S_{-i}(\hat{h}) \times T_{-i}] =0$, which implies (2). Suppose $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h})] \in S_{-i}(\hat{h})$. Then $\beta _{i, \bar{h}}[g_i(m_i, \varvec{a})][S_{-i}(\hat{h}) \times T_{-i}] =1$. In the following, we will show $\eta _{-i}(m_i, \varvec{a}, \bar{h}) = \eta _{-i}(m_i, \varvec{a}, \hat{h})$, which implies (2).

Case 1. Suppose $\bar{h} = (a^0, \ldots , a^{\bar{n}})$ for some $\bar{n} \le M - m_i$. Let $\eta _{-i}^2(m_i, \varvec{a}, \bar{h}) = (\tilde{a}^1, ..., \tilde{a}^M)$. If $\tilde{a}^n \not = a^n$ for some $n \le M - m_i + 1$, let $l \equiv \mathrm{max} \{ n ~|~ \tilde{a}^{\tilde{n}} = a^{\tilde{n}}$ for each $\tilde{n} \le n \}$ be the round after which $\varvec{a}$ and $(\tilde{a}^1, ..., \tilde{a}^M)$ start to diverge; otherwise, let $l = M - m_i + 1$. Define $h^*$ as follows: if $\hat{h}$ is consistent with $\varvec{a}$, let $h^* = \hat{h}$; otherwise, let $h^*$ be the longest common predecessor of $\hat{h}$ and $\varvec{a}$. Denote $h^* \equiv (a^0, \ldots , a^{n^*})$.

Suppose $\bar{n} \le n^* \le l$. By construction, $\eta _{-i}(m_i, \varvec{a}, h^*) = \eta _{-i}(m_i, \varvec{a}, \bar{h})$. If $\hat{h}$ is consistent with $\varvec{a}$, then $\hat{h} = h^*$, which implies $\eta _{-i}(m_i, \varvec{a}, \hat{h}) = \eta _{-i}(m_i, \varvec{a}, \bar{h})$. If $\hat{h}$ is inconsistent with $\varvec{a}$, then $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h})] \in S_{-i}(\hat{h})$ implies $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h^*)] \in S_{-i}(\hat{h})$; by construction, $\eta _{-i}(m_i, \varvec{a}, \hat{h}) = \eta _{-i}(m_i, \varvec{a}, h^*) = \eta _{-i}(m_i, \varvec{a}, \bar{h})$.

Suppose $n^* > l$ (which implies $\hat{a}_{-i}^{l+1} = a_{-i}^{l+1}$). First, we show that $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h})] \in S_{-i}(\hat{h})$ implies $l = M - m_i + 1$. Suppose $l < M - m_i + 1$. By construction, $a_{-i}^{l+1} = D$, which implies $\hat{a}_{-i}^{l+1} = D$. Note that $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h})](a^0, \ldots , a^{l+1}) = C \not = \hat{a}^{l+1}_{-i}$, which contradicts $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h})] \in S_{-i}(\hat{h})$. Hence, $l = M - m_i + 1$. Note that $l +1 \le M$ implies $m_i \ge 2$. Hence, $\eta _{-i} (m_i, \varvec{a}, \bar{h}) = (m_i - 1, a^1, \ldots , a^l, DD, \ldots , DD)$. Since $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h})] \in S_{-i}(\hat{h})$, we have $\hat{a}_{-i}^n = D$ for each $n > l$, which implies $a^n_{-i} = D$ for each $n = l + 1, \ldots , n^*$. It follows that for $h^*$, player $-i$ starts defecting continually in round $l+1$; hence, by construction, $\eta _{-i}(m_i, \varvec{a}, h^*) = (M - (l+1) + 1, a^1, \ldots , a^l, DD, \ldots , DD) = \eta _{-i}(m_i, \varvec{a}, \bar{h})$. Applying the arguments in the preceding paragraph gives $\eta _{-i}(m_i, \varvec{a}, \hat{h}) = \eta _{-i}(m_i, \varvec{a}, \bar{h})$.

Case 2. Suppose $\bar{h} = (a^0, \ldots , a^{\bar{n}})$ for some $\bar{n} > M - m_i$. Define $\tilde{n}$ as follows: if $\bar{h} = (a^0)$ or $a^{\bar{n}}_{-i} = C$, let $\tilde{n} = \bar{n} + 1$; otherwise, let $\tilde{n}$ be the round in which player $-i$ starts defecting continually. Define $h^*$ as follows: if $\hat{h}$ is consistent with $\varvec{a}$, let $h^* = \hat{h}$; otherwise, let $h^*$ be the longest common predecessor of $\hat{h}$ and $\varvec{a}$. Denote $h^* \equiv (a^0, \ldots , a^{n^*})$. Without loss of generality, assume $n^* > \bar{n}$. Since $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h})] \in S_{-i}(\hat{h})$, we have $\hat{a}^n_{-i} = D$ for each $n > \bar{n}$, which implies $a_{-i}^n = D$ for each $n = \bar{n} + 1, \ldots , n^*$. It follows that for $h^*$, player $-i$ starts defecting continually in round $\tilde{n}$; hence, by construction, $\eta _{-i}(m_i, \varvec{a}, h^*) = \eta _{-i}(m_i, \varvec{a}, \bar{h})$. If $\hat{h}$ is consistent with $\varvec{a}$, then $\hat{h} = h^*$, which implies $\eta _{-i}(m_i, \varvec{a}, \hat{h}) = \eta _{-i}(m_i, \varvec{a}, \bar{h})$. If $\hat{h}$ is inconsistent with $\varvec{a}$, then $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h})] \in S_{-i}(\hat{h})$ implies $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h^*)] \in S_{-i}(\hat{h})$; by construction, $\eta _{-i}(m_i, \varvec{a}, \hat{h}) = \eta _{-i}(m_i, \varvec{a}, h^*) = \eta _{-i}(m_i, \varvec{a}, \bar{h})$.

Case 3. Suppose $\bar{h} = (\bar{a}^1, \ldots , \bar{a}^{\bar{n}})$ is inconsistent with $\varvec{a}$. Let $\bar{h}^* \equiv (a^0, \ldots , a^{\bar{n}^*})$ be the longest common predecessor of $\bar{h}$ and $\varvec{a}$. Suppose $\hat{h}$ is consistent with $\varvec{a}$. Then $S_{-i}(\hat{h}) \subseteq S_{-i}(\bar{h})$ implies $\bar{h} = (a^0, \ldots , a^{\bar{n} - 1}, \bar{a}^{\bar{n}})$ with $\bar{a}^{\bar{n}}_{-i} = a^{\bar{n}}_{-i}$ and $\hat{h} = (a^1, \ldots , a^{\hat{n}})$. Note that $\bar{h}^* = (a^0, \ldots , a^{\bar{n}})$ and $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h}^*)] \in S_{-i}(\bar{h})$; hence, by construction, $\eta _{-i}(m_i, \varvec{a}, \bar{h}) = \eta _{-i}(m_i, \varvec{a}, \bar{h}^*)$. If $\hat{h} = \bar{h}^*$, then $\eta _{-i}(m_i, \varvec{a}, \bar{h}) = \eta _{-i}(m_i, \varvec{a}, \hat{h})$ is immediate. If $\hat{h} \not = \bar{h}^*$, then $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h}^*)] \in S_{-i} (\hat{h})$ implies $\eta _{-i}(m_i, \varvec{a}, \hat{h}) = \eta _{-i}(m_i, \varvec{a}, \bar{h}^*)$ (by Case 1 and Case 2), which implies $\eta _{-i}(m_i, \varvec{a}, \bar{h}) = \eta _{-i}(m_i, \varvec{a}, \hat{h})$. In the following, we assume $\hat{h}$ is inconsistent with $\varvec{a}$. Let $\hat{h}^*$ be the longest common predecessor of $\hat{h}$ and $\varvec{a}$.

If $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h}^*)] \in S_{-i}(\bar{h})$, then $\eta _{-i}(m_i, \varvec{a}, \bar{h}) =\eta _{-i}(m_i, \varvec{a}, \bar{h}^*)$. It follows from $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h})] \in S_{-i}(\hat{h})$ that $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h}^*)] \in S_{-i}(\hat{h}^*)$. By Cases 1 and 2, we have $\eta _{-i}(m_i, \varvec{a}, \hat{h}^*) = \eta _{-i}(m_i, \varvec{a}, \bar{h}^*)$, which implies $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \hat{h}^*)] \in S_{-i}(\hat{h})$; hence, by construction, $\eta _{-i}(m_i, \varvec{a}, \hat{h}) = \eta _{-i}(m_i, \varvec{a}, \hat{h}^*) = \eta _{-i}(m_i, \varvec{a}, \bar{h}^*) = \eta _{-i}(m_i, \varvec{a}, \bar{h})$.

If $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h}^*)] \not \in S_{-i}(\bar{h})$, then $\bar{n}^* < \bar{n}$. Suppose $\bar{a}^{\bar{n}^*} = a^{\bar{n}^*}$. Since $\bar{h}^*$ is the longest common predecessor of $\bar{h}$ and $\varvec{a}$, we have $\bar{a}_{-i}^{\bar{n}^* + 1} \not = a_{-i}^{\bar{n}^* + 1}$. Then $S_{-i}(\hat{h}) \subseteq S_{-i}(\bar{h})$ implies $\hat{a}_{-i}^{\bar{n}^* + 1} = \bar{a}_{-i}^{\bar{n}^* + 1} \not = a_{-i}^{\bar{n}^* + 1}$ (note that $\bar{n}^* + 1 \le \bar{n}$). Since $\hat{a}_{-i}^{\bar{n}^* + 1} \not = a_{-i}^{\bar{n}^* + 1}$, the longest common predecessor of $\hat{h}$ and $\varvec{a}$ is $\hat{h}^* = (a^0, \ldots , a^{\bar{n}^*}) = \bar{h}^*$. Suppose $\bar{a}^{\bar{n}^*} \not = a^{\bar{n}^*}$. Since $\bar{h}^* = (a^0, \ldots , a^{\bar{n}^*})$ is the longest common predecessor of $\bar{h}$ and $\varvec{a}$, we have $\bar{a}_{-i}^{\bar{n}^*} =a_{-i}^{\bar{n}^*}$ and $\bar{a}_i^{\bar{n}^*} \not = a_i^{\bar{n}^*}$. Then $S_{-i}(\hat{h}) \subseteq S_{-i}(\bar{h})$ implies $\hat{a}_i^{\bar{n}^*} = \bar{a}_i^{\bar{n}^*} \not = a_i^{\bar{n}^*}$ (note that $\bar{n}^* < \bar{n}$). Since $\hat{a}_i^{\bar{n}^*} \not = a_i^{\bar{n}^*}$, the longest common predecessor of $\hat{h}$ and $\varvec{a}$ is $\hat{h}^* = (a^0, \ldots , a^{\bar{n}^*}) = \bar{h}^*$. Since $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h}^*)] \not \in S_{-i}(\bar{h})$ and $S_{-i}(\hat{h}) \subseteq S_{-i}(\bar{h})$ and $\hat{h}^* = \bar{h}^*$, we have $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \hat{h}^*)] \not \in S_{-i}(\hat{h})$. For $\bar{h}$, without loss of generality, assume $\bar{a}_{-i}^{\bar{n}} = D$ and let $\tilde{n}$ be the round in which player $-i$ starts defecting continually. Since $\bar{h}$ is inconsistent with $\varvec{a}$ and $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h}^*)] \not \in S_{-i}(\bar{h})$, we have $\eta _{-i}(m_i, \varvec{a}, \bar{h}) = (M - \tilde{n} + 1, a^0, \ldots , a^{\tilde{n} - 1}, DD, \ldots , DD)$. 
Since $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \bar{h})] \in S_{-i}(\hat{h})$, we have $\hat{a}_{-i}^n = D$ for each $n = \tilde{n}, \ldots , \hat{n}$. Since $\hat{h}$ is inconsistent with $\varvec{a}$ and $f_{-i}[\eta _{-i}(m_i, \varvec{a}, \hat{h}^*)] \not \in S_{-i}(\hat{h})$, we have $\eta _{-i}(m_i, \varvec{a}, \hat{h}) = (M - \tilde{n} + 1, a^0, \ldots , a^{\tilde{n} - 1}, DD, \ldots , DD) = \eta _{-i}(m_i, \varvec{a}, \bar{h})$. $\square $

Lemma 4

$[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] \in R_i$ for $m_i \ge 1$.

Proof

Fix a history h that is allowed by strategy $f_i(m_i, \varvec{a})$. By construction, at history h, type $g_i(m_i, \varvec{a})$ believes that player $-i$ is playing strategy $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)]$.

Case 1. Suppose $h = (a^0, \ldots , a^n)$ for some $n \le M - m_i$. By construction, $\eta ^2_{-i}(m_i, \varvec{a}, h) \equiv (\bar{a}^1, \ldots , \bar{a}^M)$ is the path of play induced by two strategies $f_i(m_i, \varvec{a})$ and $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)]$. Player i’s expected payoff from playing $f_i(m_i, \varvec{a})$ is

$$\begin{aligned} U_{i,h}[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] =\sum _{n^\prime = 0}^{M-m_i} u_i (\bar{a}^{n^\prime }) + u_i(DC) + (m_i - 1)u_i(DD). \end{aligned}$$

Let $r_i \in S_i(h)$ and let $(\bar{a}^0, \ldots , \bar{a}^n, \tilde{a}^{n+1}, \ldots , \tilde{a}^M)$ be the path of play induced by $r_i$ and $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)]$. Without loss, suppose $\tilde{a}^{n^\prime } \not = \bar{a}^{n^\prime }$ for some ${n^\prime } > n$. Let $\tilde{n} \equiv \mathrm{min} \{n^\prime \in \{n+1, \ldots , M\} ~|~ \tilde{a}^{n^\prime } \not = \bar{a}^{n^\prime } \}$ be the round at which the two paths $(\bar{a}^0, \ldots , \bar{a}^M)$ and $(\bar{a}^0, \ldots , \bar{a}^n, \tilde{a}^{n+1}, \ldots , \tilde{a}^M)$ start to diverge. Since $\bar{a}_{-i}^{n^\prime } = C$ for each $n^\prime = n+1, \ldots , M - m_i + 1$, if $\tilde{n} \le M - m_i$, then $\bar{a}^{\tilde{n}} \in \{ CC, DC \}$. Also note that $\tilde{a}_i^{\tilde{n}} \not = \bar{a}_i^{\tilde{n}}$ and $\tilde{a}_{-i}^{\tilde{n}} = \bar{a}_{-i}^{\tilde{n}}$.

Suppose $\tilde{n} \le M - m_i$ and $\bar{a}^{\tilde{n}} = CC$. Then $\tilde{a}^{\tilde{n}} = DC$. Consequently, by construction of strategy $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)]$, we have $\tilde{a}_{-i}^{n^\prime } = D$ for each $n^\prime > \tilde{n}$, which implies $u_i(\tilde{a}^{n^\prime }) \le u_i(DD)$ for each $n^\prime > \tilde{n}$. Hence, $U_{i,h}[r_i, g_i(m_i, \varvec{a})] \le \sum _{n^\prime = 0}^{\tilde{n}-1} u_i (\bar{a}^{n^\prime }) + u_i(DC) + (M - \tilde{n}) u_i(DD)$. Since $\bar{a}_{-i}^{n^\prime } = C$ for each $n^\prime = n + 1, \ldots , M-m_i+1$, we have $u_i(\bar{a}^{n^\prime }) > u_i(DD)$ for each $n^\prime = n + 1, \ldots , M-m_i+1$, which implies $U_{i,h}[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] - U_{i,h}[r_i, g_i(m_i, \varvec{a})] \ge \sum _{n^\prime = \tilde{n}}^{M - m_i} u_i(\bar{a}^{n^\prime }) - (M - m_i - \tilde{n}+1) u_i(DD) > 0$.

Suppose $\tilde{n} = M - m_i + 1$. Then $\bar{a}^{\tilde{n}} = DC$ implies $\tilde{a}^{\tilde{n}} = CC$. In addition, by construction of strategy $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)]$, we have $\tilde{a}_{-i}^{n^\prime } = D$ for each $n^\prime > M - m_i + 2$, which implies $u_i(\tilde{a}^{n^\prime }) \le u_i(DD)$ for each $n^\prime > M - m_i + 2$. It follows that $U_{i,h}[r_i, g_i(m_i, \varvec{a})] \le \sum _{n^\prime = 0}^{M - m_i} u_i (\bar{a}^{n^\prime }) + u_i(CC) + (m_i - 1) u_i(DD) < U_{i,h}[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})]$.

Suppose $\tilde{n} \ge M - m_i + 2$. Then $\bar{a}^{\tilde{n}} =DD$ implies $\tilde{a}^{\tilde{n}} = CD$. In addition, by construction of strategy $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)]$, we have $\tilde{a}_{-i}^{n^\prime } = D$ for each $n^\prime > \tilde{n} + 1$, which implies $u_i(\tilde{a}^{n^\prime }) \le u_i(DD)$ for each $n^\prime > \tilde{n} + 1$. Hence, $U_{i,h}[r_i, g_i(m_i, \varvec{a})] \le \sum _{n^\prime = 0}^{\tilde{n}-1} u_i (\bar{a}^{n^\prime }) + u_i(CD) + (M - \tilde{n}) u_i(DD) < U_{i,h}[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})]$.

Suppose $\tilde{n} \le M - m_i$ and $\bar{a}^{\tilde{n}} = DC$. Then $\tilde{a}^{\tilde{n}} = CC$, which implies strategy $r_i$ allows history $h^\prime \equiv (\bar{a}^0, \ldots , \bar{a}^{\tilde{n}-1}, CC)$. Let $\hat{r}_i \in S_i(h^\prime )$ be such that the path induced by $\hat{r}_i$ and $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)]$ is $(\bar{a}^0, \ldots , \bar{a}^{\tilde{n}-1}, \hat{a}^{\tilde{n}}, \ldots , \hat{a}^M)$, where $\hat{a}^{n^\prime } = CC$ for each $n^\prime = \tilde{n}, \ldots , M - m_i$, $\hat{a}^{M - m_i + 1} = DC$, and $\hat{a}^{n^\prime } = DD$ for each $n^\prime \ge M - m_i + 2$. Let $\hat{n} \equiv \mathrm{min} \{n^\prime \in \{\tilde{n} + 1, \ldots , M \} ~|~ \tilde{a}^{n^\prime } \not = \hat{a}^{n^\prime } \}$ be the round at which the two paths $(\bar{a}^0, \ldots , \bar{a}^{\tilde{n}-1}, \tilde{a}^{\tilde{n}}, \ldots , \tilde{a}^M)$ and $(\bar{a}^0, \ldots , \bar{a}^{\tilde{n}-1}, \hat{a}^{\tilde{n}}, \ldots , \hat{a}^M)$ start to diverge. We have either (a) $\hat{n} \le M - m_i$ and $\hat{a}^{\hat{n}} = CC$ or (b) $\hat{n} \ge M - m_i + 1$. Hence, our arguments above show that for each $r_i \in S_i(h^\prime )$, we have $U_{i,h}[r_i, g_i(m_i, \varvec{a})] \le \sum _{n^\prime =0}^{\tilde{n}-1} u_i (\bar{a}^{n^\prime }) + (M - m_i - \tilde{n} + 1) u_i(CC) + u_i(DC) + (m_i - 1) u_i(DD)$. We have $u_i(CC) < u_i(DC) = u_i(\bar{a}^{\tilde{n}})$. In addition, for each $n^\prime = \tilde{n}+1, \ldots , M - m_i$, we have $\bar{a}_{-i}^{n^\prime } = C$, which implies $u_i(CC) \le u_i(\bar{a}^{n^\prime })$. Hence, $U_{i,h}[r_i, g_i(m_i, \varvec{a})] < U_{i,h}[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})]$.

Case 2. Suppose $h = (a^0, \ldots , a^n)$ for some $n \ge M - m_i + 1$. By construction of strategies $f_i(m_i, \varvec{a})$ and $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)]$, player i’s expected payoff from playing $f_i(m_i, \varvec{a})$ is

$$\begin{aligned} U_{i,h}[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] =\sum _{n^\prime =1}^{n} u_i (a^{n^\prime }) + (M - n) u_i(DD). \end{aligned}$$

Let $r_i \in S_i(h)$ and let $(a^1, \ldots , a^n, \tilde{a}^{n+1}, \ldots , \tilde{a}^M)$ be the path of play induced by $r_i$ and $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)]$. For each $n^\prime =n+1, \ldots , M$, since $\tilde{a}_{-i}^{n^\prime } = D$, we have $u_i(\tilde{a}^{n^\prime }) \le u_i(DD)$. It follows that $U_{i,h}[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] \ge U_{i,h}[r_i, g_i(m_i, \varvec{a})]$.

Case 3. Suppose $h = (\hat{a}^1, \ldots , \hat{a}^n)$ is inconsistent with $\varvec{a}$. Let $h^*$ be the longest common predecessor of h and $\varvec{a}$. If h is allowed by $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h^*)]$, then $g_i(m_i, \varvec{a})$ believes that her opponent is $\eta _{-i}(m_i, \varvec{a}, h^*)$ conditional on h, which implies $U_{i,h}[s_i, g_i(m_i, \varvec{a})] = U_{i,h^*}[s_i, g_i(m_i, \varvec{a})]$ for each $s_i \in S_i$. Fix $r_i \in S_i(h)$ and note that $r_i \in S_i(h^*)$. It follows from Case 1 and Case 2 that $U_{i,h^*}[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] \ge U_{i,h^*}[r_i, g_i(m_i, \varvec{a})]$, which implies $U_{i,h}[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] \ge U_{i,h}[r_i, g_i(m_i, \varvec{a})]$. In the following, we assume h is not allowed by $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h^*)]$. Let $h^{n^\prime }$ be an $n^\prime $-round history such that $n^\prime \ge n + 1$ and h is the initial sub-sequence of $h^{n^\prime }$. It is easy to show that $f_i (m_i, \varvec{a})(h^{n^\prime }) = f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)](h^{n^\prime }) = D$ (for each history $h^\prime $ inconsistent with $\varvec{a}$, we have $f_i (m_i, \varvec{a})(h^\prime ) = C$ only if $h^\prime $ satisfies certain conditions as specified in Section 5.1, but such $h^\prime $ is allowed by $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h^*)]$). Hence,

$$\begin{aligned} U_{i,h}[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] =\sum _{n^\prime = 1}^{n} u_i (\hat{a}^{n^\prime }) + (M - n)u_i(DD). \end{aligned}$$

Let $r_i \in S_i(h)$ and let $(\hat{a}^1, \ldots , \hat{a}^n, \tilde{a}^{n+1}, \ldots , \tilde{a}^M)$ be the path of play induced by $r_i$ and $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)]$. For each $n^\prime = n+1, \ldots , M$, since $\tilde{a}_{-i}^{n^\prime } = D$, we have $u_i(\tilde{a}^{n^\prime }) \le u_i(DD)$. It follows that $U_{i,h}[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] \ge U_{i,h}[r_i, g_i(m_i, \varvec{a})]$. $\square $
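The payoff comparison in Case 1 can be illustrated numerically. The sketch below is not the construction of $f_{-i}[\eta _{-i}(m_i, \varvec{a}, h)]$ used in the proof. It assumes a simplified stand-in opponent who cooperates through round $M - m_i + 1$ as long as player i has not defected earlier and defects otherwise, an all-$CC$ path up to round $M - m_i$, and hypothetical values for $M$, $m_i$, and the stage payoffs (chosen to satisfy $u_i(DC)> u_i(CC)> u_i(DD) > u_i(CD)$). Under these assumptions, a brute-force search over player i's action plans recovers the plan "cooperate until round $M - m_i$, then defect" and the payoff $(M - m_i) u_i(CC) + u_i(DC) + (m_i - 1) u_i(DD)$.

```python
from itertools import product

M, m = 6, 2            # hypothetical horizon M and order m_i
K = M - m + 1          # round in which player i starts defecting
U = {"CC": 3, "CD": 0, "DC": 5, "DD": 1}  # u_i, own action first (illustrative values)

def opponent(past_own_actions):
    """Simplified stand-in for the believed opponent strategy (an assumption):
    cooperate through round K unless player i has already defected."""
    round_number = len(past_own_actions) + 1
    if round_number <= K and "D" not in past_own_actions:
        return "C"
    return "D"

def payoff(plan):
    """Total payoff of player i from an action plan against `opponent`."""
    total = 0
    for n, own_action in enumerate(plan):
        total += U[own_action + opponent(plan[:n])]
    return total

plans = list(product("CD", repeat=M))
best = max(plans, key=payoff)
closed_form = (M - m) * U["CC"] + U["DC"] + (m - 1) * U["DD"]

print(best, payoff(best), closed_form)
```

With these illustrative numbers, deviating earlier costs more in forgone cooperation than the one-shot gain $u_i(DC) - u_i(CC)$ is worth, which is the comparison behind the strict inequalities in Case 1.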

Lemma 5

$[f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] \in R_i^{m_i}$ for $m_i \ge 2$.

Proof

It suffices to prove the following: for each $m_i \in {\mathbb {N}}_+$,

(a)
$[f_i(m^\prime _i, \varvec{a}), g_i(m^\prime _i, \varvec{a})] \in R_i^{m_i}$ for each $m_i^\prime \ge m_i$ and each $(m^\prime _i, \varvec{a}) \in \tilde{A}_i$,
(b)
for each $s_i \in S_i$, if $s_i(h^n) = C$ for some n-round history $h^n \in H(s_i)$ with $n \ge M - m_i + 1$, then $[\{s_i\} \times T_i] \cap R_i^{m_i} = \emptyset $.

The proof is by induction. It is clear that (a) and (b) hold for $m_i = 1$ (see Lemma 4). Fix $m_i \ge 1$. Suppose (a) and (b) hold for each $\tilde{m}_i = 1, \ldots , m_i$. In the following, we show (a) and (b) hold for $m_i + 1$.

To show (a), fix some $m_i^\prime \ge m_i + 1$ and some $(m^\prime _i, \varvec{a}) \in \tilde{A}_i$. Since (a) holds for $m_i$, we have $[f_i(m^\prime _i, \varvec{a}), g_i(m^\prime _i, \varvec{a})] \in R_i^{m_i}$. It is left to show $[f_i(m^\prime _i, \varvec{a}), g_i(m^\prime _i, \varvec{a})] \in SB_i(R_{-i}^{m_i})$: for each $h \in H$, if $[S_{-i}(h) \times T_{-i}] \cap R_{-i}^{m_i} \not = \emptyset $, then $\beta _{i,h}[g_i(m^\prime _i, \varvec{a})](R_{-i}^{m_i}) = 1$. Let $h \equiv (\tilde{a}^0, \ldots , \tilde{a}^n)$ be such that either (i) $n \le M - m_i$ or (ii) $n \ge M - m_i + 1$ and $h \equiv (\tilde{a}^0)$ or (iii) $n \ge M - m_i + 1$ and $\tilde{a}^{n^\prime }_{-i} = D$ for each $n^\prime \ge M - m_i + 1$. It is easy to verify that $\eta _{-i}(m^\prime _i, \varvec{a}, h) \in \tilde{A}_{-i}$ and can be written as $(m^\prime _{-i}, \varvec{a^\prime })$ for some $m^\prime _{-i} \ge m_i$, which implies $\big ( f_{-i}[\eta _{-i}(m^\prime _i, \varvec{a}, h)], g_{-i}[\eta _{-i}(m^\prime _i, \varvec{a}, h)] \big ) \in R_{-i}^{m_i}$ (since (a) holds for player $-i$ and $m_i$). By construction, $\beta _{i,h}[g_i(m^\prime _i, \varvec{a})](R_{-i}^{m_i}) = 1$. Now let $h \equiv (\tilde{a}^0, \ldots , \tilde{a}^n)$ be such that $n \ge M - m_i + 1$ and $\tilde{a}^{n^\prime }_{-i} = C$ for some $n^\prime \ge M - m_i + 1$. For each $s_{-i} \in S_{-i}(h)$, we have $s_{-i}(\tilde{a}^1, \ldots , \tilde{a}^{n^\prime - 1}) = C$; hence, $[S_{-i}(h) \times T_{-i}] \cap R^{m_i}_{-i} = \emptyset $ (since (b) holds for $m_i$).

To show (b), let $(s_i, t_i) \in R_i$ be such that $s_i(h^n) = C$ for some $h^n = (\tilde{a}^0, \ldots , \tilde{a}^{n-1}) \in H(s_i)$ with $n \ge M - m_i$. By construction, $\beta _{i, h^n}(t_i)(s_{-i}, t_{-i}) = 1$, where $s_{-i}(h^{n+1}) = C$ for some $h^{n+1} \in H(s_{-i})$. Then applying (b) for $M - n \le m_i$ gives $(s_{-i}, t_{-i}) \not \in R_{-i}^{M-n}$. It follows that $\beta _{i, h^n}(t_i)(R_{-i}^{M-n}) = 0$ $(*)$. Let $\varvec{\tilde{a}} =(\tilde{a}^0, \ldots , \tilde{a}^{n-1}, \tilde{a}^n, \ldots , \tilde{a}^M)$, where $\tilde{a}^n = CC$ and $\tilde{a}^{n^\prime } =DD$ for each $n^\prime \ge n + 1$. Note that $(M - n, \varvec{\tilde{a}}) \in \tilde{A}_{-i}$ and $f_{-i}(M-n, \varvec{\tilde{a}}) \in S_{-i}(h^n)$. Applying (a) on $M-n \le m_i$ gives $[f_{-i}(M-n, \varvec{\tilde{a}}),g_{-i}(M-n, \varvec{\tilde{a}})] \in R_{-i}^{M-n}$. Hence, $[S_{-i}(h^n) \times T_{-i}] \cap R_{-i}^{M-n} \not = \emptyset $ $(**)$. It follows from $(*)$ and $(**)$ that $(s_i, t_i) \not \in SB_i(R_{-i}^{M-n})$, which implies $(s_i, t_i) \not \in R_i^{M-n+1}$. Since $M-n+1 \le m_i +1$, we have $(s_i, t_i) \not \in R_i^{m_i + 1}$. $\square $

D Proof for Sect. 6

Proof for Proposition 2. The proof is by induction.

Step 1. For each $j \in \{1, 2 \}$, the set of histories consistent with $R^0_j ({\mathcal {T}})$ and the set of histories consistent with $R^0_j ({\mathcal {T}}^\prime )$ are identical (they both are the set of non-terminal histories). Since ${\mathcal {T}}$ is sufficiently rich, any history consistent with $R^1_i ({\mathcal {T}}^\prime )$ is also consistent with $R^1_i ({\mathcal {T}})$. Since any epistemic type in ${\mathcal {T}}$ is contained in ${\mathcal {T}}^\prime $, we have $R^1_i ({\mathcal {T}}) \subseteq R^1_i ({\mathcal {T}}^\prime )$, which implies any history consistent with $R^1_i ({\mathcal {T}})$ is also consistent with $R^1_i ({\mathcal {T}}^\prime )$.

Step 2. It follows from Step 1 that, for each $m^\prime < 2$ and each $j \in \{1, 2 \}$, the set of histories consistent with $R^{m^\prime }_j ({\mathcal {T}})$ and the set of histories consistent with $R^{m^\prime }_j ({\mathcal {T}}^\prime )$ are identical. Since ${\mathcal {T}}$ is sufficiently rich, any history consistent with $R^2_i ({\mathcal {T}}^\prime )$ is also consistent with $R^2_i ({\mathcal {T}})$. Fix some $(s_i, t_i) \in R_i^2 ({\mathcal {T}})$. As required by first-order strong belief of rationality, type $t_i$ assigns probability one to $R^1_{-i} ({\mathcal {T}}) \subseteq R^1_{-i} ({\mathcal {T}}^\prime )$ at each history h that is consistent with $R^1_{-i} ({\mathcal {T}})$. Recall that the set of histories consistent with $R^1_{-i} ({\mathcal {T}})$ and the set of histories consistent with $R^1_{-i} ({\mathcal {T}}^\prime )$ are identical. It follows that type $t_i$ assigns probability one to $R^1_{-i} ({\mathcal {T}}^\prime )$ at each history h that is consistent with $R^1_{-i} ({\mathcal {T}}^\prime )$. Hence, $(s_i, t_i) \in R_i^2 ({\mathcal {T}}^\prime )$. Since $R_i^2 ({\mathcal {T}}) \subseteq R_i^2 ({\mathcal {T}}^\prime )$, any history consistent with $R_i^2 ({\mathcal {T}})$ is also consistent with $R_i^2 ({\mathcal {T}}^\prime )$.

Step 3. It follows from Step 2 that, for each $m^\prime < 3$ and each $j \in \{1, 2 \}$, the set of histories consistent with $R^{m^\prime }_j ({\mathcal {T}})$ and the set of histories consistent with $R^{m^\prime }_j ({\mathcal {T}}^\prime )$ are identical. Since ${\mathcal {T}}$ is sufficiently rich, any history consistent with $R^3_i ({\mathcal {T}}^\prime )$ is also consistent with $R^3_i ({\mathcal {T}})$. Fix some $(s_i, t_i) \in R_i^3 ({\mathcal {T}})$. As required by second-order strong belief of rationality, type $t_i$ assigns probability one to $R^2_{-i} ({\mathcal {T}}) \subseteq R^2_{-i} ({\mathcal {T}}^\prime )$ at each history h that is consistent with $R^2_{-i} ({\mathcal {T}})$. Recall that the set of histories consistent with $R^2_{-i} ({\mathcal {T}})$ and the set of histories consistent with $R^2_{-i} ({\mathcal {T}}^\prime )$ are identical. It follows that type $t_i$ assigns probability one to $R^2_{-i} ({\mathcal {T}}^\prime )$ at each history h that is consistent with $R^2_{-i} ({\mathcal {T}}^\prime )$. Hence, $(s_i, t_i) \in R_i^3 ({\mathcal {T}}^\prime )$. Since $R_i^3 ({\mathcal {T}}) \subseteq R_i^3 ({\mathcal {T}}^\prime )$, any history consistent with $R_i^3 ({\mathcal {T}})$ is also consistent with $R_i^3 ({\mathcal {T}}^\prime )$.

Repeating the same argument for each higher order completes the induction. $\square $

Lemma 6

The type structure ${\mathcal {T}}^*$ and its extensions are sufficiently rich.

Proof

Fix a type structure ${\mathcal {T}}^\prime $ that is an extension of ${\mathcal {T}}^*$. It follows from the proof for Proposition 2 and the proof for Theorem 2 that for each player i and each $m_i \in {\mathbb {N}}_+$, we have ${\mathcal {H}} [R_i^{m_i} ({\mathcal {T}}^*)] = {\mathcal {H}} [R_i^{m_i} ({\mathcal {T}}^\prime )] = {\mathbb {H}}(i, m_i)$. Hence, to show ${\mathcal {T}}^*$ and ${\mathcal {T}}^\prime $ are sufficiently rich, it suffices to prove the following: $(*)$ For each player i, each order $m_i \in {\mathbb {N}}_+$, and each type structure ${\mathcal {T}}$ such that ${\mathcal {H}} [R_j^{m^\prime } ({\mathcal {T}})] = {\mathbb {H}}(j, m^\prime )$ for each $m^\prime < m_i$ and each $j \in \{1, 2\}$, we have ${\mathcal {H}} [R_i^{m_i} ({\mathcal {T}})] \subseteq {\mathbb {H}}(i, m_i)$.

First, we prove $(*)$ for $m_i \in \{1, \ldots , M\}$. Fix some $(s_i, t_i) \in R_i^{m_i} ({\mathcal {T}})$ and a history $h \in A^{M - m_i}$. It follows from ${\mathcal {H}} [R_{-i}^{m_i - 1} ({\mathcal {T}})] = {\mathbb {H}}(-i, m_i - 1)$ that h is consistent with $R_{-i}^{m_i - 1} ({\mathcal {T}})$ and any $(s_{-i}, t_{-i}) \in R_{-i}^{m_i - 1} ({\mathcal {T}})$ has $s_{-i}$ defect at every nth round history with $n \ge M - m_i + 2$. Hence, $s_i(h) = D$. This implies ${\mathcal {H}} [R_i^{m_i} ({\mathcal {T}})] \subseteq {\mathbb {H}}(i, m_i)$. To prove $(*)$ for $m_i > M$, we note that ${\mathcal {H}} [R_i^M ({\mathcal {T}})] = {\mathbb {H}}(i, M)$ implies any $(s_i, t_i) \in R_i^{m_i} ({\mathcal {T}}) \subseteq R_i^M ({\mathcal {T}})$ has $s_i$ defect at every history. Hence, ${\mathcal {H}} [R_i^{m_i} ({\mathcal {T}})] \subseteq {\mathbb {H}}(i, m_i)$. $\square $

E Insufficiently rich type structures

Example 2

Suppose the finitely repeated Prisoner’s Dilemma has 4 rounds. In this example, we construct a type structure ${\mathcal {T}}$ such that, at some state $(s_1^3, t_1^3, s_2^1, t_2^1) \in R_1^3({\mathcal {T}}) \times R_2^1({\mathcal {T}})$, player 1 cooperates in round 2 on path. The epistemic type set for each player i is

$$\begin{aligned} T_i = \{ t_i^0, t_i^1, t_i^2, t_i^3 \}. \end{aligned}$$

Ex ante, type $t_i^0$ believes that she is facing $(\tilde{s}^G_{-i}, t_{-i}^0)$, where $\tilde{s}^G_{-i}$ defects in round 1, but mimics the grim trigger strategy in the remaining rounds (formally, $\tilde{s}^G_{-i} (a^1, \ldots , a^n) = D$ if and only if $a_j^m = D$ for some $(j,m) \in \{1, 2\} \times \{ 2, \ldots , n \}$). At any $h \in H(\tilde{s}^G_{-i})$, type $t_i^0$ believes she is facing $(\tilde{s}^G_{-i}, t_{-i}^0)$; at any $h \not \in H(\tilde{s}^G_{-i})$, type $t_i^0$ believes she is facing $(s_{-i}^h, t_{-i}^0)$, where $s_{-i}^h$ allows h and defects at every history that does not precede h. We can show that a sequential best response for $t_i^0$ is a strategy $s_i^0$ that cooperates at h if and only if $h \in \{ (DD), (DD, CC) \}$.

Ex ante, type $t_i^1$ believes that she is facing $(s^G_{-i}, t_{-i}^1)$, where $s^G_{-i}$ is the grim trigger strategy (which cooperates if and only if no one has defected). At any $h \in H(s^G_{-i})$, type $t_i^1$ believes she is facing $(s^G_{-i}, t_{-i}^1)$; at any $h \in H(\tilde{s}^G_{-i}) {\setminus } H(s^G_{-i})$, type $t_i^1$ believes she is facing $(\tilde{s}^G_{-i}, t_{-i}^1)$; at any other history h, type $t_i^1$ believes she is facing $(s_{-i}^h, t_{-i}^1)$, where $s_{-i}^h$ allows h and defects at every history that does not precede h. We can show that a sequential best response for $t_i^1$ is a strategy $s_i^1$ that cooperates at h if and only if $h \in \{ h^1, (CC), (CD), (CC, CC), (CD, CC) \}$.

Constructing epistemic types $t_i^2$ and $t_i^3$ involves the following strategies:

$s_i^2$ cooperates at h if and only if $h \in \{ (DC), (DD) \}$
$s_i^3$ cooperates at h if and only if $h \in \{ (DC) \}$.

At any $h \in H(s^1_{-i})$, type $t_i^2$ believes she is facing $(s^1_{-i}, t_{-i}^1)$; at any $h \in H(s^0_{-i}) {\setminus } H(s^1_{-i})$, type $t_i^2$ believes she is facing $(s^0_{-i}, t_{-i}^0)$; at any $h \in H(s^2_{-i}) {\setminus } [\cup _{k=0}^1 H(s^k_{-i})]$, type $t_i^2$ believes she is facing $(s^2_{-i}, t_{-i}^2)$; at any $h \in H(s^3_{-i}) {\setminus } [\cup _{k=0}^2 H(s^k_{-i})]$, type $t_i^2$ believes she is facing $(s^3_{-i}, t_{-i}^3)$; at any other history h, type $t_i^2$ believes she is facing $(s_{-i}^h, t_{-i}^2)$, where $s_{-i}^h$ allows h and defects at every history that does not precede h. We can show that $s_i^2$ is a sequential best response for $t_i^2$.

At any $h \in H(s^2_{-i})$, type $t_i^3$ believes she is facing $(s^2_{-i}, t_{-i}^2)$; at any $h \in H(s^3_{-i}) {\setminus } H(s^2_{-i})$, type $t_i^3$ believes she is facing $(s^3_{-i}, t_{-i}^3)$; at any $h \in H(s^1_{-i}) {\setminus } [\cup _{k=2}^3 H(s^k_{-i})]$, type $t_i^3$ believes she is facing $(s^1_{-i}, t_{-i}^1)$; at any $h \in H(s^0_{-i}) {\setminus } [ \cup _{k = 1}^3 H(s^k_{-i})]$, type $t_i^3$ believes she is facing $(s^0_{-i}, t_{-i}^0)$; at any other history h, type $t_i^3$ believes she is facing $(s_{-i}^h, t_{-i}^3)$, where $s_{-i}^h$ allows h and defects at every history that does not precede h. We can show that $s_i^3$ is a sequential best response for $t_i^3$.

Lemma 7 claims that $\beta _i (t_i^k)$ is a conditional probability system for $k \in \{0, 1, 2, 3\}$. Lemma 8 claims that $(s_i^k, t_i^k) \in R_i^1 ({\mathcal {T}})$ for $k \in \{0, 1, 2, 3\}$. Lemma 9 claims that $(s_i^k, t_i^k) \in R_i^2 ({\mathcal {T}})$ for $k \in \{2, 3\}$. Lemma 10 claims that $(s_i^3, t_i^3) \in R_i^3 ({\mathcal {T}})$. Note that $(s_1^3, t_1^3, s_2^1, t_2^1) \in R_1^3({\mathcal {T}}) \times R_2^1({\mathcal {T}})$ and the path induced by $(s_1^3, s_2^1)$ is (DC, CC, DC, DD), on which player 1 cooperates in round 2.
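The path stated above can be reproduced mechanically. The following minimal sketch encodes each strategy $s^k$ by the set of histories at which it cooperates, reading every action profile with the strategy owner's action first (an assumption about notation, chosen because it is the reading under which the stated path obtains). The function names are hypothetical. The last line checks whether player 1 defects in each of the last three rounds, which is what the main result predicts for $R_1^3$ under a sufficiently rich type structure.

```python
M = 4  # number of rounds in Example 2

# Histories at which each strategy s^k cooperates, written from the
# owner's viewpoint (own action first); () stands for the initial history h^1.
COOP = {
    0: {("DD",), ("DD", "CC")},
    1: {(), ("CC",), ("CD",), ("CC", "CC"), ("CD", "CC")},
    2: {("DC",), ("DD",)},
    3: {("DC",)},
}

def swap(history):
    """Rewrite a history from the other player's viewpoint."""
    return tuple(profile[1] + profile[0] for profile in history)

def action(k, own_view):
    """Action of s^k at a history written from its owner's viewpoint."""
    return "C" if own_view in COOP[k] else "D"

def induced_path(k1, k2):
    """Path induced by (s_1^{k1}, s_2^{k2}), with player 1's action first."""
    history = ()
    for _ in range(M):
        a1 = action(k1, history)
        a2 = action(k2, swap(history))
        history += (a1 + a2,)
    return history

def defects_in_last(path, player, m):
    """True if `player` (0 or 1) defects in each of the last m rounds."""
    return all(profile[player] == "D" for profile in path[-m:])

path = induced_path(3, 1)
print(path)
print(defects_in_last(path, player=0, m=3))
```

The check on the last three rounds fails because of player 1's cooperation in round 2, which is the sense in which ${\mathcal {T}}$ is insufficiently rich.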

Example 3

Suppose the finitely repeated Prisoner’s Dilemma has 4 rounds. In this example, we construct a type structure ${\mathcal {T}}^\prime $ such that $R_{-i}^1 ({\mathcal {T}}) \subseteq R_{-i}^1 ({\mathcal {T}}^\prime )$ but $SB_i [R_{-i}^1 ({\mathcal {T}})] \not \subseteq SB_i [R_{-i}^1 ({\mathcal {T}}^\prime )]$, where ${\mathcal {T}}$ is the type structure constructed in Example 2. The epistemic type set for each player i is $T^\prime _i = T_i \cup \{ t_i^4 \}$. At any $h \in H(s^G_{-i})$, type $t_i^4$ believes she is facing $(s^G_{-i}, t_{-i}^4)$, where $s^G_{-i}$ is the grim trigger strategy; at any $h \not \in H(s^G_{-i})$, type $t_i^4$ believes she is facing $(s_{-i}^h, t_{-i}^4)$, where $s_{-i}^h$ allows h and defects at every history that does not precede h. It is easy to show that a sequential best response for $t_i^4$ is a strategy $s_i^4$ that cooperates at h if and only if $h \in \{ h^1, (CC), (CC, CC) \}$. Since $T_{-i} \subseteq T_{-i}^\prime $, we have $R_{-i}^1 ({\mathcal {T}}) \subseteq R_{-i}^1 ({\mathcal {T}}^\prime )$. Note that at history $(DC, CD) \in H(s_{-i}^4)$, type $t_i^2$ assigns probability one to an irrational strategy-type pair; thus $(s_i^2, t_i^2) \not \in SB_i [R_{-i}^1 ({\mathcal {T}}^\prime )]$. Since $(s_i^2, t_i^2) \in SB_i [R_{-i}^1 ({\mathcal {T}})]$ (see the proof for Lemma 9), this implies $SB_i [R_{-i}^1 ({\mathcal {T}})] \not \subseteq SB_i [R_{-i}^1 ({\mathcal {T}}^\prime )]$.
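The role of the history $(DC, CD)$ can be checked directly. The sketch below lists which of the opponent strategies $s^0_{-i}, \ldots , s^4_{-i}$ allow this history. It assumes that a history in the belief description is written with player i's action first, while each cooperation set is written from the strategy owner's viewpoint. The helper names are hypothetical.

```python
# Histories at which each strategy s^k cooperates, written from the
# owner's viewpoint (own action first); () stands for the initial history h^1.
COOP = {
    0: {("DD",), ("DD", "CC")},
    1: {(), ("CC",), ("CD",), ("CC", "CC"), ("CD", "CC")},
    2: {("DC",), ("DD",)},
    3: {("DC",)},
    4: {(), ("CC",), ("CC", "CC")},
}

def swap(history):
    """Rewrite a history from the other player's viewpoint."""
    return tuple(profile[1] + profile[0] for profile in history)

def allows(k, history):
    """True if s_{-i}^k allows `history` (written with player i's action first)."""
    for n, profile in enumerate(history):
        own_view = swap(history[:n])  # the opponent's view of the first n rounds
        prescribed = "C" if own_view in COOP[k] else "D"
        if prescribed != profile[1]:  # profile[1] is the opponent's actual action
            return False
    return True

h = ("DC", "CD")
allowing = [k for k in sorted(COOP) if allows(k, h)]
print(allowing)
```

Only $s^4_{-i}$ allows $(DC, CD)$. In ${\mathcal {T}}$ this history therefore falls under the "any other history" clause of type $t_i^2$, whereas in ${\mathcal {T}}^\prime $ it is allowed by the strategy of the added pair $(s_{-i}^4, t_{-i}^4)$.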

Lemma 7

$\beta _i(t_i^k)$ is a CPS.

Proof

In the following, we show that $\beta _i(t_i^3)$ is a CPS. Showing that $\beta _i(t_i^k)$ is a CPS for $k \in \{0, 1, 2\}$ is similar. It is clear that $\beta _{i,h}(t_i^3)[S_{-i}(h) \times T_{-i}] = 1$ for each $h \in H$. We will show that for each pair $\{ h, h^\prime \} \subset H$ such that $S_{-i}(h^\prime ) \times T_{-i} \subseteq S_{-i}(h) \times T_{-i}$ and each $E \subseteq S_{-i}(h^\prime ) \times T_{-i}$,

$$\begin{aligned} \beta _{i,h}(t_i^3)(E) = \beta _{i,h^\prime }(t_i^3)(E) \times \beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}]. \end{aligned}$$

(3)

Suppose $h \in H(s^2_{-i})$. If $h^\prime \in H(s^2_{-i})$, then $\beta _{i,h}(t_i^3)(s^2_{-i}, t^2_{-i}) =\beta _{i,h^\prime }(t_i^3)(s^2_{-i}, t^2_{-i}) =\beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 1$, which implies (3). If $h^\prime \not \in H(s^2_{-i})$, then $\beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (3).

Suppose $h \in H(s^3_{-i}) {\setminus } H(s^2_{-i})$. Note that $h \not \in H(s^2_{-i})$ implies $h^\prime \not \in H(s^2_{-i})$. If $h^\prime \in H(s^3_{-i})$, then $\beta _{i,h}(t_i^3)(s^3_{-i}, t^3_{-i}) = \beta _{i,h^\prime }(t_i^3)(s^3_{-i}, t^3_{-i}) = \beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 1$, which implies (3). If $h^\prime \not \in H(s^3_{-i})$, then $\beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (3).

Suppose $h \in H(s^1_{-i}) {\setminus } [\cup _{k=2}^3 H(s^k_{-i})]$. As $h \not \in \cup _{k=2}^3 H(s^k_{-i})$, we have $h^\prime \not \in \cup _{k=2}^3 H(s^k_{-i})$. If $h^\prime \in H(s^1_{-i})$, then $\beta _{i,h}(t_i^3)(s^1_{-i}, t^1_{-i}) = \beta _{i,h^\prime }(t_i^3)(s^1_{-i}, t^1_{-i}) = \beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 1$, which implies (3). If $h^\prime \not \in H(s^1_{-i})$, then $\beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (3).

Suppose $h \in H(s^0_{-i}) {\setminus } [\cup _{k=1}^3 H(s^k_{-i})]$. As $h \not \in \cup _{k=1}^3 H(s^k_{-i})$, we have $h^\prime \not \in \cup _{k=1}^3 H(s^k_{-i})$. If $h^\prime \in H(s^0_{-i})$, then $\beta _{i,h}(t_i^3)(s^0_{-i}, t^0_{-i}) =\beta _{i,h^\prime }(t_i^3)(s^0_{-i}, t^0_{-i}) =\beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 1$, which implies (3). If $h^\prime \not \in H(s^0_{-i})$, then $\beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (3).

Suppose $h \not \in \cup _{k=0}^3 H(s^k_{-i})$. This implies $h^\prime \not \in \cup _{k=0}^3 H(s^k_{-i})$. Then $\beta _{i,h}(t_i^3)(s^h_{-i}, t^3_{-i}) = 1$ and $\beta _{i,h^\prime }(t_i^3)(s^{h^\prime }_{-i}, t^3_{-i}) = 1$. If $h^\prime \in H(s^h_{-i})$, then $s^h_{-i}$ and $s^{h^\prime }_{-i}$ are identical, which implies (3). If $h^\prime \not \in H(s^h_{-i})$, then $\beta _{i,h}(t_i^3)[S_{-i}(h^\prime ) \times T_{-i}] = 0$, which implies (3). $\square $
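Condition (3) can also be checked by enumeration for type $t_i^3$. The sketch below is a partial check under stated assumptions: it only considers pairs in which h is a strict prefix of $h^\prime $ (so that $S_{-i}(h^\prime ) \subseteq S_{-i}(h)$ holds), it reads histories with player i's action first, and it uses the fact that all beliefs here are point masses. For point masses, (3) reduces to the requirement that whenever the strategy believed at h allows $h^\prime $, the belief at $h^\prime $ is the same strategy-type pair. All function names are hypothetical.

```python
from itertools import product

M = 4
PROFILES = ["CC", "CD", "DC", "DD"]  # player i's action first
HISTORIES = [h for n in range(M) for h in product(PROFILES, repeat=n)]

# Cooperation sets of s^0..s^3, written from the owner's viewpoint.
COOP = {
    0: {("DD",), ("DD", "CC")},
    1: {(), ("CC",), ("CD",), ("CC", "CC"), ("CD", "CC")},
    2: {("DC",), ("DD",)},
    3: {("DC",)},
}

def swap(history):
    return tuple(p[1] + p[0] for p in history)

def opp_action(k, h):
    """Action of s_{-i}^k at h (h written from player i's viewpoint)."""
    return "C" if swap(h) in COOP[k] else "D"

def allows(strategy, h):
    """True if the opponent strategy allows h."""
    return all(strategy(h[:n]) == h[n][1] for n in range(len(h)))

def s_h(h):
    """Strategy that follows h at its strict prefixes and defects elsewhere."""
    table = {h[:n]: h[n][1] for n in range(len(h))}
    return lambda g: table.get(g, "D")

def belief(h):
    """Point belief of t_i^3 at h, as (opponent type label, opponent strategy)."""
    for k in (2, 3, 1, 0):  # priority order in the definition of t_i^3
        strategy = lambda g, k=k: opp_action(k, g)
        if allows(strategy, h):
            return (f"t{k}", strategy)
    return ("t3", s_h(h))

def same(b1, b2):
    """Equal type labels and equal strategies at every non-terminal history."""
    return b1[0] == b2[0] and all(b1[1](g) == b2[1](g) for g in HISTORIES)

violations = []
for h_prime in HISTORIES:
    for n in range(len(h_prime)):
        h = h_prime[:n]
        if allows(belief(h)[1], h_prime) and not same(belief(h), belief(h_prime)):
            violations.append((h, h_prime))

print(len(HISTORIES), len(violations))
```

An empty list of violations is consistent with the case analysis above. Pairs in which h is not a prefix of $h^\prime $ are not covered by this sketch.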

Lemma 8

$(s_i^k, t_i^k) \in R^1_i({\mathcal {T}})$ for $k = 0, 1, 2, 3$.

Proof

At each history $h \in H(s_i^0) \cap H(\tilde{s}^G_{-i}) {=} \{ h^1, (DD), (DD, CC), (DD, CC, CC) \}$, type $t_i^0$ believes that her opponent is playing $\tilde{s}^G_{-i}$ that mimics the grim trigger strategy in rounds 2, 3, 4 regardless of what happens in round 1; thus, it is optimal for type $t_i^0$ to defect in round 1, cooperate at (DD), and cooperate at (DD, CC). At each history $h \in H(s_i^0) {\setminus } H(\tilde{s}^G_{-i})$, type $t_i^0$ believes that her opponent is defecting at every history that does not precede h; thus, defecting at these histories is optimal.

At each history $h \in H(s_i^1) \cap H(s^G_{-i}) = \{ h^1, (CC), (CC, CC), (CC, CC, CC) \}$, type $t_i^1$ believes that her opponent is playing the grim trigger strategy $s^G_{-i}$; thus, it is optimal for type $t_i^1$ to cooperate at $h^1$, (CC), and (CC, CC). At each history $h \in [H(s_i^1) \cap H(\tilde{s}^G_{-i})] {\setminus } H(s^G_{-i}) = \{ (CD), (CD, CC), (CD, CC, CC) \}$, type $t_i^1$ believes that her opponent is playing $\tilde{s}^G_{-i}$ that mimics the grim trigger strategy in rounds 2, 3, 4 regardless of what happens in round 1; thus, it is optimal for type $t_i^1$ to cooperate at (CD) and (CD, CC). At each history $h \in H(s_i^1) {\setminus } [H(s^G_{-i}) \cup H(\tilde{s}^G_{-i})]$, type $t_i^1$ believes that her opponent is defecting at every history that does not precede h; thus, defecting at these histories is optimal.

At each history $h \in H(s_i^2) \cap H(s^1_{-i}) = \{ h^1, (DC), (DC, CC), (DC, CC, DC) \}$, type $t_i^2$ believes that her opponent is playing $s^1_{-i}$ that cooperates in round 1, cooperates in round 2 regardless of what player i does in round 1, cooperates in round 3 only if player i cooperates in round 2, and defects in round 4; thus, it is optimal for type $t_i^2$ to defect in round 1, cooperate at (DC), and defect at (DC, CC). At each history $h \in [H(s_i^2) \cap H(s^0_{-i})] {\setminus } H(s^1_{-i}) = \{ (DD), (DD, CC), (DD, CC, DC) \}$, type $t_i^2$ believes that her opponent is playing $s^0_{-i}$ that cooperates in round 3 only if player i cooperates in round 2, and defects in round 4; thus, it is optimal for type $t_i^2$ to cooperate at (DD) and defect at (DD, CC). At each history $h \in [H(s_i^2) \cap H(s^2_{-i})] {\setminus } [\cup _{k=0}^1 H(s^k_{-i})] = \{ (DD, CC, DD) \}$, defecting is obviously optimal. At each history $h \in [H(s_i^2) \cap H(s^3_{-i})] {\setminus } [\cup _{k=0}^2 H(s^k_{-i})] =\{ (DD, CD), (DD, CD, DD) \}$, type $t_i^2$ believes that her opponent is playing $s^3_{-i}$ that defects in rounds 3 and 4; thus, defecting at these histories is optimal. At each history $h \in H(s_i^2) {\setminus } [\cup _{k=0}^3 H(s^k_{-i})]$, type $t_i^2$ believes that her opponent is defecting at every history that does not precede h; thus, defecting at these histories is optimal.

At each history $h \in H(s_i^3) \cap H(s^2_{-i}) = \{ h^1, (DD), (DD, DC), (DD, DC, DD) \}$, type $t_i^3$ believes that her opponent is playing $s^2_{-i}$ that cooperates in round 2 regardless of what player i does in round 1, and defects in rounds 3, 4; thus, defecting at these histories is optimal. At each history $h \in [H(s_i^3) \cap H(s^3_{-i})] {\setminus } H(s^2_{-i}) = \{ (DD, DD), (DD, DD, DD) \}$, type $t_i^3$ believes that her opponent is playing $s^3_{-i}$ that defects in rounds 3, 4; thus, defecting at these histories is optimal. At each history $h \in [H(s_i^3) \cap H(s^1_{-i})] {\setminus } [\cup _{k=2}^3 H(s^k_{-i})] = \{ (DC), (DC, CC), (DC, CC, DC) \}$, type $t_i^3$ believes that her opponent is playing $s^1_{-i}$ that cooperates in round 3 only if player i cooperates in round 2, and defects in round 4; thus, it is optimal for type $t_i^3$ to cooperate at (DC) and defect at (DC, CC). Note that $[H(s_i^3) \cap H(s^0_{-i})] {\setminus } [\cup _{k=1}^3 H(s^k_{-i})] = \emptyset $. At each history $h \in H(s_i^3) {\setminus }[\cup _{k=0}^3 H(s^k_{-i})]$, type $t_i^3$ believes that her opponent is defecting at every history that does not precede h; thus, defecting at these histories is optimal. $\square $

Lemma 9

$(s_i^k, t_i^k) \in R^2_i({\mathcal {T}})$ for $k = 2, 3$.

Proof

First, we show that for each $k = 0, 1, 2, 3$, if $(s_i, t_i^k) \in R_i^1 ({\mathcal {T}})$, then $H(s_i) \subseteq H(s_i^k)$. The proof is by contrapositive. Suppose there exists $h \in H(s_i) {\setminus } H(s_i^k)$. Then we can find some $h^\prime \in H(s_i^k) \cap H(s_i)$ at which $s_i^k (h^\prime ) \not = s_i (h^\prime )$. By construction, at $h^\prime $, type $t_i^k$ assigns probability one to some $s_{-i} \in S_{-i} (h^\prime )$. Since $s_i^k (h^\prime ) \not = s_i (h^\prime )$, the two paths $\xi (s_i^k, s_{-i})$ and $\xi (s_i, s_{-i})$ diverge at $h^\prime $. It is easy to verify that $\xi (s_i^k, s_{-i})$ is strictly better than $\xi (s_i, s_{-i})$ for player i. Thus $(s_i, t_i^k) \not \in R_i^1 ({\mathcal {T}})$.

It follows that if $h \not \in \cup _{k=0}^3 H(s^k_{-i})$, then h is inconsistent with $R_{-i}^1 ({\mathcal {T}})$. Since types $t_i^2$ and $t_i^3$ assign probability one to a rational strategy-type pair at each $h \in \cup _{k=0}^3 H(s^k_{-i})$, they strongly believe $R_{-i}^1 ({\mathcal {T}})$. Thus $(s_i^k, t_i^k) \in R^2_i({\mathcal {T}})$ for $k = 2, 3$. $\square $

Lemma 10

$(s_i^3, t_i^3) \in R^3_i({\mathcal {T}})$.

Proof

Since types $t_{-i}^0$ and $t_{-i}^1$ assign probability one to an irrational strategy-type pair ex ante, they do not strongly believe $R^1_i({\mathcal {T}})$. Thus, a history h is consistent with $R^2_{-i}({\mathcal {T}})$ if and only if it is allowed by some strategy $s_{-i}$ that satisfies either $(s_{-i}, t_{-i}^2) \in R^1_{-i}({\mathcal {T}})$ or $(s_{-i}, t_{-i}^3) \in R^1_{-i}({\mathcal {T}})$. As shown in the proof for Lemma 9, any such history belongs to $\cup _{k=2}^3 H(s^k_{-i})$. At each $h \in \cup _{k=2}^3 H(s^k_{-i})$, type $t_i^3$ assigns probability one to $R^2_{-i}({\mathcal {T}})$; thus $(s_i^3, t_i^3) \in R^3_i({\mathcal {T}})$. $\square $

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cao, V. An epistemic approach to explaining cooperation in the finitely repeated Prisoner’s Dilemma. Int J Game Theory 51, 53–85 (2022). https://doi.org/10.1007/s00182-021-00785-x

Accepted: 13 June 2021
Published: 04 July 2021
Issue Date: March 2022
DOI: https://doi.org/10.1007/s00182-021-00785-x

Keywords

JEL Classification

C72

	C (Cooperate)	D (Defect)
C (Cooperate)	$b_3, b_3$	$b_1, b_4$
D (Defect)	$b_4, b_1$	$b_2, b_2$
