1 Introduction

The finitely repeated Prisoner’s Dilemma has a unique Nash equilibrium, in which both players defect in every round. This equilibrium outcome is inconsistent with experimental evidence: experimental subjects typically cooperate to some extent [see Oskamp and Perlman (1965), Morehous (1966), Selten and Stoecker (1986), Andreoni and Miller (1993), Cooper et al. (1996), Dal Bo and Frechette (2011), Normann and Wallace (2012), Kagel and McGee (2014), and Embrey et al. (2017)]. Various modifications to the finitely repeated Prisoner’s Dilemma have been made to obtain an equilibrium path on which players cooperate. For instance, Kreps et al. (1982) assume there is a small probability that one player is irrational and plays a Tit for Tat strategy (one-sided incomplete information). Under this assumption, in any sequential equilibrium that is not Pareto-dominated by any other sequential equilibrium, the path of play has bilateral defection only in the last few rounds and bilateral cooperation in all other rounds. Fudenberg and Maskin (1986) also assume each player is irrational with a small probability (two-sided incomplete information). When the number of rounds is sufficiently large, by varying the form of irrationality, any individually rational payoffs of the stage game can be approximated by sequential equilibrium payoffs of the finitely repeated game. Instead of introducing irrational types, Neyman (1999) assumes an exponentially small departure from the common knowledge assumption on the number of rounds. He shows that when the number of rounds is sufficiently large, any individually rational payoffs of the stage game can be approximated by subgame perfect equilibrium payoffs of the finitely repeated game. For other works that use the equilibrium approach to rationalize cooperation, see Radner (1980), Radner (1986), Neyman (1985), Neyman (1998), Samuelson (1987), Hirshleifer and Rasmusen (1989), and Dijkstra and van Assen (2017).

This paper uses epistemic theory instead of equilibrium theory to explain why rational players cooperate. We attach to the finitely repeated Prisoner’s Dilemma a type structure that captures players’ interactive beliefs as in Battigalli and Siniscalchi (1999). Our objective is to characterize the set of all behavioral outcomes that can arise when players are rational and satisfy some orders of strong belief of rationality (henceforth SB). As defined in Battigalli and Siniscalchi (2002), a player satisfies first order SB if, whenever possible, she believes that her opponent is rational; a player satisfies second order SB if (a) she satisfies first order SB, and (b) whenever possible, she believes that her opponent is rational and satisfies first order SB; and so on. Our first result (Theorem 1) states that there exists a type structure with the following property: for each pair \((m_1, m_2) \in {\mathbb {N}}_+^2\), the set of outcomes that can arise when each player i is rational and satisfies \((m_i-1)\)th order SB is the set of paths on which each player i defects in the last \(m_i\) rounds. This result is consistent with Battigalli and Friedenberg (2012), who show that for any type structure, if players satisfy rationality and common SB (see Footnote 1), then they defect in every round on path. In this paper, we focus on behavioral implications of epistemic conditions weaker than rationality and common SB. In particular, for each pair \((m_1, m_2) \in \{1, \ldots , M - 1 \}^2\) (where M is the number of rounds), we elaborate on how different patterns of cooperative behavior arise when each player i is rational and satisfies \((m_i-1)\)th order SB.

In the following, we sketch how to construct the type structure that supports Theorem 1. In this type structure, for each pair \((m_1, m_2) \in {\mathbb {N}}_+^2\) and each path \(\varvec{a}\) on which each player i defects in the last \(m_i\) rounds, there must be a pair of beliefs (one for each player; see Footnote 2) such that player i’s belief satisfies \((m_i-1)\)th order SB and players’ rationality under these beliefs leads to \(\varvec{a}\). To ensure the type structure contains all necessary beliefs, for each player i, we construct a mapping that maps each pair \((m_i, \varvec{a})\) (where \(m_i \in {\mathbb {N}}\) and \(\varvec{a}\) is a path on which player i defects in the last \(m_i\) rounds) to a pair of strategy and epistemic type for player i. At each conditioning event, this epistemic type assigns probability one to some \((m^\prime _{-i}, \varvec{a^\prime })\) (where \(m^\prime _{-i} \in {\mathbb {N}}\) and \(\varvec{a^\prime }\) is a path on which player \(-i\) defects in the last \(m^\prime _{-i}\) rounds), which is mapped to a pair of strategy and epistemic type for player \(-i\). For each pair \((m_1, m_2) \in {\mathbb {N}}_+^2\) and each path \(\varvec{a}\) on which each player i defects in the last \(m_i\) rounds, the strategy-epistemic type pairs associated with \((m_1, \varvec{a})\) and \((m_2, \varvec{a})\) satisfy the following desired properties: player i’s epistemic type satisfies \((m_i-1)\)th order SB, player i’s strategy is rational given her epistemic type, and the strategy profile induces \(\varvec{a}\).

With this type structure, for each pair \((m_1, m_2) \in \{1, \ldots , M \}^2\) (where M is the number of rounds), when each player i is rational and satisfies \((m_i-1)\)th order SB, any path on which each player i defects in the last \(m_i\) rounds (indexed \(M - m_i + 1, \ldots , M\)) is possible. Such richness of the set of possible outcomes is due to the optimality of forgiving the opponent’s past defection and the belief that one’s defection will be forgiven. For illustration, suppose player i satisfies \((m_i-1)\)th order SB and is at some round \(n < M - m_i + 1\). At the beginning of round n, player i believes that her opponent will cooperate in round n and will keep cooperating until round \(M - m_i + 1\) if no one has defected since round n. If player i believes that her defection in round n will be forgiven (i.e., her opponent will still cooperate in round \(n+1\) and keep cooperating until round \(M - m_i + 1\) if no one has defected since round \(n+1\)), then it is optimal to defect in round n. If player i believes that her defection in round n will not be forgiven (i.e., her opponent will respond by defecting from round \(n+1\) onwards), then it is optimal to cooperate in round n. Thus, any action by player i in each round \(n < M - m_i + 1\) can be rationalized. Indeed, there are beliefs under which forgiving the opponent’s past defection is optimal: if a player believes that forgiving will lead to a cooperative phase whereas retaliating will lead to bilateral defection in all future rounds, then it is optimal to forgive.
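The forgiveness argument is a comparison of payoff sums along the paths player i expects. The sketch below is illustrative only: it assumes hypothetical stage payoffs \((b_1, b_2, b_3, b_4) = (0, 1, 3, 5)\), \(M = 5\), \(m_i = 2\) (so the first round of certain defection is \(K = M - m_i + 1 = 4\)) and \(n = 1\), and it encodes the expected continuation as explicit paths. The names `expected_path` and `total` are hypothetical.

```python
# Hypothetical stage payoffs with b4 > b3 > b2 > b1 (not taken from the paper),
# keyed by player i's action followed by her opponent's (Remark 1 ordering).
B1, B2, B3, B4 = 0, 1, 3, 5
U_I = {"CC": B3, "CD": B1, "DC": B4, "DD": B2}


def expected_path(n, M, K, defect_now, forgiven):
    """Joint actions in rounds n..M as player i expects them (assumes n < K).

    K = M - m_i + 1 is the first round in which player i defects for sure.
    The opponent cooperates in round n. If player i cooperates, or her defection
    is forgiven, both cooperate until round K - 1; in round K player i defects
    while the opponent still cooperates; afterwards both defect. An unforgiven
    defection triggers bilateral defection from round n + 1 on.
    """
    path = []
    for r in range(n, M + 1):
        if r == n:
            path.append("DC" if defect_now else "CC")
        elif defect_now and not forgiven:
            path.append("DD")
        elif r < K:
            path.append("CC")
        elif r == K:
            path.append("DC")
        else:
            path.append("DD")
    return path


def total(path):
    """Sum of player i's stage payoffs along a path."""
    return sum(U_I[a] for a in path)


defect_forgiven = total(expected_path(1, 5, 4, defect_now=True, forgiven=True))
defect_punished = total(expected_path(1, 5, 4, defect_now=True, forgiven=False))
cooperate = total(expected_path(1, 5, 4, defect_now=False, forgiven=False))
```

Under the forgiving belief, defecting gains \(b_4 - b_3 = 2\) over cooperating (17 versus 15); under the unforgiving belief, cooperating gains \((K - n)(b_3 - b_2) = 6\) (15 versus 9). Both actions in round n can therefore be rationalized by a suitable belief.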

With this type structure, for each pair \((m_1, m_2) \in \{1, \ldots , M \}^2\), when each player i is rational and satisfies \((m_i-1)\)th order SB, any path on which some player i cooperates in some round \(n \ge M - m_i + 1\) is impossible. We sketch out the proof below. Suppose player i satisfies first order SB and is at round \(M - 1\). As discussed in the preceding paragraph, any past play is consistent with the hypothesis that player \(-i\) is rational. Thus, player i assigns probability one to the event that player \(-i\) is rational and will defect in round M regardless of what player i does in round \(M - 1\). Under this belief, defecting in round \(M - 1\) is optimal for player i. Next, suppose player i satisfies second order SB and is at round \(M - 2\). As discussed in the preceding paragraph, any past play is consistent with the hypothesis that player \(-i\) is rational and satisfies first order SB. Thus, player i assigns probability one to the event that player \(-i\) is rational and satisfies first order SB, and will defect in round \(M-1\) regardless of what player i does in round \(M - 2\). Under this belief, defecting in round \(M - 2\) is optimal for player i. And so on. By induction, if player i is rational and satisfies \((m_i-1)\)th order SB, then she defects in each round \(n \ge M - m_i + 1\) regardless of what player \(-i\) has done by round n, as she assigns the highest possible order of SB to player \(-i\) and expects that player \(-i\) will defect from round \(n + 1\) onwards.
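The inductive step rests on one property of the stage game: once player i is certain that her opponent's play from round \(n + 1\) on does not depend on her round-n action, her continuation payoff is the same whatever she does, so only the round-n stage payoff matters and D strictly dominates C. The sketch below checks this for a few beliefs and lists the rounds in which the induction forces defection. The payoffs \((b_1, b_2, b_3, b_4) = (0, 1, 3, 5)\) and the function names are hypothetical.

```python
# Hypothetical stage payoffs, keyed by a_i a_{-i} as in Remark 1.
U_I = {"CC": 3, "CD": 0, "DC": 5, "DD": 1}


def optimal_action(prob_opp_cooperates, future_value):
    """Player i's round-n choice when her expected payoff from rounds n+1..M
    equals future_value whichever action she takes in round n."""
    def value(a):
        return (prob_opp_cooperates * U_I[a + "C"]
                + (1 - prob_opp_cooperates) * U_I[a + "D"]
                + future_value)
    return max("CD", key=value)


def forced_defection_rounds(M, m):
    """Rounds n >= M - m + 1 in which a player who is rational and satisfies
    (m-1)th order SB defects (every round once m >= M)."""
    return list(range(max(1, M - m + 1), M + 1))
```

For example, `forced_defection_rounds(5, 2)` gives `[4, 5]`, and `optimal_action` returns `"D"` for any belief about the opponent's current action.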

We then use the type structure constructed for Theorem 1 to obtain the second result (Theorem 2): for any sufficiently rich type structure and each pair \((m_1, m_2) \in {\mathbb {N}}_+^2\), the set of outcomes that can arise when each player i is rational and satisfies \((m_i-1)\)th order SB is the set of paths on which each player i defects in the last \(m_i\) rounds. The notion of sufficiently rich type structure is introduced in Perea (2012) (see Footnote 3). Informally, a type structure \({\mathcal {T}}\) is sufficiently rich if, for each player i, each order \(m_i \in {\mathbb {N}}_+\), and each history of past actions h, the following holds: if there exists a type structure \({\mathcal {T}}^\prime \) such that (a) for each \(j \in \{1, 2\}\) and each \(m^\prime < m_i\), the behavioral implication of \((m^\prime -1)\)th order SB for player j in the game with \({\mathcal {T}}^\prime \) is identical to that in the game with \({\mathcal {T}}\) (see Footnote 4), and (b) with \({\mathcal {T}}^\prime \), the history h is consistent with the hypothesis that player i is rational and satisfies \((m_i-1)\)th order SB, then the type structure \({\mathcal {T}}\) must contain some belief for player i such that the belief satisfies \((m_i-1)\)th order SB and rationality under this belief leads to some behavior consistent with history h. We show that the set of outcomes that can arise when each player i is rational and satisfies \((m_i-1)\)th order SB for a sufficiently rich type structure coincides with the set of outcomes that can arise when each player i is rational and satisfies \((m_i-1)\)th order SB for a complete type structure (see Footnote 5). Thus, the notion of sufficiently rich type structure might be useful if one aims to study behavioral implications of an epistemic condition for a complete type structure but finds it more convenient to work with incomplete type structures.
We also show that the type structure that supports Theorem 1 and its extensions (which are formed by adding new beliefs to the original type structure) are sufficiently rich. A complete type structure is one of these extensions and is therefore sufficiently rich.

In the following, we show how to use the type structure constructed for Theorem 1 to obtain Theorem 2, and highlight the importance of the assumption that the type structure is sufficiently rich. Let \({\mathcal {T}}\) be a sufficiently rich type structure and \({\mathcal {T}}^*\) be the type structure constructed for Theorem 1. First, recall that with \({\mathcal {T}}^*\), when players are rational, any outcome with bilateral defection in the last round is possible. Consequently, as \({\mathcal {T}}\) is sufficiently rich, in the game with \({\mathcal {T}}\), any non-terminal history is consistent with the behavior of a rational player. It follows that in the game with \({\mathcal {T}}\), the set of outcomes that can arise when players are rational is the set of paths on which players defect in the last round. Next, recall that with \({\mathcal {T}}^*\), when players are rational and satisfy first order SB, any outcome with bilateral defections in the last two rounds is possible. Consequently, as \({\mathcal {T}}\) is sufficiently rich, in the game with \({\mathcal {T}}\), any non-terminal history (except those on which player i cooperates in round \(M - 1\)) is consistent with the behavior of player i who is rational and satisfies first order SB. In addition, if a player is rational and satisfies first order SB, then she defects in round \(M - 1\) as she assigns probability one to the event that her opponent is rational and will defect in round M. It follows that in the game with \({\mathcal {T}}\), the set of outcomes that can arise when players are rational and satisfy first order SB is the set of paths on which players defect in the last two rounds. And so on.
If a type structure is not sufficiently rich, then, when each player i is rational and satisfies \((m_i - 1)\)th order SB, the following might happen: (a) some path on which each player i defects in the last \(m_i\) rounds is impossible, as some player i does not have any belief that satisfies \((m_i - 1)\)th order SB and rationalizes this path, and (b) some path on which some player i cooperates in some round \(n \ge M - m_i + 1\) is possible (we show (b) by means of an example).

Throughout the paper, we mostly work with sufficiently rich type structures. In Sect. 7, we state a conjecture about type structures that are not sufficiently rich. In particular, we conjecture that for each pair \((m_1, m_2) \in \{ 1, \ldots , M - 1 \}^2\) with \(m_1 > m_2\), and each path \(\varvec{a}\) on which player 1 defects in the last \(m_2 + 1\) rounds and player 2 defects in the last \(m_2\) rounds, there exists an insufficiently rich type structure with which \(\varvec{a}\) is possible when each player i is rational and satisfies \((m_i - 1)\)th order SB. If this conjecture is correct, then we can characterize the set of outcomes that can arise when each player i satisfies rationality and \((m_i - 1)\)th order SB across all type structures. The importance of studying different type structures has been discussed in Battigalli and Friedenberg (2012). The authors interpret a type structure as a context, in which the set of possible beliefs is a product of history or social conventions. If an analyst does not know the context, she might need to study behavioral implications of an epistemic condition across all type structures.

Our work is related to some papers that provide procedures for deriving the set of possible outcomes under rationality and mth order strong belief of rationality (henceforth RmSBR) for each \(m \in {\mathbb {N}}\). Battigalli and Siniscalchi (2002) provide a procedure for deriving such sets for a complete type structure: an outcome is possible under RmSBR if, with a complete type structure, it can arise under RmSBR. Brandenburger et al. (2019) provide a procedure for deriving such sets across all type structures: an outcome is possible under RmSBR if there exists a type structure with which it can arise under RmSBR. Our paper derives these sets for each sufficiently rich type structure. First, we construct a sufficiently rich type structure with which the set of outcomes that can arise under RmSBR is the set of paths on which players defect in the last \(m + 1\) rounds. Then, we use this result to show that for any sufficiently rich type structure, the set of outcomes that can arise under RmSBR is the set of paths on which players defect in the last \(m + 1\) rounds. For our work as well as the two aforementioned procedures, the most challenging task is to construct beliefs that satisfy mth order strong belief of rationality and rationalize all paths that have bilateral defections in the last \(m + 1\) rounds. Our main contribution is the construction of a type structure that contains all such beliefs.

The remainder of the paper is organized as follows. Section 2 introduces basic epistemic concepts. Section 3 shows that for any type structure and any \(m \in {\mathbb {N}}_+\), if an outcome arises under R\((m-1)\)SBR, then it has bilateral defections in the last m rounds. Section 4 gives an illustrative example. Section 5 presents a type structure with which the set of outcomes that can arise when each player i is rational and satisfies \((m_i-1)\)th order SB is the set of paths on which each player i defects in the last \(m_i\) rounds. Section 6 defines a sufficiently rich type structure and studies behavioral implications of SB for these type structures. Section 7 discusses insufficiently rich type structures.

2 Epistemic game and epistemic conditions

In this section, we introduce the Finitely Repeated Prisoner’s Dilemma, its epistemic game, and epistemic conditions. Our epistemic framework follows Battigalli and Siniscalchi (2002), Battigalli and Friedenberg (2012), Friedenberg (2019), and Brandenburger et al. (2019).

2.1 Finitely repeated Prisoner’s Dilemma

Let the stage game be \(P = (A_i, u_i)_{i=1,2}\), where for each player i, the action set \(A_i = \{C, D\}\) and the payoff function \(u_i: A_1 \times A_2 \rightarrow {\mathbb {R}}\) is described by the following payoff matrix:

 

$$\begin{array}{c|cc} & C \ (\text{Cooperate}) & D \ (\text{Defect}) \\ \hline C \ (\text{Cooperate}) & b_3, b_3 & b_1, b_4 \\ D \ (\text{Defect}) & b_4, b_1 & b_2, b_2 \end{array}$$

where \(b_4> b_3> b_2 > b_1\). There are M rounds (\(1< M < \infty \)). In each round, the stage game is played and the outcome is observed. Let \(a^m \in A \equiv A_1 \times A_2\) be the joint action played in round m and let \(a^m_i\) be the action of player i that constitutes \(a^m\). Let \({\mathcal {P}}\) denote the supergame.
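The ordering \(b_4 > b_3 > b_2 > b_1\) is exactly what makes D strictly dominant in the stage game while bilateral cooperation Pareto-dominates bilateral defection. A minimal sketch, using hypothetical numbers \((b_1, b_2, b_3, b_4) = (0, 1, 3, 5)\) and illustrative function names, encodes the matrix above keyed by \(a_1a_2\):

```python
def is_prisoners_dilemma(b1, b2, b3, b4):
    """The ordering assumed in the text."""
    return b4 > b3 > b2 > b1


def stage_payoffs(b1, b2, b3, b4):
    """u(a_1 a_2) = (u_1, u_2) for the payoff matrix above."""
    return {"CC": (b3, b3), "CD": (b1, b4), "DC": (b4, b1), "DD": (b2, b2)}


def defect_strictly_dominates(u):
    """D beats C for each player against every action of the other player."""
    p1 = all(u["D" + a][0] > u["C" + a][0] for a in "CD")
    p2 = all(u[a + "D"][1] > u[a + "C"][1] for a in "CD")
    return p1 and p2


u = stage_payoffs(0, 1, 3, 5)
```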

Histories. A history h is a sequence of joint actions. Let \(h^1 \equiv (a^0)\) denote the root of the game tree. For each \(m \in \{ 2, \ldots , M \}\), let \(h^m \equiv (a^1, \ldots , a^{m-1}) \in A^{m-1}\) be an mth-round history. Let H be the set of non-terminal histories and Z be the set of terminal histories.
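To make the size of the game concrete, the sketch below enumerates the histories of the M-round game. Representing the root \(h^1\) by the empty tuple and joint actions by two-letter strings \(a_1a_2\) are encoding choices made here, not notation from the paper; `histories` is a hypothetical helper.

```python
from itertools import product

JOINT = ["CC", "CD", "DC", "DD"]  # joint actions a_1 a_2


def histories(M):
    """Return (H, Z): the non-terminal and terminal histories of the M-round game.

    A history is a tuple of joint actions; the empty tuple stands in for h^1.
    """
    H = [h for length in range(M) for h in product(JOINT, repeat=length)]
    Z = list(product(JOINT, repeat=M))
    return H, Z


H, Z = histories(3)
```

For \(M = 3\), \(|H| = 1 + 4 + 16 = 21\) and \(|Z| = 64\). Because a strategy assigns an action to every history in H, each player already has \(2^{21}\) strategies, which is why the brute-force sketches below use very short horizons.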

Strategies. A strategy \(s_i: H \rightarrow A_i\) maps each history to an action available at that history. Let \(S_i\) be the set of strategies of player i. A strategy \(s_i\) allows history \(h \in H \cup Z\) if there is some strategy \(s_{-i}\) such that the path induced by \((s_i, s_{-i})\) passes through h. For each history \(h \in H \cup Z\), let \(S_i(h)\) be the set of player i’s strategies that allow h. For each strategy \(s_i \in S_i\), let \(H(s_i) = \big \{ h \in H \cup Z ~|~ s_i \in S_i(h) \big \}\) be the set of histories allowed by \(s_i\).
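Because the opponent can always be chosen to match the recorded moves, \(s_i\) allows h exactly when \(s_i\) prescribes player i's recorded action at every proper prefix of h. A minimal sketch, with strategies written as Python functions of the history (tuples of \(a_1a_2\) strings, root as the empty tuple; both are assumptions of this sketch):

```python
def grim(h):
    """Grim trigger: cooperate if and only if no one has defected so far."""
    return "C" if all(a == "CC" for a in h) else "D"


def allows(s, i, h):
    """True iff strategy s of player i (0 for player 1, 1 for player 2) allows h."""
    return all(s(h[:k]) == h[k][i] for k in range(len(h)))
```

For instance, player 1's grim trigger allows (CC, CD) but not (CD, CC), since after CD it prescribes D.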

Payoffs. Let \(\xi : S_1 \times S_2 \rightarrow A^M\) map each strategy profile \((s_1, s_2)\) to a path of play. For each \(m \in \{1, \ldots , M \}\), let \(\xi ^m: S_1 \times S_2 \rightarrow A\) map each strategy profile \((s_1, s_2)\) to a joint action at round m. The overall payoff is the sum of stage game payoffs: \(\pi _i (s_i, s_{-i}) = \sum _{m = 1}^M u_i [ \xi ^m(s_i, s_{-i}) ]\).
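The maps \(\xi\) and \(\pi_i\) can be sketched directly: play the two strategies forward for M rounds and sum stage payoffs. The payoffs \((b_1, b_2, b_3, b_4) = (0, 1, 3, 5)\) are hypothetical, and here both strategies read the history in the order \(a_1a_2\), which is harmless for the symmetric strategies used in the example.

```python
# Hypothetical stage payoffs: u(a_1 a_2) = (u_1, u_2).
U = {"CC": (3, 3), "CD": (0, 5), "DC": (5, 0), "DD": (1, 1)}


def xi(s1, s2, M):
    """Path of play induced by (s1, s2) over M rounds."""
    h = ()
    for _ in range(M):
        h += (s1(h) + s2(h),)
    return h


def pi(i, s1, s2, M):
    """Overall payoff of player i (0 or 1): sum of stage payoffs along the path."""
    return sum(U[a][i] for a in xi(s1, s2, M))


def grim(h):
    return "C" if all(a == "CC" for a in h) else "D"


def always_defect(h):
    return "D"
```

Against an opponent who always defects, grim trigger is exploited once and then both defect: the path is (CD, DD, DD), worth \(b_1 + 2b_2\) to player 1 and \(b_4 + 2b_2\) to player 2.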

Remark 1

Generally, we write a joint action \(a \in A\) in the order \(a_1a_2\). When focusing on a generic player i, we write a in the order \(a_i a_{-i}\); for instance, CD means that player i chooses C and her opponent chooses D. We use ‘path of play’ and ‘outcome’ interchangeably, and write a generic outcome as either \((a^1, \ldots , a^M)\) or \(\varvec{a}\).

2.2 Type structure and epistemic game

As the game unfolds, each player might need to revise her belief conditional on her opponent’s past actions. Hence, specifying players’ beliefs at every history is necessary. At each history, each player forms a first-order conditional belief (a probabilistic assessment about the strategies of her opponent), a second-order conditional belief (a probabilistic assessment about the strategies and first-order conditional beliefs of her opponent), and so on. Players’ possible beliefs are captured by a type structure.

Definition 1

A type structure is a tuple

$$\begin{aligned} {\mathcal {T}} \equiv ({\mathcal {P}}; T_1, T_2; {\mathcal {E}}_1, {\mathcal {E}}_2; \beta _1, \beta _2), \end{aligned}$$

where for each i,

  (a) \(T_i\) is a compact metric epistemic type set,

  (b) \({\mathcal {E}}_i \otimes T_{-i} \equiv \{ S_{-i}(h) \times T_{-i} : h \in H \}\) is the set of conditioning events,

  (c) \(\beta _i : T_i \rightarrow {\mathcal {C}}(S_{-i} \times T_{-i}, {\mathcal {E}}_i \otimes T_{-i})\) is a continuous belief map.

In this definition, \({\mathcal {C}}(S_{-i} \times T_{-i}, {\mathcal {E}}_i \otimes T_{-i})\) is a set of conditional probability systems, each of which specifies player i’s conditional beliefs about the strategies and epistemic types of her opponent at all histories (see Footnote 6). For each epistemic type \(t_i \in T_i\) and each history \(h \in H\), let \(\beta _{i, h} (t_i) (\cdot ) \equiv \beta _i(t_i)(\cdot |S_{-i}(h) \times T_{-i})\) be the conditional belief of type \(t_i\) at h. Marginalizing \(\beta _{i, h} (t_i)\) onto \(S_{-i}\) gives the first-order conditional belief of type \(t_i\). Since each type \(t_{-i}\) also has a first-order conditional belief, \(\beta _{i, h} (t_i)\) gives the second-order conditional belief of type \(t_i\). And so on.
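Footnote 6 (not reproduced here) gives the formal definition. The sketch below assumes the standard axioms for a conditional probability system on a finite space: each \(\mu(\cdot|B)\) is a probability measure, \(\mu(B|B) = 1\), and the chain rule \(\mu(A|C) = \mu(A|B)\,\mu(B|C)\) holds whenever \(A \subseteq B \subseteq C\) with B and C conditioning events. The function `is_cps` and the toy space are hypothetical. The first example shows the feature that matters for extensive-form reasoning: an event with probability zero ex ante can still carry a well-defined conditional belief.

```python
from fractions import Fraction


def is_cps(space, events, mu):
    """Check the CPS axioms on a finite space (exact arithmetic assumed).

    space:  frozenset of points.
    events: list of conditioning events (frozensets of points).
    mu:     dict mapping each event B to a dict {point: mu(point | B)}.
    """
    for B in events:
        m = mu[B]
        # Each mu(. | B) is a probability measure on `space`.
        if not set(m) <= space or any(p < 0 for p in m.values()) or sum(m.values()) != 1:
            return False
        # mu(B | B) = 1.
        if sum(p for x, p in m.items() if x in B) != 1:
            return False
    # Chain rule for A = {x} with x in B and B a subset of C; singletons suffice
    # because each mu(. | B) is additive.
    for B in events:
        for C in events:
            if B <= C:
                mass_B_given_C = sum(mu[C].get(x, 0) for x in B)
                if any(mu[C].get(x, 0) != mu[B].get(x, 0) * mass_B_given_C for x in B):
                    return False
    return True


X, B = frozenset("abc"), frozenset("bc")
# B has probability zero ex ante, yet beliefs conditional on B are specified.
zero_prob_update = {X: {"a": Fraction(1)}, B: {"b": Fraction(1, 2), "c": Fraction(1, 2)}}
# Violates the chain rule: mu(b | X) = 1/4, but mu(b | B) * mu(B | X) = 1/2.
inconsistent = {X: {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)},
                B: {"b": Fraction(1)}}
```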

An epistemic game is a pair \(({\mathcal {P}}, {\mathcal {T}})\), where \({\mathcal {T}}\) is a type structure. Each \(({\mathcal {P}}, {\mathcal {T}})\) induces a set of states, viz. \(S_1 \times T_1 \times S_2 \times T_2\). A state \((s_1, t_1, s_2, t_2)\) describes the strategies \((s_1, s_2)\) and beliefs \([\beta _1(t_1), \beta _2(t_2)]\) of players.

2.3 Rationality

Rationality requires that the player maximize her expected utility subject to her conditional belief at every history allowed by her strategy. Let player i’s expected utility from choosing \(s_i\) under conditional belief \(\beta _{i,h}(t_i)\) be

$$\begin{aligned} U_{i,h}(s_i, t_i) \equiv \displaystyle \sum _{s_{-i} \in S_{-i}} \pi _i(s_i, s_{-i}) \times \mathrm{marg}_{S_{-i}} \beta _{i,h}(t_i)(s_{-i}). \end{aligned}$$

Definition 2

A strategy-type pair \((s_i, t_i)\) is rational if for each \(h \in H(s_i)\) and each \(r_i \in S_i(h)\), we have \(U_{i,h}(s_i, t_i) \ge U_{i,h}(r_i, t_i)\).

If \((s_i, t_i)\) is rational, then we say \(s_i\) is a sequential best response for \(t_i\). Let \(R_i\) be the set of rational strategy-type pairs for player i and let \(R \equiv R_1 \times R_2\) be the set of states at which each player is rational.
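Definition 2 can be checked by brute force when the game is small. The sketch below assumes \(M = 2\) (so \(|H| = 5\) and each player has \(2^5 = 32\) strategies), hypothetical payoffs \((b_1, b_2, b_3, b_4) = (0, 1, 3, 5)\), and a first-order conditional belief given at each history as a point mass on one opponent strategy that allows that history. All names are illustrative. In the example, player 1 believes at every history that player 2 will defect from then on.

```python
from itertools import product

M = 2
JOINT = ["CC", "CD", "DC", "DD"]                                 # a_1 a_2
H = [()] + [(a,) for a in JOINT]                                  # non-terminal histories
U = {"CC": (3, 3), "CD": (0, 5), "DC": (5, 0), "DD": (1, 1)}     # hypothetical
S = [dict(zip(H, acts)) for acts in product("CD", repeat=len(H))]  # all 32 strategies


def outcome(s1, s2):
    h = ()
    for _ in range(M):
        h += (s1[h] + s2[h],)
    return h


def payoff(i, si, sj):
    """pi_i when player i (0 or 1) plays si and the opponent plays sj."""
    path = outcome(si, sj) if i == 0 else outcome(sj, si)
    return sum(U[a][i] for a in path)


def allows(s, i, h):
    return all(s[h[:k]] == h[k][i] for k in range(len(h)))


def expected_utility(i, si, belief):
    """U_{i,h}: belief is a list of (probability, opponent strategy) pairs."""
    return sum(p * payoff(i, si, sj) for p, sj in belief)


def is_rational(i, si, beliefs):
    """Definition 2: at each h allowed by si, no r_i allowing h does strictly better."""
    for h in H:
        if allows(si, i, h):
            u = expected_utility(i, si, beliefs[h])
            if any(expected_utility(i, r, beliefs[h]) > u for r in S if allows(r, i, h)):
                return False
    return True


def defector_allowing(h):
    """Hypothetical player-2 strategy: repeats player 2's recorded round-one move
    (if any), so that it allows h, and defects everywhere else."""
    s = {g: "D" for g in H}
    if h:
        s[()] = h[0][1]
    return s


beliefs = {h: [(1, defector_allowing(h))] for h in H}
rational = [s for s in S if is_rational(0, s, beliefs)]
```

Exactly four strategies pass: all defect at the root and at (DC) and (DD), while the actions at (CC) and (CD) are unconstrained because Definition 2 only quantifies over histories in \(H(s_i)\).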

2.4 Strong belief of rationality

This section presents conditions on beliefs. First-order strong belief of rationality requires that the player believe her opponent is rational until she observes otherwise. Second-order strong belief of rationality requires that the player (a) satisfy first-order strong belief of rationality, and (b) believe her opponent is rational and satisfies first-order strong belief of rationality until she observes otherwise. And so on. The following definition of strong belief follows Battigalli and Siniscalchi (2002).

Definition 3

A type \(t_i\) strongly believes an event \(E \subseteq S_{-i} \times T_{-i}\) if, for each \(h \in H\),

$$\begin{aligned} E \cap [S_{-i} (h) \times T_{-i}] \not = \emptyset \ \hbox {implies} \ \beta _{i, h} (t_i) (E) = 1. \end{aligned}$$

For each \(m \in {\mathbb {N}}\), the condition of mth-order strong belief of rationality is defined as follows. Let \(SB_i (E) \equiv S_i \times \{ t_i ~|~ t_i\) strongly believes \(E \}\). Set \(R_i^0 = S_i \times T_i\) and \(R_i^1 = R_i\). Define inductively \(R_i^{m+1} = R_i^m \cap SB_i(R_{-i}^m)\) for each \(m \in {\mathbb {N}}_+\). Set \(R_i^\infty = \bigcap _{m \in {\mathbb {N}}_+} R_i^m\). We say that \(R_i^m\) is the set of player i’s strategy-type pairs that satisfy rationality and \((m-1)\)th-order strong belief of rationality. Let \(R^m = R_1^m \times R_2^m\) be the set of states at which there is rationality and \((m-1)\)th-order strong belief of rationality. Let \(R^\infty = R_1^\infty \times R_2^\infty \) be the set of states at which there is rationality and common strong belief of rationality. We say that an outcome \(\varvec{a}\) is consistent with \(R^m\) if there exists a state \((s_1, t_1, s_2, t_2) \in R^m\) such that \(\xi (s_1, s_2) = \varvec{a}\).
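Definition 3 and the recursion \(R_i^{m+1} = R_i^m \cap SB_i(R_{-i}^m)\) translate into finite set operations. The sketch assumes the conditioning events \(S_{-i}(h) \times T_{-i}\) and each type's conditional beliefs have already been tabulated over a finite set of opponent strategy-type pairs, with exact probabilities; the labels in the toy data are hypothetical and not taken from the paper.

```python
def strongly_believes(cps, cond_events, E):
    """Definition 3 on finite sets.

    cps:         dict h -> {opponent pair: probability}, the type's CPS.
    cond_events: dict h -> set of opponent pairs (s, t) with s allowing h.
    E:           set of opponent pairs.
    """
    for h, event in cond_events.items():
        if E & event and sum(p for x, p in cps[h].items() if x in E) != 1:
            return False
    return True


def next_level(R_i_m, R_opp_m, cps_of_type, cond_events):
    """R_i^{m+1} = R_i^m intersected with SB_i(R_{-i}^m)."""
    return {(s, t) for (s, t) in R_i_m
            if strongly_believes(cps_of_type[t], cond_events, R_opp_m)}


# Toy data: the opponent's pairs allowing the non-root history "h" use strategy "b".
EVENTS = {"root": {("a", "u1"), ("b", "u1"), ("b", "u2")},
          "h": {("b", "u1"), ("b", "u2")}}
R_OPP = {("a", "u1"), ("b", "u1")}  # the opponent's pairs in R_{-i}^m
CPS = {"t1": {"root": {("a", "u1"): 1}, "h": {("b", "u1"): 1}},
       "t2": {"root": {("a", "u1"): 1}, "h": {("b", "u2"): 1}}}
```

Type `t2` fails strong belief: at "h" some opponent pair in `R_OPP` is still possible, yet `t2` puts probability one outside it.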

3 A necessary condition for cooperation

In this section, we show that cooperation occurs only if at least one player fails to satisfy rationality and \((M-1)\)th-order strong belief of rationality. In other words, if both players satisfy rationality and \((M-1)\)th-order strong belief of rationality, then they defect in every round on the path of play. Battigalli and Friedenberg (2012) show that, for any epistemic game \(({\mathcal {P}}, {\mathcal {T}})\), rationality and common strong belief of rationality imply bilateral defection in every round. We show that, for any epistemic game \(({\mathcal {P}}, {\mathcal {T}})\), rationality and \((m-1)\)th-order strong belief of rationality (with \(m \in \{ 1, \ldots , M \}\)) imply bilateral defection in each round \(n \ge M - m + 1\); equivalently, at any state in \(R^m\), players defect in the last m rounds on path.

Proposition 1

For each epistemic game \(({\mathcal {P}}, {\mathcal {T}})\) and each state \((s_1, t_1, s_2, t_2)\) thereof, if \((s_1, t_1, s_2, t_2) \in R^m\) for some \(m \in \{ 1, \ldots , M \}\), then \(\xi ^n(s_1, s_2) = DD\) for each \(n \ge M - m + 1\).

The proof of Proposition 1 (see Appendix A) resembles that of Battigalli and Friedenberg (2012). For illustration, we assume the stage game is played for three rounds. For each \(m \in \{1, 2, 3\}\), we will show that \(R^m\) implies bilateral defection in the last m rounds. It is easy to show that players who satisfy \(R^1\) defect at every third-round history; hence, \(R^1\) implies bilateral defection in round 3. Fix a state \((s_i, t_i, s_{-i}, t_{-i}) \in R^2\) and a second-round history \(h^2\) on the path induced by \((s_i, s_{-i})\). The history \(h^2\) is consistent with the rationality of player \(-i\), since \(s_{-i}\) allows \(h^2\) and \((s_{-i}, t_{-i})\) is rational; hence, at \(h^2\), player i is certain that her opponent is rational and will defect at every third-round history. Consequently, player i defects at \(h^2\). Finally, fix a state \((s_i, t_i, s_{-i}, t_{-i}) \in R^3\). Ex ante, player i is certain that her opponent satisfies \(R^2\) and that bilateral defection will ensue in the last two rounds. If \(s_i(h^1) = C\), then the strategy \(s_i\) is strictly worse than some strategy \(r_i\) that defects at every history. It follows that \(s_i(h^1) = D\).
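The final step of the three-round sketch can be written out as a payoff comparison. Let p denote the probability that \(t_i\) assigns ex ante to her opponent cooperating in round one (notation introduced here), and use, as in the sketch, that player i is certain her opponent defects in rounds two and three whether she follows \(s_i\) or \(r_i\). Since \(s_i(h^1) = C\), the strategy \(s_i\) earns at most \(b_2\) in each of the last two rounds, while \(r_i\) earns exactly \(b_2\) there:

$$\begin{aligned} U_{i,h^1}(r_i, t_i) &= p\, b_4 + (1-p)\, b_2 + 2 b_2, \\ U_{i,h^1}(s_i, t_i) &\le p\, b_3 + (1-p)\, b_1 + 2 b_2, \\ U_{i,h^1}(r_i, t_i) - U_{i,h^1}(s_i, t_i) &\ge p\,(b_4 - b_3) + (1-p)\,(b_2 - b_1) > 0. \end{aligned}$$

The gap is strictly positive for every \(p \in [0, 1]\) because \(b_4 > b_3\) and \(b_2 > b_1\).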

Proposition 1 implies the following:

Corollary 1

For each epistemic game \(({\mathcal {P}}, {\mathcal {T}})\) and each state \((s_1, t_1, s_2, t_2)\) thereof, if \((s_1, t_1, s_2, t_2) \in R^m\) for some \(m \ge M\), then \(\xi ^n(s_1, s_2) = DD\) for each \(n \in \{ 1, \ldots , M \}\).

Corollary 1 states that if both players satisfy rationality and \((M-1)\)th-order strong belief of rationality, then they defect in every round on the path of play. In the following sections, we study how cooperation arises when rationality and \((M-1)\)th-order strong belief of rationality do not hold.

4 An illustrative example

In Sect. 5, for an M-round Prisoner’s Dilemma, we construct an epistemic game \(({\mathcal {P}}, {\mathcal {T}}^*)\) in which: for each \(m \in \{ 1, \ldots , M - 1 \}\), any outcome that has bilateral defection in the last m rounds is consistent with \(R^m\). In this section, we assume there are three rounds. A three-round Prisoner’s Dilemma has a relatively small set of non-terminal histories, which allows us to construct a relatively simple epistemic game \(({\mathcal {P}}, {\mathcal {T}}^\prime )\) in which: for each \(m \in \{ 1, 2 \}\), any outcome that has bilateral defection in the last m rounds is consistent with \(R^m\). Although the epistemic game \(({\mathcal {P}}, {\mathcal {T}}^\prime )\) presented in this section is simpler than the three-round Prisoner’s Dilemma epistemic game \(({\mathcal {P}}, {\mathcal {T}}^*)\) presented in Sect. 5, it preserves key features of \(({\mathcal {P}}, {\mathcal {T}}^*)\). We highlight these key features in Remark 2 at the end of this section.

The type structure \({\mathcal {T}}^\prime \) is constructed as follows. The epistemic type set for each player i is

$$\begin{aligned} T_i = \{ t_i^1, t_i^2, t_i^3, t_i^4, t_i^5 \}. \end{aligned}$$

Ex ante, type \(t_i^1\) believes that her opponent has type \(t_{-i}^1\) and plays the grim trigger strategy \(s_{-i}^G\) (which cooperates if and only if no one has defected). At any history h that is allowed by \(s_{-i}^G\), type \(t_i^1\) continues to believe that she is facing the strategy-type pair \((s_{-i}^G, t_{-i}^1)\). At any history h that is not allowed by \(s_{-i}^G\), type \(t_i^1\) believes that she is facing the strategy-type pair \((s_{-i}^h, t_{-i}^1)\), where \(s_{-i}^h\) allows h and defects at every history that does not precede h. We can show that a sequential best response for \(t_i^1\) is a strategy \(s_i^1\) that cooperates in round one, cooperates at history (CC), and defects at any other history.

Ex ante, type \(t_i^2\) believes that her opponent has type \(t_{-i}^1\) and plays strategy \(s_{-i}^F\) that cooperates in round one, cooperates at every second-round history, and plays Tit-for-Tat in round three (formally, at third-round histories, \(s^F_{-i} (h) = D\) if and only if \(h \in \{ (a^1, a^2) \in A^2 ~|~ a_{i}^2 = D \}\)). The superscript F in \(s^F_{-i}\) is a mnemonic for ‘forgiving’: type \(t_i^2\) believes that her defection in round one will be ‘forgiven’ by her opponent. At any history \(h \not = h^1\), type \(t_i^2\) believes that she is facing strategy-type pair \((\tilde{s}_{-i}^h, t_{-i}^1)\), where \(\tilde{s}_{-i}^h\) allows h and has \(\tilde{s}_{-i}^h (h^\prime ) =s^F_{-i} (h^\prime )\) for each \(h^\prime \) that does not precede h. We can show that a sequential best response for \(t_i^2\) is a strategy \(s_i^2\) that defects in round one, cooperates at every second-round history, and defects at every third-round history.

At any history h that is allowed by grim trigger strategy \(s_{-i}^G\), type \(t_i^3\) believes that she is facing strategy-type pair \((s_{-i}^G, t_{-i}^1)\). At any history h that is not allowed by \(s_{-i}^G\), type \(t_i^3\) has the same belief as type \(t_i^2\) does: \(\beta _{i,h} (t_i^3) = \beta _{i,h} (t_i^2)\). We can show that a sequential best response for \(t_i^3\) is a strategy \(s_i^3\) that cooperates at each history \(h \in \{ h^1, CC, CD, DD \}\) and defects at any other history.

At any history h that is allowed by strategy \(s^3_{-i}\), type \(t_i^4\) believes that she is facing strategy-type pair \((s_{-i}^3, t_{-i}^3)\). At the beginning of round two, if player \(-i\) has defected in round one, type \(t_i^4\) believes that she is facing \((s_{-i}^4, t_{-i}^4)\), where \(s_{-i}^4\) defects at every history. At the beginning of round three, if player \(-i\) has defected in the first two rounds, type \(t_i^4\) believes she is facing \((s_{-i}^4, t_{-i}^4)\); if player \(-i\) has cooperated in the first two rounds, type \(t_i^4\) believes she is facing \((s_{-i}^3, t_{-i}^3)\); if player \(-i\) has defected in round one and cooperated in round two, type \(t_i^4\) believes she is facing \((s_{-i}^2, t_{-i}^2)\); if player \(-i\) has cooperated in round one and defected in round two, type \(t_i^4\) believes that she is facing \((s_{-i}^5, t_{-i}^5)\), where \(s_{-i}^5\) cooperates in round one and defects at every other history. We can show that a sequential best response for \(t_i^4\) is strategy \(s_i^4\) that defects at every history.

At any history h that is allowed by strategy \(s_{-i}^1\), type \(t_i^5\) believes that she is facing strategy-type pair \((s_{-i}^1, t_{-i}^1)\). At any history h that is not allowed by \(s_{-i}^1\), type \(t_i^5\) has the same belief as type \(t_i^4\) does: \(\beta _{i,h} (t_i^5) = \beta _{i,h} (t_i^4)\). We can show that a sequential best response for \(t_i^5\) is strategy \(s_i^5\) that cooperates in round one and defects at every other history.
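The best-response claims for \(t_i^1\), \(t_i^2\) and \(t_i^4\) can be spot-checked against the opponent strategy each type expects ex ante. With a point-mass belief on a deterministic opponent strategy, the optimal continuation value follows from backward induction over player i's own actions. This sketch assumes hypothetical payoffs \((b_1, b_2, b_3, b_4) = (0, 1, 3, 5)\), writes histories own-action first (Remark 1), and codes \(s^F\) as Tit-for-Tat against player i's round-two action, as described in words in the text. It checks only the root (plus history (CC) for \(t_i^1\)); it is not the full verification of sequential rationality, which the paper gives in Appendix B.

```python
# Hypothetical stage payoffs, keyed by own action followed by the opponent's.
U = {"CC": 3, "CD": 0, "DC": 5, "DD": 1}
M = 3


def flip(h):
    """Rewrite a history from the other player's point of view."""
    return tuple(a[::-1] for a in h)


# Strategies read histories written own-action first.
def grim(h):
    return "C" if all(a == "CC" for a in h) else "D"


def s_forgiving(h):  # s^F: C, C, then Tit-for-Tat on the other's round-two action
    return "D" if len(h) == 2 and h[1][1] == "D" else "C"


def s1(h):
    return "C" if h in [(), ("CC",)] else "D"


def s2(h):
    return "C" if len(h) == 1 else "D"


def s3(h):
    return "C" if h in [(), ("CC",), ("CD",), ("DD",)] else "D"


def s4(h):
    return "D"


def value(own, opp, h=()):
    """Player i's payoff from round len(h) + 1 to M when both strategies are followed."""
    total = 0
    while len(h) < M:
        a = own(h) + opp(flip(h))
        total += U[a]
        h = h + (a,)
    return total


def best_value(opp, h=()):
    """Best continuation payoff against a known deterministic opponent."""
    if len(h) == M:
        return 0
    b = opp(flip(h))
    return max(U[x + b] + best_value(opp, h + (x + b,)) for x in "CD")
```

Under these payoffs, \(s_i^1\) earns 11 against grim trigger, \(s_i^2\) earns 13 against \(s^F\), and \(s_i^4\) earns 11 against \(s^3\); each equals the backward-induction optimum.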

In Appendix B, we show that \(\beta _i (t_i^{k_i})\) is a conditional probability system and \((s_i^{k_i}, t_i^{k_i})\) is a rational strategy-type pair for each \(k_i \in \{1, 2, 3, 4, 5 \}\). For each outcome \(\varvec{a}\) that has bilateral defection in round three, there is a state \((s_i^{k_i}, t_i^{k_i}, s_{-i}^{k_{-i}}, t_{-i}^{k_{-i}}) \in R\) such that the strategy profile \((s_i^{k_i}, s_{-i}^{k_{-i}})\) induces \(\varvec{a}\). Specifically,

$$\begin{aligned} \begin{array}{ll} \xi (s^4_i, s^4_{-i}) = (DD, DD, DD) &{} \qquad \xi (s^5_i, s^5_{-i}) = (CC, DD, DD) \\ \xi (s^2_i, s^2_{-i}) = (DD, CC, DD) &{} \qquad \xi (s^1_i, s^1_{-i}) = (CC, CC, DD) \\ \xi (s^2_i, s^4_{-i}) = (DD, CD, DD) &{} \qquad \xi (s^1_i, s^5_{-i}) = (CC, CD, DD) \\ \xi (s^5_i, s^4_{-i}) = (CD, DD, DD) &{} \qquad \xi (s^3_i, s^4_{-i}) = (CD, CD, DD) \\ \xi (s^3_i, s^2_{-i}) = (CD, CC, DD) &{} \qquad \xi (s^5_i, s^2_{-i}) = (CD, DC, DD) \\ \end{array} \end{aligned}$$
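The table above can be reproduced mechanically. In the sketch below, each of \(s^1, \ldots, s^5\) reads a history written own-action first, so the same function serves either player, and `outcome` returns the path from player i's point of view; the function names are illustrative. Swapping the roles of the players exchanges CD and DC in each listed path, which supplies the six outcomes the table leaves implicit, so all sixteen outcomes with DD in round three are reached. The last line checks the claim, made in the text following the table, that \(\{s^4, s^5\}\) already covers every outcome with DD in the last two rounds.

```python
JOINT = ["CC", "CD", "DC", "DD"]


# Strategies s^1, ..., s^5 of this section; each reads a history own-action first.
def s1(h):
    return "C" if h in [(), ("CC",)] else "D"


def s2(h):
    return "C" if len(h) == 1 else "D"


def s3(h):
    return "C" if h in [(), ("CC",), ("CD",), ("DD",)] else "D"


def s4(h):
    return "D"


def s5(h):
    return "C" if h == () else "D"


def outcome(si, sj, M=3):
    """Path induced by (si, sj), written from player i's point of view."""
    hi, hj = (), ()
    for _ in range(M):
        ai, aj = si(hi), sj(hj)
        hi, hj = hi + (ai + aj,), hj + (aj + ai,)
    return hi


TABLE = {
    (s4, s4): ("DD", "DD", "DD"), (s5, s5): ("CC", "DD", "DD"),
    (s2, s2): ("DD", "CC", "DD"), (s1, s1): ("CC", "CC", "DD"),
    (s2, s4): ("DD", "CD", "DD"), (s1, s5): ("CC", "CD", "DD"),
    (s5, s4): ("CD", "DD", "DD"), (s3, s4): ("CD", "CD", "DD"),
    (s3, s2): ("CD", "CC", "DD"), (s5, s2): ("CD", "DC", "DD"),
}
STRATEGIES = [s1, s2, s3, s4, s5]
reached = {outcome(a, b) for a in STRATEGIES for b in STRATEGIES}
reached_r2 = {outcome(a, b) for a in (s4, s5) for b in (s4, s5)}
```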

By construction, at every history \(h \in H\), epistemic types \(t_i^4\) and \(t_i^5\) assign probability one to a rational strategy-type pair. Hence, strategy-type pairs \((s_i^4, t_i^4)\) and \((s_i^5, t_i^5)\) satisfy rationality and first order strong belief of rationality. As seen above, for each outcome \(\varvec{a}\) that has bilateral defection in the last two rounds, there is a state in \(\{ (s_i^4, t_i^4), (s_i^5, t_i^5) \} \times \{ (s_{-i}^4, t_{-i}^4), (s_{-i}^5, t_{-i}^5) \} \subseteq R^2\) at which \(\varvec{a}\) occurs. We note that \((s_i^{k_i}, t_i^{k_i}) \in R_i^1 {\setminus } R_i^2\) for \(k_i \in \{1, 2, 3 \}\) (as \(t_i^1\), \(t_i^2\), and \(t_i^3\) do not assign probability one to \(R^1_{-i}\) ex ante) and \((s_i^{k_i}, t_i^{k_i}) \in R_i^2 {\setminus } R_i^3\) for \(k_i \in \{4, 5 \}\) (as \(t_i^4\) and \(t_i^5\) do not assign probability one to \(R^2_{-i}\) ex ante).

Remark 2

The epistemic game \(({\mathcal {P}}, {\mathcal {T}}^\prime )\) has four key features. First, if \((s_i, t_i) \in R_i^m\), then \(t_i\) always believes that her opponent will defect at every nth-round history with \(n \ge M - m + 2\). Specifically, types \(t_i^4\) and \(t_i^5\) always believe that their opponent will defect at every third-round history. Second, if \((s_i, t_i) \in R_i^m {\setminus } R_i^{m+1}\), then ex ante type \(t_i\) believes that her opponent will cooperate in each round \(n \in \{ 1, \ldots , M - m + 1 \}\) if no defection has occurred. Third, there exists some strategy-type pair \((s_i, t_i) \in R_i^1\) that occasionally ‘forgives’ her opponent’s past defection. Specifically, on observing that player \(-i\) has defected in round one, type \(t_i^3\) believes that player \(-i\) will cooperate in round two and play Tit-for-Tat in round three; thus, forgiving player \(-i\)’s defection by cooperating in round two leads to the outcome (CD, CC, DC), whereas defecting in round two leads to the outcome (CD, DC, DD). The two outcomes share the profiles CD and DC and differ only in CC versus DD, so forgiving is optimal. Fourth, there exists some strategy-type pair \((s_i, t_i) \in R_i^1\) that occasionally defects due to the belief that her defection will be ‘forgiven’. For instance, strategy-type pair \((s^4_i, t^4_i)\) defects in round one because ex ante she believes she is facing \((s^3_{-i}, t^3_{-i})\), which forgives a first-round defection. The epistemic game \(({\mathcal {P}}, {\mathcal {T}}^*)\) in Sect. 5 also has these features. In the next section, we will show how the optimality of forgiving and the belief that one’s defection will be forgiven generate a rich set of behavior outcomes at \(R_1^{m_1} \times R_2^{m_2}\) for any pair \((m_1, m_2) \in \{ 1, \ldots , M \}^2\).
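As a quick arithmetic check of the third feature, the sketch below compares the two three-round outcomes from player i's point of view. The stage payoffs T, R, P, S and their values 5, 3, 1, 0 are hypothetical, since the payoff matrix is not restated in this section, and the names `PAYOFF` and `total` are illustrative only. Any Prisoner's Dilemma payoffs give the same ranking, because the gap is R − P > 0.

```python
# Hypothetical stage payoffs for player i (any T > R > P > S gives the same ranking).
T, R, P, S = 5, 3, 1, 0
PAYOFF = {"CC": R, "CD": S, "DC": T, "DD": P}  # key: own action, then opponent's


def total(path):
    """Sum of player i's stage payoffs along a path of (own, opponent) profiles."""
    return sum(PAYOFF[profile] for profile in path)


forgive = ("CD", "CC", "DC")    # cooperate in round two, defect in the last round
retaliate = ("CD", "DC", "DD")  # defect in round two; Tit-for-Tat answers with D

print(total(forgive), total(retaliate))  # 8 6
```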

5 The richness of behaviors

In this section, we construct an epistemic game \(({\mathcal {P}}, {\mathcal {T}}^*)\) such that, for each pair \((m_1, m_2) \in \{ 1, \ldots , M \}^2\), the set of outcomes consistent with \(R_1^{m_1} \times R_2^{m_2}\) is the set of paths on which each player i defects in the last \(m_i\) rounds. The following definition is useful for stating this result. For each pair \((m_1, m_2) \in {\mathbb {N}}_+^2\), define

$$\begin{aligned} A(m_1, m_2) \equiv \{ (a^1, \ldots , a^M) ~|~ a_i^n = D\ \text {for each} \ i\ \text {and each}\ n > M-m_i \}. \end{aligned}$$

When \((m_1, m_2) \in \{1, \ldots , M\}^2\), the set \(A(m_1, m_2)\) is the set of paths on which each player i defects in the last \(m_i\) rounds.
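The set \(A(m_1, m_2)\) is easy to enumerate for small horizons. The Python sketch below is an illustration only: it encodes each stage profile as a two-letter string with player 1's action first, and the function name `A` is hypothetical. For \(m_1, m_2 \le M\), the count is \(2^{M-m_1} \cdot 2^{M-m_2}\), since each player i is unrestricted only in the first \(M - m_i\) rounds.

```python
from itertools import product

PROFILES = ("CC", "CD", "DC", "DD")  # player 1's action first, then player 2's


def A(m1, m2, M):
    """Paths of length M on which each player i defects in every round n > M - m_i."""
    levels = (m1, m2)
    return [
        path
        for path in product(PROFILES, repeat=M)
        if all(path[n - 1][i] == "D"
               for i in (0, 1)
               for n in range(1, M + 1) if n > M - levels[i])
    ]


print(len(A(2, 2, 3)), len(A(1, 1, 3)))  # 4 16
```

For M = 3 and \(m_1 = m_2 = 2\), the four paths returned are (CC, DD, DD), (CD, DD, DD), (DC, DD, DD), and (DD, DD, DD), which are the outcomes with bilateral defection in the last two rounds discussed for the three-round example above.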

Theorem 1

There exists an epistemic game \(({\mathcal {P}}, {\mathcal {T}}^*)\) that satisfies the following properties:

(i) for each pair \((m_1, m_2) \in {\mathbb {N}}_+^2\) and each \(\varvec{a} \in A(m_1, m_2)\), there is a state \((s_1, t_1, s_2, t_2) \in R_1^{m_1} \times R_2^{m_2}\) such that \(\xi (s_1,s_2) = \varvec{a}\);

(ii) for each pair \((m_1, m_2) \in {\mathbb {N}}_+^2\) and each \((s_1, t_1, s_2, t_2) \in R_1^{m_1} \times R_2^{m_2}\), we have \(\xi (s_1,s_2) \in A(m_1, m_2)\).

Although Theorem 1 characterizes the set of outcomes consistent with \(R_1^{m_1} \times R_2^{m_2}\) for any pair \((m_1, m_2) \in {\mathbb {N}}_+^2\), we are particularly interested in how cooperative behaviors arise at \(R_1^{m_1} \times R_2^{m_2}\) for \((m_1, m_2) \in \{1, \ldots , M\}^2\). We devote most of this section to constructing the epistemic game \(({\mathcal {P}}, {\mathcal {T}}^*)\) and addressing this question. Before doing that, we present a corollary that follows from Proposition 1 and Theorem 1.

Corollary 2

(a) The epistemic game \(({\mathcal {P}}, {\mathcal {T}}^*)\) has a state \((s^*_1, t^*_1, s^*_2, t^*_2)\) such that \((s^*_1, t^*_1, s^*_2, t^*_2) \in R^{M-1}\) and \(\xi ^1(s^*_1, s^*_2) = CC\).

(b) Fix an epistemic game \(({\mathcal {P}}, {\mathcal {T}})\) and a state \((s_1, t_1, s_2, t_2)\) thereof. If \((s_1, t_1, s_2, t_2) \in R^{M-1}\) and \(\xi ^n(s_1, s_2) = CC\) for some n, then \(n = 1\) and \(\xi ^l(s_1, s_2) = DD\) for each \(l = 2, \ldots , M\).

Part (a) follows from Theorem 1. The path \(\varvec{a^*}\) that has bilateral cooperation in round one and bilateral defections in all other rounds belongs to the set \(A(M - 1, M - 1)\); hence, there is a state \((s^*_1, t^*_1, s^*_2, t^*_2)\) of \(({\mathcal {P}}, {\mathcal {T}}^*)\) such that \((s^*_1, t^*_1, s^*_2, t^*_2) \in R^{M-1}\) and the strategy profile \((s^*_1, s^*_2)\) induces \(\varvec{a^*}\). Part (b) follows from Proposition 1. In any epistemic game, at \(R^{M-1}\), players defect in the last \(M-1\) rounds on path. Hence, if bilateral cooperation occurs at \(R^{M-1}\), then it must occur in round one. Informally, Corollary 2 states that bilateral cooperation might occur at \(R^{M-1}\); in addition, if bilateral cooperation occurs at \(R^{M-1}\), then it must occur in round one and be followed by bilateral defections in all subsequent rounds.
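The purely combinatorial step behind Corollary 2 can be checked by brute force for small horizons. The sketch below (hypothetical helper names, profiles written with player 1's action first) verifies that \(\varvec{a^*}\) belongs to \(A(M-1, M-1)\) and that every path in \(A(M-1, M-1)\) has DD in rounds 2 through M, so CC can appear only in round one. It does not replace Proposition 1, which is what guarantees that outcomes at \(R^{M-1}\) lie in \(A(M-1, M-1)\) in every epistemic game.

```python
from itertools import product

PROFILES = ("CC", "CD", "DC", "DD")  # player 1's action first


def in_A(path, m1, m2):
    """True if each player i defects in every round n > M - m_i."""
    M = len(path)
    return all(path[n - 1][i] == "D"
               for i, m in enumerate((m1, m2))
               for n in range(1, M + 1) if n > M - m)


def corollary2_pattern_holds(M):
    """Check, for horizon M, the combinatorial content of Corollary 2 on A(M-1, M-1)."""
    a_star = ("CC",) + ("DD",) * (M - 1)
    if not in_A(a_star, M - 1, M - 1):       # part (a): a* lies in A(M-1, M-1)
        return False
    for path in product(PROFILES, repeat=M):
        if in_A(path, M - 1, M - 1):
            cc_rounds = [n for n in range(1, M + 1) if path[n - 1] == "CC"]
            # part (b): CC only in round one, followed by DD in every later round
            if cc_rounds not in ([], [1]) or any(p != "DD" for p in path[1:]):
                return False
    return True


print(all(corollary2_pattern_holds(M) for M in range(2, 6)))  # True
```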

We construct the epistemic game \(({\mathcal {P}}, {\mathcal {T}}^*)\) as follows. For each player i, let \(\tilde{A}_i\) be the set of pairs \((m_i, \varvec{a}) \in {\mathbb {N}} \times A^M\) such that \(a_i^n = D\) for each \(n > M - m_i\). We shall define a pair of mappings \((f_i, g_i)\) that map each point in \(\tilde{A}_i\) to a strategy and an epistemic type for player i. At each history, each epistemic type of player i assigns probability one to some point in \(\tilde{A}_{-i}\), which is mapped to a strategy-type pair by \((f_{-i}, g_{-i})\). Our construction ensures that for each pair \((m_1, m_2) \in {\mathbb {N}}_+^2\) and each path \(\varvec{a} \in A(m_1, m_2)\), the state \([f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})]_{i=1,2}\) belongs to \(R_1^{m_1} \times R_2^{m_2}\) and the strategy profile \([f_i(m_i, \varvec{a})]_{i=1,2}\) induces \(\varvec{a}\).

5.1 Strategies

Fix \((m_i, \varvec{a}) \in \tilde{A}_i\). We define strategy \(f_i (m_i, \varvec{a})\) as follows. At each history \(h^n = (a^0, \ldots , a^{n-1})\), that is, at each nth-round history consistent with \(\varvec{a}\), let \(f_i(m_i, \varvec{a})(h^n)= a_i^n\). At each history \(h^n\) that is not consistent with \(\varvec{a}\), let \(f_i(m_i, \varvec{a})(h^n)= C\) if and only if the following hold: (a) \(n \le M - m_i\); (b) in the round \(n^\prime \) in which the first deviation from \(\varvec{a}\) occurs, player \(-i\) cooperates whereas \(a_{-i}^{n^\prime }\) specifies ‘defect’, and both players cooperate from round \(n^\prime + 1\) onwards (formally, if \(h^n = (\bar{a}^0, \ldots , \bar{a}^{n-1})\), then \(n^\prime = \min \{ n^{\prime \prime } ~|~ \bar{a}^{n^{\prime \prime }} \not = a^{n^{\prime \prime }} \}\), \(\bar{a}_{-i}^{n^\prime } = C\), \(a_{-i}^{n^\prime } = D\), and \(\bar{a}^{n^{\prime \prime }} = CC\) for each \(n^{\prime \prime } \in \{ n^\prime + 1, \ldots , n - 1 \}\)).
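A minimal Python sketch of the strategy \(f_i(m_i, \varvec{a})\) may help in reading conditions (a) and (b). It assumes that each stage profile is a two-letter string written from player i's point of view (her own action first, then player \(-i\)'s), that a history is the tuple of profiles played so far, and that \((m_i, \varvec{a}) \in \tilde{A}_i\); the function name `f` is hypothetical.

```python
def f(m_i, a, h):
    """Action of the strategy f_i(m_i, a) at history h (hypothetical encoding).

    a: tuple of M profiles such as "CD" (player i's action, then player -i's),
       with (m_i, a) a point of tilde-A_i.
    h: tuple of the profiles played so far, so h is a round-(len(h) + 1) history.
    """
    M, n = len(a), len(h) + 1            # n is the current round (1-indexed)
    if h == a[: n - 1]:                  # h is consistent with a: follow a
        return a[n - 1][0]
    if n > M - m_i:                      # condition (a) fails
        return "D"
    # k is the 0-indexed first round at which h departs from a (round n' = k + 1)
    k = next(r for r in range(len(h)) if h[r] != a[r])
    opponent_cooperated_unexpectedly = h[k][1] == "C" and a[k][1] == "D"
    cooperation_since = all(p == "CC" for p in h[k + 1:])
    # condition (b): -i's unexpected cooperation, followed only by CC
    return "C" if opponent_cooperated_unexpectedly and cooperation_since else "D"


a = ("CD", "CC", "DD")
print(f(1, a, ("CC",)), f(1, a, ("DD",)))  # C D
```

With M = 3, \(m_i = 1\), and \(\varvec{a} = (CD, CC, DD)\), the strategy cooperates after the unexpected round-one profile CC (player \(-i\) cooperated where \(\varvec{a}\) prescribes D) but defects after DD.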

Remark 3

We note that the strategy \(f_i (m_i, \varvec{a})\) specifies ‘defect’ at every nth-round history with \(n > M - m_i\). This ‘defection’ phase starts earlier the larger \(m_i\) is. In what follows, if player i is characterized by \((m_i, \varvec{a})\), we say that player i has level \(m_i\).

5.2 Beliefs

In this section, for each \((m_i, \varvec{a}) \in \tilde{A}_i\), we describe the beliefs of its corresponding epistemic type \(g_i (m_i, \varvec{a})\). At each history \(h \in H\), the epistemic type \(g_i (m_i, \varvec{a})\) assigns probability one to a point \(\eta _{-i} (m_i, \varvec{a}, h) \in \tilde{A}_{-i}\), which is mapped to a strategy-type pair by \((f_{-i}, g_{-i})\). For convenience, we decompose \(\eta _{-i} (\cdot )\) into \(\eta ^1_{-i} (\cdot ) \in {\mathbb {N}}\) and \(\eta ^2_{-i} (\cdot ) \in A^M\).

Suppose \(h = (a^0, \ldots , a^n)\) for some \(n \le M - m_i\). A level-0 player believes her opponent has level 0, whereas a player with level \(m_i \ge 1\) believes her opponent has level \(m_i - 1\). Formally, \(\eta ^1_{-i}(m_i,\varvec{a}, h) = \mathrm{max} \{0, m_i - 1\}\). To define \(\eta ^2_{-i}(m_i,\varvec{a}, h)\), we first need an auxiliary round \(\tilde{m}_i\): if \(a^{n^\prime }_{-i} = D\) for some \(n^\prime \in \{ n + 1, \ldots , M - m_i + 1 \}\), let \(\tilde{m}_i \equiv \mathrm{min} \{ n^\prime \in \{ n + 1, \ldots , M - m_i + 1 \} ~|~ a^{n^\prime }_{-i} = D \}\); otherwise, let \(\tilde{m}_i \equiv M - m_i + 1\). Let \(\eta ^2_{-i}(m_i,\varvec{a}, h) = (h, \bar{a}^{n+1}, \ldots , \bar{a}^M)\), where \(\bar{a}^l = a^l\) if \(l < \tilde{m}_i\), \(\bar{a}^{\tilde{m}_i} = a_i^{\tilde{m}_i} C\), \(\bar{a}^l = CC\) if \(\tilde{m}_i< l < M - m_i + 1\), \(\bar{a}^{M - m_i + 1} = DC\), and \(\bar{a}^l = DD\) if \(l > M - m_i + 1\). By the definition in Sect. 5.1, the strategy \(s_{-i} \equiv f_{-i} [\eta _{-i}(m_i,\varvec{a}, h)]\) has the following properties. First, \(s_{-i}\) defects at every \(n^\prime \)th-round history with \(n^\prime > M - m_i + 1\). Second, \(s_{-i}\) cooperates at h (round \(n + 1\)). Third, at any round \(n^\prime \in \{ n + 2, \ldots , M - m_i + 1 \}\), if both players have cooperated continually since round \(n+1\), then \(s_{-i}\) cooperates in round \(n^\prime \). Fourth, depending on \(\varvec{a}\), there might be some round \(n^\prime \in \{ n + 2, \ldots , M - m_i + 1 \}\) in which \(s_{-i}\) cooperates after player i has just defected; in this case, we say that player \(-i\) ‘forgives’ her opponent’s past defection. For instance, by construction, if \(a_i^l = D\) for some \(l \in [n, \tilde{m}_i)\), then \(s_{-i} (a^1, \ldots , a^l) = C\).
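The on-path belief can be computed mechanically. The sketch below assumes two-letter profiles written with player i's action first, takes h to be the first n profiles of \(\varvec{a}\), and returns the pair \((\eta ^1_{-i}, \eta ^2_{-i})\); the function name `eta_on_path` is hypothetical. The returned path is also written in player i's order, so it would have to be flipped profile by profile before being read as a point of \(\tilde{A}_{-i}\); this orientation is an assumption about the notation, based on the definition writing \(\bar{a}^{\tilde{m}_i} = a_i^{\tilde{m}_i} C\) with player i's action first.

```python
def eta_on_path(m_i, a, n):
    """Point belief eta_{-i}(m_i, a, h) at the on-path history h = a[:n], n <= M - m_i.

    Profiles are 2-char strings written (player i, player -i). Returns
    (level, path); the path is also written in (i, -i) order.
    """
    M = len(a)
    if n > M - m_i:
        raise ValueError("this case requires n <= M - m_i")
    cutoff = M - m_i + 1                     # last round in which -i may cooperate
    level = max(0, m_i - 1)
    # tilde_m: first round in {n+1, ..., cutoff} where a has -i defect, else cutoff
    last = min(cutoff, M)
    tilde_m = next((r for r in range(n + 1, last + 1) if a[r - 1][1] == "D"), cutoff)
    path = list(a[:n])                       # the history h itself
    for l in range(n + 1, M + 1):
        if l < tilde_m:
            path.append(a[l - 1])
        elif l == tilde_m:                   # -i cooperates in round tilde_m
            path.append(a[l - 1][0] + "C")   # (a_i = D here if tilde_m == cutoff)
        elif l < cutoff:
            path.append("CC")
        elif l == cutoff:
            path.append("DC")
        else:
            path.append("DD")
    return level, tuple(path)


a1 = ("DC", "CD", "CC", "DD", "DD")          # Example 1, player 1's view
print(eta_on_path(2, a1, 0))  # (1, ('DC', 'CC', 'CC', 'DC', 'DD'))
```

Applied to the five-round path of Example 1 with \(m_i = 2\), the call at n = 0 for player 1 matches the ex ante belief described there: player 2 cooperates in round 1, still cooperates in round 2 after player 1's round-1 defection, and keeps cooperating until round 4.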

Suppose \(h = (a^0, \ldots , a^n)\) for some \(n > M - m_i\). If \(h = (a^0)\), then let \(\eta _{-i}(m_i,\varvec{a}, h) = (m_i - 1, DD, \ldots , DD)\). If \(a_{-i}^n = C\), then let \(\eta _{-i}(m_i,\varvec{a}, h) = (M - n, a^1, \ldots , a^n, DD, \ldots , DD)\). If \(a_{-i}^n = D\), then we let \(\tilde{n} \equiv \mathrm{min} \{ n^\prime \in \{ 1, \ldots , n \} ~|~ (a^{n^\prime }_{-i}, \ldots , a^n_{-i}) = (D, \ldots , D) \}\) be the round in which player \(-i\) starts defecting continually and construct \(\eta _{-i}(m_i,\varvec{a}, h)\) as follows. If \(\tilde{n} = 1\), the history h is consistent with the behavior of a player \(-i\) who has level \(\max \{M, m_i - 1\}\) (see Remark 3); hence, let \(\eta _{-i}(m_i,\varvec{a}, h) = (\max \{M, m_i - 1\}, DD, \ldots , DD)\). If \(\tilde{n} > 1\), the history h is consistent with the behavior of a player \(-i\) who has level \(M - \tilde{n} + 1\) (see Remark 3); hence, let \(\eta _{-i}(m_i,\varvec{a}, h) = (M - \tilde{n} + 1, a^0, \ldots , a^{\tilde{n} - 1}, DD, \ldots , DD)\). In each of these four cases, the strategy \(f_{-i}[\eta _{-i}(m_i,\varvec{a}, h)]\) defects at every \(n^\prime \)th-round history with \(n^\prime > n\).
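The four cases for \(n > M - m_i\) can be sketched as follows, assuming two-letter profiles written with player i's action first and h given as the tuple of its n profiles; the function name `eta_late` is hypothetical. The path \(\varvec{a}\) does not enter the formulas in this case, so the sketch takes only \(m_i\), h, and M.

```python
def eta_late(m_i, h, M):
    """Point belief eta_{-i}(m_i, a, h) for an on-path history h with n = len(h) > M - m_i.

    Profiles are 2-char strings written (player i, player -i); the returned
    path uses the same order.
    """
    n = len(h)
    if n <= M - m_i:
        raise ValueError("this case requires n > M - m_i")
    if n == 0:                                   # h is the initial history
        return m_i - 1, ("DD",) * M
    if h[-1][1] == "C":                          # -i cooperated in round n
        return M - n, tuple(h) + ("DD",) * (M - n)
    tilde_n = n                                  # start of -i's final run of defections
    while tilde_n > 1 and h[tilde_n - 2][1] == "D":
        tilde_n -= 1
    if tilde_n == 1:                             # -i has defected in every round so far
        return max(M, m_i - 1), ("DD",) * M
    return M - tilde_n + 1, tuple(h[: tilde_n - 1]) + ("DD",) * (M - tilde_n + 1)


print(eta_late(2, ("CC", "CD", "DD", "DD"), 5))  # (4, ('CC', 'DD', 'DD', 'DD', 'DD'))
```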

Suppose \(h = (\bar{a}^1, \ldots , \bar{a}^n)\) is inconsistent with \(\varvec{a}\). We define the longest common predecessor of h and \(\varvec{a}\), denoted \(h^*\), as follows. Let \(n^\prime = \min \{ n^{\prime \prime } ~|~ \bar{a}^{n^{\prime \prime }} \not = a^{n^{\prime \prime }} \}\) be the first round in which h and \(\varvec{a}\) differ. If \(\bar{a}_{-i}^{n^{\prime }} \not = a_{-i}^{n^{\prime }}\) (player \(-i\) deviates in round \(n^\prime \)), let \(h^* \equiv (a^1, \ldots , a^{n^{\prime } - 1})\); otherwise (only player i deviates in round \(n^\prime \)), let \(h^* \equiv (a^1, \ldots , a^{n^{\prime }})\). At history \(h^*\), epistemic type \(g_i (m_i, \varvec{a})\) believes her opponent is \(\eta _{-i}(m_i, \varvec{a}, h^*)\), as constructed above. If history h is allowed by strategy \(f_{-i}[\eta _{-i}(m_i, \varvec{a}, h^*)]\), then, as required by Bayes’ rule, \(g_i (m_i, \varvec{a})\) keeps this belief conditional on h; formally, \(\eta _{-i}(m_i,\varvec{a}, h) = \eta _{-i}(m_i,\varvec{a}, h^*)\). If history h is not allowed by strategy \(f_{-i}[\eta _{-i}(m_i, \varvec{a}, h^*)]\), then we construct \(\eta _{-i}(m_i,\varvec{a}, h)\) as follows: if \(\bar{a}_{-i}^n = C\), let \(\eta _{-i}(m_i,\varvec{a}, h) = (M - n, h, DD, \ldots , DD)\); otherwise, let \(\tilde{n}\) be the round in which player \(-i\) starts defecting continually and let \(\eta _{-i}(m_i,\varvec{a}, h) = (M - \tilde{n} + 1, \bar{a}^0, \ldots , \bar{a}^{\tilde{n} - 1}, DD, \ldots , DD)\).
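The longest common predecessor \(h^*\) is a simple prefix computation. In this hypothetical helper, `common_predecessor`, profiles are two-letter strings with player i's action first; rounds are 1-indexed in the text but 0-indexed in the tuple.

```python
def common_predecessor(a, h):
    """Longest common predecessor h* of path a and a history h inconsistent with a.

    Profiles are 2-char strings written (player i, player -i).
    """
    k = next(r for r in range(len(h)) if h[r] != a[r])  # 0-indexed; round n' = k + 1
    if h[k][1] != a[k][1]:       # player -i deviates in round n'
        return tuple(a[:k])      # (a^1, ..., a^{n'-1})
    return tuple(a[: k + 1])     # only player i deviates: (a^1, ..., a^{n'})


a = ("CC", "CC", "DD")
print(common_predecessor(a, ("DC",)))  # ('CC',)
```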

We have completed our definition of \(\eta _{-i}: \tilde{A}_i \times H \rightarrow \tilde{A}_{-i}\). The type structure \({\mathcal {T}}^* \equiv ({\mathcal {P}}; (T_i, {\mathcal {E}}_i, \beta _i)_{i=1,2})\) is constructed as follows. The epistemic type set for each player i is \(T_i = \{g_i(m_i, \varvec{a}) ~|~ (m_i, \varvec{a}) \in \tilde{A}_i \}\). The belief of epistemic type \(g_i(m_i, \varvec{a})\) at history h is

$$\begin{aligned} \beta _{i,h}[g_i(m_i, \varvec{a})] \Big [ f_{-i}[\eta _{-i} (m_i, \varvec{a},h)], g_{-i}[\eta _{-i}(m_i, \varvec{a},h)] \Big ] = 1. \end{aligned}$$

In Appendix C, we show \(\beta _i [g_i(m_i, \varvec{a})]\) is a conditional probability system.

5.3 Rationality

In Appendix C, we show that a strategy-type pair \([f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})]\) is rational for each \(m_i \in {\mathbb {N}}_+\). We sketch the proof below.

At every nth-round history with \(n \le M - m_i + 1\), type \(g_i(m_i, \varvec{a})\) believes that her opponent will defect at every \(n^\prime \)th-round history with \(n^\prime \ge M - m_i + 2\). At every nth-round history with \(n > M - m_i + 1\), type \(g_i(m_i, \varvec{a})\) believes that her opponent will defect at every \(n^\prime \)th-round history with \(n^\prime \ge n\). Hence, it is optimal for type \(g_i(m_i, \varvec{a})\) to defect at every nth-round history with \(n \ge M - m_i + 1\) as specified by strategy \(f_i(m_i, \varvec{a})\).

Fix a history \(h = (a^0, \ldots , a^{n-1})\) such that \(n < M - m_i + 1\) and \(a_i^n = C\). As h is consistent with \(\varvec{a}\), we have \(f_i(m_i, \varvec{a})(h) = a_i^n = C\) (by construction). At h, type \(g_i(m_i, \varvec{a})\) believes that playing \(f_i(m_i, \varvec{a})\) will lead to a continuation path that consists of only CC and DC until round \(M - m_i + 1\), whereas playing D at h will induce player \(-i\) to defect at every \(n^\prime \)th-round history with \(n^\prime > n\). It is easy to show that rationality requires type \(g_i(m_i, \varvec{a})\) to play C at h. If player i plays C after her opponent has just defected (\(a_{-i}^{n-1} = D\)), we say that player i ‘forgives’ her opponent’s past defection. As shown above, forgiving is optimal under conditional belief \(\beta _{i,h}[g_i(m_i, \varvec{a})]\).
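One way to fill in the ‘easy to show’ step, assuming the usual labelling \(T > R > P > S\) of player i's stage payoffs for DC, CC, DD, and CD (these symbols are not introduced in this section), is the following. Let \(K \equiv M - m_i + 2 - n \ge 2\) be the number of rounds from n through \(M - m_i + 1\), and let \(U_i\) denote player i's continuation payoff under her belief at h (notation introduced only for this step). Following \(f_i(m_i, \varvec{a})\) yields at least R in each of the first \(K - 1\) of these rounds (each profile is CC or DC), T in round \(M - m_i + 1\) (where the believed profile is DC), and P in each of the remaining \(m_i - 1\) rounds; playing D at h yields at most T in round n and at most P in each of the \(M - n = (K - 1) + (m_i - 1)\) later rounds. Hence

$$\begin{aligned} U_i\big (f_i(m_i, \varvec{a})\big ) - U_i\big (D \text { at } h\big ) \ge \big [(K-1)R + T + (m_i - 1)P\big ] - \big [T + (K-1)P + (m_i - 1)P\big ] = (K-1)(R-P) > 0. \end{aligned}$$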

Fix a history \(h = (a^0, \ldots , a^{n-1})\) such that \(n < M - m_i + 1\) and \(a_i^n = D\). As h is consistent with \(\varvec{a}\), we have \(f_i(m_i, \varvec{a})(h) = a_i^n = D\) (by construction). At h, type \(g_i(m_i, \varvec{a})\) believes that player \(-i\) will cooperate in round \(n+1\) no matter what player i does in round n (i.e., player \(-i\) will forgive player i’s past defection); in addition, player \(-i\) will cooperate until round \(M - m_i + 1\) if no one has defected since round \(n+1\). It is easy to show that rationality requires type \(g_i(m_i, \varvec{a})\) to play D at h.
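The role of forgiveness in this step can be illustrated numerically. The sketch computes sequentially optimal actions by backward induction against a point belief in a deterministic opponent strategy. The payoffs (5, 3, 1, 0), the five-round horizon, the cutoff round 4 (as for \(m_i = 2\)), and the helper names are all hypothetical; profiles are written with player i's action first. With n = 1, the believed opponent cooperates in round 2 whatever player i does in round 1, so defecting in round 1 gains T − R without changing the continuation.

```python
from functools import lru_cache

# Hypothetical stage payoffs for player i, own action first (T > R > P > S).
T, R, P, S = 5, 3, 1, 0
PAYOFF = {"CC": R, "CD": S, "DC": T, "DD": P}


def optimal_actions(opponent, h, M):
    """Actions maximizing player i's continuation payoff at history h, given a
    point belief that -i plays the deterministic strategy `opponent`."""

    @lru_cache(maxsize=None)
    def value(hist):
        # Best continuation payoff from history `hist` (backward induction).
        if len(hist) == M:
            return 0
        b = opponent(hist)
        return max(PAYOFF[x + b] + value(hist + (x + b,)) for x in "CD")

    b = opponent(h)
    payoffs = {x: PAYOFF[x + b] + value(h + (x + b,)) for x in "CD"}
    best = max(payoffs.values())
    return {x for x, v in payoffs.items() if v == best}


def forgiving_opponent(h, cutoff=4):
    """Hypothetical belief: -i cooperates in rounds 1 and 2 whatever player i does
    in round 1, keeps cooperating up to round `cutoff` while every profile since
    round 2 is CC, and defects after `cutoff`."""
    r = len(h) + 1  # current round
    if r > cutoff:
        return "D"
    if r <= 2:
        return "C"
    return "C" if all(p == "CC" for p in h[1:]) else "D"


print(optimal_actions(forgiving_opponent, (), 5))       # {'D'}
print(optimal_actions(forgiving_opponent, ("DC",), 5))  # {'C'}
```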

Fix a history \(h = (\bar{a}^1, \ldots , \bar{a}^{n-1})\) that is inconsistent with \(\varvec{a}\) and satisfies both conditions (a)–(b) specified in Sect. 5.1. At h, type \(g_i(m_i, \varvec{a})\) believes that playing \(f_i(m_i, \varvec{a})\) will lead to a continuation path that consists of only CC until round \(M - m_i\) and DC in round \(M - m_i + 1\), whereas playing D at h will induce player \(-i\) to defect at every \(n^\prime \)th-round history with \(n^\prime > n\). It is easy to show that rationality requires type \(g_i(m_i, \varvec{a})\) to play C at h.

Fix a history \(h = (\bar{a}^1, \ldots , \bar{a}^{n-1})\) that is inconsistent with \(\varvec{a}\) and violates condition (a) or condition (b) specified in Sect. 5.1. At h, type \(g_i(m_i, \varvec{a})\) believes that her opponent will defect at every history that follows h. Hence, rationality requires type \(g_i(m_i, \varvec{a})\) to play D at h.

Remark 4

In the epistemic game \(({\mathcal {P}}, {\mathcal {T}}^*)\), ‘forgiving the opponent’s past defection’ is optimal under some conditional beliefs; in addition, each player i has some epistemic types that, at some histories, assign probability one to the event that ‘player \(-i\) will forgive player i’s past defection’. The optimality of forgiving and the belief that one’s defection will be forgiven play important roles in generating the richness of the set of outcomes at \(R_1^{m_1} \times R_2^{m_2}\): any path on which each player i defects in the last \(m_i\) rounds is possible. When forgiving is optimal, a rational player cooperates after her opponent has just defected. When a player believes that her opponent will forgive her past defection, she might cooperate after she herself has just defected. If a player believes there is a phase during which her opponent plays a grim trigger strategy (Footnote 7) and does not forgive any past defection, then it is optimal to cooperate throughout this phase except for the last round.
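The last sentence of Remark 4 can be illustrated with a backward-induction sketch. Here player i holds a point belief that player \(-i\) plays grim trigger up to a cutoff round and defects afterwards; the payoffs (5, 3, 1, 0), the horizon, the cutoff, and the helper names are hypothetical, and profiles are written with player i's action first. The optimal path cooperates in every round of the grim-trigger phase except the last one.

```python
from functools import lru_cache

# Hypothetical stage payoffs for player i, own action first (T > R > P > S).
T, R, P, S = 5, 3, 1, 0
PAYOFF = {"CC": R, "CD": S, "DC": T, "DD": P}


def optimal_actions(opponent, h, M):
    """Actions maximizing player i's continuation payoff at history h against
    the deterministic opponent strategy `opponent` (backward induction)."""

    @lru_cache(maxsize=None)
    def value(hist):
        if len(hist) == M:
            return 0
        b = opponent(hist)
        return max(PAYOFF[x + b] + value(hist + (x + b,)) for x in "CD")

    b = opponent(h)
    payoffs = {x: PAYOFF[x + b] + value(h + (x + b,)) for x in "CD"}
    best = max(payoffs.values())
    return {x for x, v in payoffs.items() if v == best}


def grim_phase_opponent(h, cutoff=4):
    """Hypothetical belief: -i plays grim trigger up to round `cutoff` (cooperates
    while every past profile is CC, never forgives) and defects afterwards."""
    r = len(h) + 1
    if r > cutoff:
        return "D"
    return "C" if all(p == "CC" for p in h) else "D"


def optimal_path(opponent, M):
    """Path generated when player i plays an optimal action in every round."""
    h = ()
    while len(h) < M:
        x = min(optimal_actions(opponent, h, M))  # "C" < "D": ties broken toward C
        h += (x + opponent(h),)
    return h


print(optimal_path(grim_phase_opponent, 5))  # ('CC', 'CC', 'CC', 'DC', 'DD')
```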

5.4 Strong belief of rationality

In Appendix C, we show \([f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})] \in R_i^{m_i}\) for \(m_i \ge 2\). For notational convenience, let \(\phi _i(m_i, \varvec{a}) \equiv [f_i(m_i, \varvec{a}), g_i(m_i, \varvec{a})]\). In the following, we assume there are three rounds and show \(\phi _i(m_i, \varvec{a}) \in R_i^{m_i}\) for \(m_i \in \{ 2, 3 \}\). By construction, a strategy-type pair \(\phi _i(m_i, \varvec{a})\) with \(m_i \in \{ 2, 3 \}\) always assigns probability one to some \(\phi _{-i}(m_{-i}, \varvec{a^\prime })\) with \(m_{-i} \in \{ 1, 2 \}\). As discussed in Sect. 5.3, a strategy-type pair \(\phi _{-i}(m_{-i}, \varvec{a^\prime })\) with \(m_{-i} \in \{ 1, 2 \}\) is rational. Hence, \(\phi _i(m_i, \varvec{a}) \in R_i^2\) for \(m_i \in \{ 2, 3 \}\). It remains to show that type \(g_i(3, \varvec{a})\) strongly believes \(R_{-i}^2\). By construction, type \(g_i(3, \varvec{a})\) assigns probability one to some \(\phi _{-i}(1, \varvec{a^\prime })\) if player \(-i\) has just cooperated in round two, and assigns probability one to some \(\phi _{-i}(2, \varvec{a^\prime }) \in R_{-i}^2\) at every other history. The type of any strategy-type pair in \(R_{-i}^2\) believes that her opponent defects at every third-round history, so defecting in round two is strictly optimal for player \(-i\); thus, cooperating in round two is inconsistent with \(R_{-i}^2\). It follows that \(g_i(3, \varvec{a})\) strongly believes \(R_{-i}^2\): it assigns probability one to \(R_{-i}^2\) whenever possible.

We conclude this section by giving an example that illustrates how the optimality of forgiving and the belief that one’s defection will be forgiven generate the richness of the set of behavior outcomes.

Example 1

Assume there are five rounds. The epistemic game \(({\mathcal {P}}, {\mathcal {T}}^*)\) has a state \((s_1, t_1, s_2, t_2) \in R^2\) such that the strategy profile \((s_1, s_2)\) induces the following path:

$$\begin{aligned} (DC, CD, CC, DD, DD). \end{aligned}$$

Ex ante, player 1 believes that her opponent will cooperate in round 1 and keep cooperating until round 4 if no one has defected since round 1. In addition, player 1 believes that her unilateral defection in round 1 will be forgiven: her opponent will still cooperate in round 2 and keep cooperating until round 4 if no one has defected since round 2. As a best response, player 1 defects in round 1 and cooperates in round 2. However, this prior belief turns out to be incorrect: player 2 in fact defects in round 2 due to the belief that this defection will be forgiven. Player 1 does forgive and cooperate in round 3, believing that player 2 will respond by cooperating in round 4. However, in round 4, both players defect since both strongly believe that their opponents will defect at every fifth-round history.

6 Sufficiently rich type structure

In Sect. 5, we constructed an epistemic game \(({\mathcal {P}}, {\mathcal {T}}^*)\) in which, for each pair \((m_1, m_2) \in {\mathbb {N}}_+^2\), the set of outcomes consistent with \(R_1^{m_1} \times R_2^{m_2}\) is the set of paths on which each player i defects in the last \(m_i\) rounds. In this section, we use the type structure \({\mathcal {T}}^*\) to show that for any type structure that satisfies a richness condition introduced by Perea (2012), the set of outcomes consistent with \(R_1^{m_1} \times R_2^{m_2}\) is also the set of paths on which each player i defects in the last \(m_i\) rounds. The type structure \({\mathcal {T}}^*\) is sufficiently rich. An extension of \({\mathcal {T}}^*\), which is obtained by adding new epistemic types to \({\mathcal {T}}^*\), is also sufficiently rich (Footnote 8). We note that a complete type structure, which contains all beliefs, is an extension of \({\mathcal {T}}^*\). As discussed in Battigalli and Friedenberg (2012), a type structure specifies sets of possible beliefs, which might have been formed by social conventions or a history. An analyst who does not know which beliefs are possible might be interested in studying behavioral implications of an epistemic condition across different type structures. Although we focus on sufficiently rich type structures, we comment on other type structures in Sect. 7.

We formalize the richness condition introduced by Perea (2012) below. In Sect. 2.4, we fix a type structure and let \(R_i^m\) denote the set of player i’s strategy-type pairs that satisfy rationality and \((m-1)\)th-order strong belief of rationality. In this section, we examine different type structures and let \(R_i^m ({\mathcal {T}})\) denote the set of player i’s strategy-type pairs that satisfy rationality and \((m-1)\)th-order strong belief of rationality for type structure \({\mathcal {T}}\). Fix a type structure \({\mathcal {T}}\), a player i, and an order \(m \in {\mathbb {N}}\). We say that a history \(h \in H\) is consistent with \(R_i^m ({\mathcal {T}})\) if there is some strategy-type pair \((s_i, t_i) \in R_i^m ({\mathcal {T}})\) such that \(s_i\) allows h. Let \({\mathcal {H}}[R_i^m ({\mathcal {T}})]\) be the set of histories that are consistent with \(R_i^m ({\mathcal {T}})\).

Definition 4

A type structure \({\mathcal {T}}\) is sufficiently rich if for each player i, each order \(m \in {\mathbb {N}}_+\), and each type structure \({\mathcal {T}}^\prime \) such that \({\mathcal {H}}[R_j^{m^\prime } ({\mathcal {T}}^\prime )] = {\mathcal {H}}[R_j^{m^\prime } ({\mathcal {T}})]\) for each \(m^\prime < m\) and each \(j \in \{1, 2\}\), we have \({\mathcal {H}}[R_i^m ({\mathcal {T}}^\prime )] \subseteq {\mathcal {H}}[R_i^m ({\mathcal {T}})]\).

Informally, a sufficiently rich type structure can be described as follows. Fix a type structure \({\mathcal {T}}\), a player i, and an order \(m \in {\mathbb {N}}_+\). Fix a type structure \({\mathcal {T}}^\prime \) such that, for each \(m^\prime < m\) and each \(j \in \{1, 2\}\), the set of histories consistent with \(R_j^{m^\prime } ({\mathcal {T}}^\prime )\) and the set of histories consistent with \(R_j^{m^\prime } ({\mathcal {T}})\) are identical. If \({\mathcal {T}}\) is sufficiently rich, then any history h consistent with \(R_i^m ({\mathcal {T}}^\prime )\) is also consistent with \(R_i^m ({\mathcal {T}})\): there is some strategy-type pair \((s_i, t_i) \in R_i^m ({\mathcal {T}})\) such that \(s_i\) allows h. Equivalently (by contraposition), if \({\mathcal {T}}\) is sufficiently rich, then any history h inconsistent with \(R_i^m ({\mathcal {T}})\) is also inconsistent with \(R_i^m ({\mathcal {T}}^\prime )\).

The following proposition implies that if a type structure \({\mathcal {T}}\) is incomplete but sufficiently rich, then for each type structure \({\mathcal {T}}^\prime \) that is an extension of \({\mathcal {T}}\), the set of outcomes consistent with \(R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})\) and the set of outcomes consistent with \(R_1^{m_1} ({\mathcal {T}}^\prime ) \times R_2^{m_2} ({\mathcal {T}}^\prime )\) are identical. We note that a complete type structure is an extension of \({\mathcal {T}}\). Thus, the concept of sufficiently rich type structure might be useful if one aims to study behavioral implications of an epistemic condition for a complete type structure but finds it more convenient to work with incomplete type structures.

Proposition 2

Fix a sufficiently rich type structure \({\mathcal {T}}\) and a type structure \({\mathcal {T}}^\prime \) that is an extension of \({\mathcal {T}}\). For each player i and each order \(m \in {\mathbb {N}}_+\), we have \(R_i^m ({\mathcal {T}}) \subseteq R_i^m ({\mathcal {T}}^\prime )\) and \({\mathcal {H}}[R_i^{m} ({\mathcal {T}})] = {\mathcal {H}}[R_i^{m} ({\mathcal {T}}^\prime )]\).

Fix \((m_1, m_2) \in {\mathbb {N}}_+^2\). Since \(R_i^m ({\mathcal {T}}) \subseteq R_i^m ({\mathcal {T}}^\prime )\) for each i and each \(m \in {\mathbb {N}}_+\), it is clear that an outcome consistent with \(R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})\) is also consistent with \(R_1^{m_1} ({\mathcal {T}}^\prime ) \times R_2^{m_2} ({\mathcal {T}}^\prime )\). Conversely, let \(\varvec{a} \equiv (a^1, \ldots , a^{M-1}, DD)\) be an outcome consistent with \(R_1^{m_1} ({\mathcal {T}}^\prime ) \times R_2^{m_2} ({\mathcal {T}}^\prime )\). Denote \(h \equiv (a^1, \ldots , a^{M-1})\). Since \({\mathcal {H}}[R_i^{m} ({\mathcal {T}})] = {\mathcal {H}}[R_i^{m} ({\mathcal {T}}^\prime )]\) for each i and each \(m \in {\mathbb {N}}_+\), there exists some \((s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})\) such that both \(s_1\) and \(s_2\) allow h. Since both strategy-type pairs are rational and h is a last-round history, both \(s_1\) and \(s_2\) defect at h. Thus, \(\xi (s_1, s_2) = \varvec{a}\). This implies \(\varvec{a}\) is consistent with \(R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})\).

If a type structure \({\mathcal {T}}\) is not sufficiently rich, then there might exist an extension \({\mathcal {T}}^\prime \) of \({\mathcal {T}}\) and an outcome that is consistent with \(R_i^m ({\mathcal {T}})\) but inconsistent with \(R_i^m ({\mathcal {T}}^\prime )\) for some i and some \(m \in {\mathbb {N}}_+\). To see how this might arise, suppose \({\mathcal {H}}[R_j^{m^\prime } ({\mathcal {T}}^\prime )] = {\mathcal {H}}[R_j^{m^\prime } ({\mathcal {T}})]\) for each \(m^\prime < m-1\) and each \(j \in \{1, 2 \}\), and there is some history \(h^*\) that is inconsistent with \(R_{-i}^{m-1} ({\mathcal {T}})\) but consistent with \(R_{-i}^{m-1} ({\mathcal {T}}^\prime )\). In addition, suppose that none of the strategy-type pairs in \(R_{-i}^{m-1} ({\mathcal {T}}^\prime ) \cap [S_{-i}(h^*) \times T^\prime _{-i}]\) is present in the epistemic game \(({\mathcal {P}}, {\mathcal {T}})\). Then a strategy-type pair \((s_i, t_i) \in R_i^{m} ({\mathcal {T}})\) must assign probability zero to \(R_{-i}^{m-1} ({\mathcal {T}}^\prime )\) at \(h^*\), which implies \((s_i, t_i) \not \in R_i^{m} ({\mathcal {T}}^\prime )\). Consequently, there is a behavior outcome that is consistent with \(R_i^{m} ({\mathcal {T}})\) but inconsistent with \(R_i^{m} ({\mathcal {T}}^\prime )\). For the battle of the sexes with an outside option, Battigalli and Siniscalchi (2002) present a type structure \({\mathcal {T}}\) that is not sufficiently rich and a type structure \({\mathcal {T}}^\prime \) that is an extension of \({\mathcal {T}}\). They show there is a behavior outcome that is consistent with \(R^{\infty } ({\mathcal {T}})\) but inconsistent with \(R^{\infty } ({\mathcal {T}}^\prime )\). For other examples, see Perea (2012).

The following theorem claims that for any sufficiently rich type structure \({\mathcal {T}}\) and any pair \((m_1, m_2) \in {\mathbb {N}}_+^2\), the set of outcomes consistent with \(R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})\) is the set of paths on which each player i defects in the last \(m_i\) rounds.

Theorem 2

Fix a sufficiently rich type structure \({\mathcal {T}}\) and a pair \((m_1, m_2) \in {\mathbb {N}}_+^2\). For each \(\varvec{a} \in A(m_1, m_2)\), there exists a state \((s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})\) such that \(\xi (s_1, s_2) = \varvec{a}\). Conversely, for each \((s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})\), we have \(\xi (s_1, s_2) \in A(m_1, m_2)\).

Proof

For each i and each \(m_i \in {\mathbb {N}}_+\), define

$$\begin{aligned} {\mathbb {H}}(i, m_i)= & {} \{ (a^0, \ldots , a^n) \in H |\\&\quad \text {if}\ n \ge M - m_i + 1\ \text {then}\ a^{n^\prime }_i = D\ \text {for each} \ n^\prime \ge M - m_i + 1 \}. \end{aligned}$$

First, we show that \({\mathbb {H}}(i, m_i)\) is the set of histories consistent with \(R_i^{m_i} ({\mathcal {T}}^*)\), where \({\mathcal {T}}^*\) is the type structure constructed in Sect. 5. By construction, for each \((s_i, t_i) \in R_i^{m_i} ({\mathcal {T}}^*)\), the strategy \(s_i\) defects at every nth-round history with \(n \ge M - m_i + 1\). Hence, if a history h is consistent with \(R_i^{m_i} ({\mathcal {T}}^*)\), then \(h \in {\mathbb {H}}(i, m_i)\). In the following, we show that any history \(h \in {\mathbb {H}}(i, m_i)\) is consistent with \(R_i^{m_i} ({\mathcal {T}}^*)\). Fix any \(h \in {\mathbb {H}}(i, m_i)\). Note that h is an initial segment of some path \(\varvec{a}\) on which player i defects in the last \(m_i\) rounds and player \(-i\) defects in the last round (that is, \(\varvec{a} \in A(m_i, 1)\) if \(i = 1\) and \(\varvec{a} \in A(1, m_i)\) if \(i = 2\)). By Theorem 1, there is a state \((s_i, t_i, s_{-i}, t_{-i})\) of \(({\mathcal {P}}, {\mathcal {T}}^*)\) such that \((s_i, t_i, s_{-i}, t_{-i}) \in R_i^{m_i} ({\mathcal {T}}^*) \times R_{-i}^1 ({\mathcal {T}}^*)\) and the strategy profile \((s_i,s_{-i})\) induces \(\varvec{a}\). This implies \((s_i, t_i) \in R_i^{m_i} ({\mathcal {T}}^*)\) and \(s_i\) allows h. Hence, h is consistent with \(R_i^{m_i} ({\mathcal {T}}^*)\).
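The set \({\mathbb {H}}(i, m_i)\) and the extension used in this step can be sketched as follows. Histories are tuples of two-letter profiles with player 1's action first, players are indexed 0 and 1, and the helper names `in_H` and `extend_with_defection` are hypothetical. Appending DD up to round M is one convenient way to complete a history \(h \in {\mathbb {H}}(i, m_i)\) into a path on which player i defects in the last \(m_i\) rounds and player \(-i\) defects in round M.

```python
def in_H(i, m_i, h, M):
    """Membership in H(i, m_i): player i has defected in every completed round
    n' >= M - m_i + 1 of the history h (vacuous if h ends before that round).

    h is a tuple of stage profiles, player 1's action first; i is 0 or 1.
    """
    start = max(M - m_i + 1, 1)
    return all(h[r - 1][i] == "D" for r in range(start, len(h) + 1))


def extend_with_defection(h, M):
    """Complete a non-terminal history h into a path by appending DD up to round M."""
    return tuple(h) + ("DD",) * (M - len(h))


h = ("CC", "DC")
print(in_H(0, 2, h, 3), extend_with_defection(h, 3))  # True ('CC', 'DC', 'DD')
```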

Next, we show that \({\mathbb {H}}(i, m_i)\) is the set of histories consistent with \(R_i^{m_i} ({\mathcal {T}})\). The proof is by induction.

Step 1. For each player i and each strategy-type pair \((s_i, t_i) \in R_i^1({\mathcal {T}})\), it is obvious that \(s_i\) defects at every last-round history. Hence, if a history h is consistent with \(R_i^1({\mathcal {T}})\), then \(h \in {\mathbb {H}}(i, 1)\). For each \(j \in \{1, 2\}\), the set of histories consistent with \(R^0_j ({\mathcal {T}})\) and the set of histories consistent with \(R^0_j ({\mathcal {T}}^*)\) are identical (they both are the set of non-terminal histories). Since \({\mathcal {T}}\) is sufficiently rich, for each player i, any history consistent with \(R^1_i ({\mathcal {T}}^*)\) is also consistent with \(R^1_i ({\mathcal {T}})\), which implies any \(h \in {\mathbb {H}}(i, 1)\) is consistent with \(R^1_i ({\mathcal {T}})\). Consequently, \({\mathbb {H}}(i, 1)\) is the set of histories consistent with \(R_i^1 ({\mathcal {T}})\).

Step 2. Fix a player i, a strategy-type pair \((s_i, t_i) \in R_i^2({\mathcal {T}})\), and a history \(h \in A^{M-2}\), that is, a history at which round \(M-1\) is played. By Step 1, h is consistent with \(R_{-i}^1({\mathcal {T}})\); hence, since \(t_i\) strongly believes \(R_{-i}^1({\mathcal {T}})\), type \(t_i\) assigns probability one to \(R_{-i}^1({\mathcal {T}})\) at h. Since any \((s_{-i}, t_{-i}) \in R_{-i}^1({\mathcal {T}})\) has \(s_{-i}\) defect at every last-round history, player i's action at h does not affect her opponent's behavior in round M, so defecting at h is strictly optimal and \(s_i(h) = D\). Hence, if a history h is consistent with \(R_i^2({\mathcal {T}})\), then \(h \in {\mathbb {H}}(i, 2)\). It follows from Step 1 that for each \(m^\prime < 2\) and each \(j \in \{1, 2\}\), the set of histories consistent with \(R^{m^\prime }_j ({\mathcal {T}})\) and the set of histories consistent with \(R^{m^\prime }_j ({\mathcal {T}}^*)\) are identical. Since \({\mathcal {T}}\) is sufficiently rich, for each player i, any history consistent with \(R^2_i ({\mathcal {T}}^*)\) is also consistent with \(R^2_i ({\mathcal {T}})\), which implies any \(h \in {\mathbb {H}}(i, 2)\) is consistent with \(R^2_i ({\mathcal {T}})\). Consequently, \({\mathbb {H}}(i, 2)\) is the set of histories consistent with \(R_i^2 ({\mathcal {T}})\).

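Proceeding inductively in the same way, we obtain that, for every \(m_i \in {\mathbb {N}}_+\), \({\mathbb {H}}(i, m_i)\) is the set of histories consistent with \(R_i^{m_i} ({\mathcal {T}})\).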
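The belief-to-action step used in Steps 1 and 2 can be checked numerically. If player i is certain that her opponent will defect in every later round whatever happens now, her current action does not affect the continuation, so the comparison reduces to the stage game, where D strictly dominates C. The sketch below is illustrative only: it assumes hypothetical payoffs T = 5, R = 3, P = 1, S = 0 (any T > R > P > S would do), and the function names are made up for this example. Setting `remaining = 0` corresponds to a last-round history (Step 1), and `remaining = 1` to round M − 1 after a history in \(A^{M-2}\) (Step 2).

```python
from itertools import product

# Hypothetical stage-game payoffs for one player (T > R > P > S: a Prisoner's Dilemma).
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def best_value(own_now, other_now, remaining):
    """Own payoff from the current round onward when the opponent is believed to
    defect in each of the `remaining` later rounds regardless of history; the
    player chooses her best plan for those later rounds."""
    best_future = max(
        sum(PAYOFF[(a, "D")] for a in plan)
        for plan in product("CD", repeat=remaining)
    )
    return PAYOFF[(own_now, other_now)] + best_future


def defection_strictly_better(remaining):
    """True if playing D now beats playing C now, whatever the opponent plays now."""
    return all(best_value("D", o, remaining) > best_value("C", o, remaining)
               for o in "CD")


print(defection_strictly_better(0), defection_strictly_better(1))
```

Because the continuation term is the same after C and after D, the check succeeds for every value of `remaining`, which is what drives each step of the induction.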

Fix a pair \((m_1, m_2) \in {\mathbb {N}}_+^2\). As shown above, for each player i and each strategy-type pair \((s_i, t_i) \in R_i^{m_i}({\mathcal {T}})\), \(s_i\) defects at every nth-round history with \(n \ge M - m_i + 1\). Hence any outcome consistent with \(R_1^{m_1}({\mathcal {T}}) \times R_2^{m_2}({\mathcal {T}})\) has each player i defect in the last \(m_i\) rounds. Conversely, fix a path \(\varvec{a} \in A(m_1, m_2)\) on which each player i defects in the last \(m_i\) rounds. The history \((a^1, \ldots , a^{M-1})\) is in both \({\mathbb {H}}(1, m_1)\) and \({\mathbb {H}}(2, m_2)\); hence, it is consistent with both \(R_1^{m_1}({\mathcal {T}})\) and \(R_2^{m_2}({\mathcal {T}})\). This implies that there is a state \((s_1, t_1, s_2, t_2) \in R_1^{m_1}({\mathcal {T}}) \times R_2^{m_2}({\mathcal {T}})\) such that both \(s_1\) and \(s_2\) allow \((a^1, \ldots , a^{M-1})\). Moreover, \(a^M = DD\), and each \(s_i\) defects at every last-round history. It follows that the strategy profile \((s_1, s_2)\) induces \(\varvec{a}\). Hence, \(\varvec{a}\) is consistent with \(R_1^{m_1}({\mathcal {T}}) \times R_2^{m_2}({\mathcal {T}})\). \(\square \)
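The outcome set \(A(m_1, m_2)\) and the decomposition used in the converse direction can be enumerated for a small number of rounds. The sketch below is a hypothetical encoding: an action profile is a two-letter string with player 1's action first (so DC means player 1 defects and player 2 cooperates), and \({\mathbb {H}}(i, m)\) is read, as an assumption inferred from how it is used in the proof, as the set of non-terminal histories on which player i defects in every round \(n \ge M - m + 1\) that the history contains. It also checks the count \(|A(m_1, m_2)| = 4^{M - \max (m_1, m_2)} \cdot 2^{|m_1 - m_2|}\), which follows from counting free, one-sided, and two-sided defection rounds.

```python
from itertools import product

# Action profiles: first letter is player 1's action, second is player 2's.
PROFILES = ("CC", "CD", "DC", "DD")


def defects_from(seq, i, first_round):
    """True if player i (1 or 2) plays D in every round n >= first_round of seq."""
    return all(profile[i - 1] == "D"
               for n, profile in enumerate(seq, start=1) if n >= first_round)


def in_H(history, i, m, M):
    """Assumed encoding of H(i, m): a non-terminal history on which player i
    defects in every round n >= M - m + 1 that it contains."""
    return len(history) < M and defects_from(history, i, M - m + 1)


def outcome_set(M, m1, m2):
    """A(m1, m2): length-M paths on which player i defects in the last m_i rounds."""
    return [path for path in product(PROFILES, repeat=M)
            if defects_from(path, 1, M - m1 + 1) and defects_from(path, 2, M - m2 + 1)]


def converse_step_holds(M, m1, m2):
    """Mirror of the converse argument: the first M - 1 rounds of every path in
    A(m1, m2) lie in H(1, m1) and H(2, m2), and the final profile is DD."""
    return all(in_H(path[:-1], 1, m1, M) and in_H(path[:-1], 2, m2, M)
               and path[-1] == "DD"
               for path in outcome_set(M, m1, m2))


def predicted_size(M, m1, m2):
    """|A(m1, m2)|: 4 profiles in unconstrained rounds, 2 in rounds where only
    one player must defect, 1 (DD) in rounds where both must."""
    hi, lo = max(m1, m2), min(m1, m2)
    return 4 ** (M - hi) * 2 ** (hi - lo)
```

For example, with M = 4 and \((m_1, m_2) = (3, 1)\), `predicted_size(4, 3, 1)` gives 4 · 2² = 16 paths.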

For Theorem 2, the assumption that the type structure is sufficiently rich is important. If \({\mathcal {T}}\) is not sufficiently rich, then there might be some history \(h \in {\mathbb {H}}(i,m_i)\) that is inconsistent with \(R_i^{m_i}({\mathcal {T}})\); equivalently, for player i, there is no belief that satisfies \((m_i - 1)\)th order strong belief of rationality and rationalizes h. Consequently, any path that passes through h is impossible at \(R_i^{m_i}({\mathcal {T}}) \times R_{-i}^{m_{-i}}({\mathcal {T}})\). This implies that some path on which each player \(j \in \{i, -i\}\) defects in the last \(m_j\) rounds is impossible at \(R_i^{m_i}({\mathcal {T}}) \times R_{-i}^{m_{-i}}({\mathcal {T}})\). In addition, if \({\mathcal {T}}\) is not sufficiently rich, then some path on which some player i cooperates in some round \(n \ge M - m_i + 1\) might be possible at \(R_1^{m_1}({\mathcal {T}}) \times R_2^{m_2}({\mathcal {T}})\). For instance, in Example 2 (Appendix E), for a four-round Prisoner’s Dilemma, we construct an insufficiently rich type structure \({\mathcal {T}}\) such that, at some state \((s_1^3, t_1^3, s_2^1, t_2^1) \in R_1^3({\mathcal {T}}) \times R_2^1({\mathcal {T}})\), the path of play is \(\xi (s_1^3, s_2^1) = (DC, CC, DC, DD)\), on which player 1 cooperates in round 2. In this epistemic game, for player 2, there is no epistemic type that satisfies first order strong belief of rationality and rationalizes history (DC) [equivalently, there is no strategy-type pair \((s_2, t_2) \in R_2^2({\mathcal {T}})\) such that \(s_2 (h^1) = C\)]. Ex ante, type \(t_1^3\) believes that she is facing some strategy-type pair \((s^2_2, t^2_2) \in R_2^2({\mathcal {T}})\), where \(s_2^2 (h^1) = D\). At history (DC) [which is inconsistent with \(R_2^2({\mathcal {T}})\)], type \(t_1^3\) assigns probability one to \((s_2^1, t_2^1) \in R_2^1({\mathcal {T}})\), where \(s_2^1\) cooperates at (DC), and cooperates in round 3 only if player 1 cooperates at (DC).
With this belief, cooperating at (DC) is optimal for type \(t_1^3\). By contrast, in an epistemic game with a sufficiently rich type structure \({\mathcal {T}}^\prime \), history (DC) is consistent with \(R_2^2({\mathcal {T}}^\prime )\); thus, at (DC), any type of player 1 that strongly believes \(R_2^2({\mathcal {T}}^\prime )\) must assign probability one to \(R_2^2({\mathcal {T}}^\prime )\), and with this belief, defecting at (DC) is optimal for that type.
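A quick check of the Example 2 path against the prediction of Theorem 2 for \((m_1, m_2) = (3, 1)\) makes the failure concrete. The path lies outside \(A(3, 1)\) because player 1 cooperates in round 2, although player 1 still defects in the last two rounds and player 2 defects in the last round. The encoding (player 1's action written first in each profile) follows the text's notation; the helper names are hypothetical.

```python
# The on-path outcome xi(s_1^3, s_2^1) from Example 2 (four rounds).
PATH = ("DC", "CC", "DC", "DD")  # first letter: player 1, second: player 2


def defects_last(path, i, m):
    """True if player i (1 or 2) plays D in each of the last m rounds."""
    return all(profile[i - 1] == "D" for profile in path[len(path) - m:])


def cooperation_rounds(path, i):
    """Rounds (1-indexed) in which player i cooperates."""
    return [n for n, profile in enumerate(path, start=1) if profile[i - 1] == "C"]


print(cooperation_rounds(PATH, 1), defects_last(PATH, 1, 3))
```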

It is easy to show that the type structure \({\mathcal {T}}^*\) and its extensions are sufficiently rich (the proof is in Appendix D). We note that a complete type structure is an extension of \({\mathcal {T}}^*\). Theorem 2 implies that for all these type structures, the set of outcomes consistent with \(R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})\) is the set of paths on which each player i defects in the last \(m_i\) rounds.

7 Discussion

7.1 Insufficiently rich type structures

In Example 2 (Appendix E), we consider the Finitely Repeated Prisoner’s Dilemma with 4 rounds and present an insufficiently rich type structure \({\mathcal {T}}\) such that, at some state in \(R_1^3({\mathcal {T}}) \times R_2^1({\mathcal {T}})\), player 1 cooperates in round 2 on path. We conjecture that, for each pair \((m_1, m_2) \in \{1, \ldots , M - 1\}^2\) with \(m_1 > m_2\), and each path \(\varvec{a}\) on which player 1 defects in the last \(m_2 + 1\) rounds and player 2 defects in the last \(m_2\) rounds, there exist an insufficiently rich type structure \({\mathcal {T}}\) and a state \((s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})\) such that \(\xi (s_1, s_2) = \varvec{a}\). If this conjecture is correct, then we can characterize the set of outcomes that can arise when each player i satisfies rationality and \((m_i - 1)\)th order strong belief of rationality across all type structures [an outcome \(\varvec{a}\) belongs to this set if and only if there exist a type structure \({\mathcal {T}}\) and a state \((s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})\) such that \(\xi (s_1, s_2) = \varvec{a}\)]. We elaborate on this below.

For each pair \((m_1, m_2) \in \{1, \ldots , M - 1\}^2\) such that \(m_1 = m_2\), it follows from Proposition 1 and Theorem 1 that the set of outcomes that can arise when each player i satisfies rationality and \((m_i - 1)\)th order strong belief of rationality across all type structures is the set of paths on which each player i defects in the last \(m_i\) rounds.

In the following, we fix some \((m_1, m_2) \in \{1, \ldots , M -1\}^2\) with \(m_1 > m_2\). As stated in Remark 5 (Appendix A), for each type structure \({\mathcal {T}}\) and each state \((s_1, t_1, s_2, t_2) \in R_1^{m_1} ({\mathcal {T}}) \times R_2^{m_2} ({\mathcal {T}})\), player 1 defects in the last \(m_2 + 1\) rounds and player 2 defects in the last \(m_2\) rounds on the path \(\xi (s_1, s_2)\). If the aforementioned conjecture is correct, then the set of outcomes that can arise when each player i satisfies rationality and \((m_i - 1)\)th order strong belief of rationality across all type structures is the set of paths on which player 1 defects in the last \(m_2 + 1\) rounds and player 2 defects in the last \(m_2\) rounds.
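Comparing the conjectured set with the sets in Theorem 2 makes the claim concrete. For \(m_1 > m_2\), the conjectured set consists of the paths on which player 1 defects in the last \(m_2 + 1\) rounds and player 2 in the last \(m_2\) rounds, which is exactly \(A(m_2 + 1, m_2)\). It therefore does not depend on \(m_1\), coincides with \(A(m_1, m_2)\) when \(m_1 = m_2 + 1\), and strictly contains \(A(m_1, m_2)\) when \(m_1 > m_2 + 1\). The sketch below enumerates both sets for M = 4. The two-letter profile encoding (player 1's action first) and the function names are illustrative assumptions, and the code only compares sets of paths; it says nothing about whether the conjecture holds.

```python
from itertools import product

PROFILES = ("CC", "CD", "DC", "DD")  # first letter: player 1, second: player 2


def defects_last(path, i, m):
    """True if player i (1 or 2) plays D in each of the last m rounds."""
    return all(profile[i - 1] == "D" for profile in path[len(path) - m:])


def theorem2_set(M, m1, m2):
    """A(m1, m2): each player i defects in the last m_i rounds."""
    return [p for p in product(PROFILES, repeat=M)
            if defects_last(p, 1, m1) and defects_last(p, 2, m2)]


def conjectured_set(M, m1, m2):
    """Conjectured outcome set across all type structures for m1 > m2:
    player 1 defects in the last m2 + 1 rounds, player 2 in the last m2."""
    if not m1 > m2:
        raise ValueError("the conjecture concerns pairs with m1 > m2")
    return [p for p in product(PROFILES, repeat=M)
            if defects_last(p, 1, m2 + 1) and defects_last(p, 2, m2)]


print(len(theorem2_set(4, 3, 1)), len(conjectured_set(4, 3, 1)))
```

With M = 4 and \((m_1, m_2) = (3, 1)\), the Example 2 path (DC, CC, DC, DD) lies in the conjectured set but not in \(A(3, 1)\).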

7.2 Monotonicity

It is well known that strong belief fails monotonicity; that is, \(E \subseteq F\) does not in general imply \(SB_i (E) \subseteq SB_i (F)\). In Example 3 (Appendix E), we show how strong belief fails monotonicity in the Finitely Repeated Prisoner’s Dilemma. In particular, we present two type structures \({\mathcal {T}}\) and \({\mathcal {T}}^\prime \) such that \(R_{-i}^1 ({\mathcal {T}}) \subseteq R_{-i}^1 ({\mathcal {T}}^\prime )\) but \(SB_i[R_{-i}^1 ({\mathcal {T}})] \not \subseteq SB_i [R_{-i}^1 ({\mathcal {T}}^\prime )]\). In the epistemic game \(({\mathcal {P}}, {\mathcal {T}})\), there is some \((s_i^2, t_i^2) \in SB_i[R_{-i}^1 ({\mathcal {T}})]\) such that type \(t_i^2\) assigns probability one to an irrational strategy-type pair at history (DCCD), since this history is inconsistent with \(R_{-i}^1 ({\mathcal {T}})\). In the epistemic game \(({\mathcal {P}}, {\mathcal {T}}^\prime )\), history (DCCD) is consistent with \(R_{-i}^1 ({\mathcal {T}}^\prime )\); since type \(t_i^2\) does not assign probability one to \(R_{-i}^1 ({\mathcal {T}}^\prime )\) at this history, we have \((s_i^2, t_i^2) \not \in SB_i[R_{-i}^1 ({\mathcal {T}}^\prime )]\).
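The mechanism behind the failure of monotonicity can be seen in a toy conditional probability system that is unrelated to the Prisoner’s Dilemma. It has three states and two conditioning events, which play the role of histories. The belief puts probability one on state a initially and on state c after the event {b, c}. A belief strongly believes an event if it assigns probability one to that event at every conditioning event consistent with (that is, intersecting) it. The events \(E = \{a\} \subseteq F = \{a, b\}\) loosely play the roles of \(R_{-i}^1 ({\mathcal {T}})\) and \(R_{-i}^1 ({\mathcal {T}}^\prime )\): the event {b, c} is inconsistent with E but consistent with F, just as history (DCCD) is inconsistent with the smaller set and consistent with the larger one. All states and probabilities here are hypothetical; this is not a reconstruction of Example 3.

```python
from fractions import Fraction

# Toy conditional probability system (hypothetical): conditioning events act as histories.
OMEGA = frozenset({"a", "b", "c"})
AFTER = frozenset({"b", "c"})
CONDITIONS = [OMEGA, AFTER]
CPS = {
    OMEGA: {"a": Fraction(1)},  # initial belief: state a for sure
    AFTER: {"c": Fraction(1)},  # belief after observing {b, c}: state c for sure
}


def prob(event, condition):
    """Probability of `event` under the belief held at `condition`."""
    return sum(p for state, p in CPS[condition].items() if state in event)


def strongly_believes(event):
    """Probability one on `event` at every conditioning event that intersects it."""
    return all(prob(event, h) == 1 for h in CONDITIONS if h & event)


E = frozenset({"a"})
F = frozenset({"a", "b"})
print(E <= F, strongly_believes(E), strongly_believes(F))
```

Strong belief in E imposes no restriction at {b, c} because that event is inconsistent with E. Strong belief in F does impose one there, and the belief violates it.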