1 Introduction

Epistemic game theory deals with the reasoning processes of an individual about his opponents before he makes a decision. This requires not only a belief about the choices of his opponents, but also a belief about the opponents’ beliefs about their opponents’ choices, and so on.

Such reasoning processes have been studied thoroughly in the framework of static games, in various forms of the concept of common belief in rationality. However, the extension of these concepts to the framework of dynamic games is not entirely trivial. One possible way to extend the idea of common belief in rationality would require that the players believe their opponents make only rational choices, in particular that past choices have been rational. However, in many cases this is not possible, since there may be stages in the game where players have to conclude that an opponent has chosen irrationally in the past.

To solve this problem, some alternative concepts have been proposed. Battigalli and Siniscalchi (2002) propose the concept of common strong belief in rationality, in which players, whenever possible, must believe that their opponents are implementing rational strategies. Perea (2014) proposes the concept of common belief in future rationality, which requires that at each decision point a player believes that all players are rational in the present and in the future, but allows players to believe that irrational choices have been made in the past. This concept is similar to sequential rationalizability, proposed by Dekel et al. (1999; 2002) and Asheim and Perea (2005).

Reny (1992, 1993) studies the idea of common belief in past and future rationality at all information sets, and concludes that in most games it is not possible to reason under such a concept.

Taking as a starting point the concept of common belief in future rationality, in which players are allowed to believe that past choices were irrational, we consider a concept in which a restricted notion of belief in past rationality is assumed. The example presented in Fig. 1 will be used to illustrate the concepts being discussed, and it also serves as one of the motivations for developing a new rationality concept.

Fig. 1 Example of a dynamic game

The key idea in the new concept we propose is that a player not only believes that his opponents choose rationally in the future, but also that the decisions made in the past were rational within a restricted set of choices. In Fig. 1 we can see that at \(\varnothing \) the optimal choice for player 1 is c. However, if the game were to reach \(h_{1}\), player 2 must believe that a suboptimal choice was made at \(\varnothing \). Under the concept of common belief in future rationality, player 2 may believe that either a or b was chosen at \(\varnothing \), as there is no restriction on the beliefs about choices made in the past. We propose that player 2 should reason about the choice made at \(\varnothing \) by considering only those choices that reach \(h_{1}\) and, among those, identifying which are optimal: in this case, a is the best choice for player 1 among those that reach \(h_{1}\), assuming he would choose f afterwards. Hence, under the new concept, player 2 must believe at \(h_{1}\) that player 1 chose a in the past.

The concept proposed here, which we call “common belief in future and restricted past rationality”, is a refinement of common belief in future rationality. The difference is that common belief in future rationality does not reason about the choices made in the past, while the addition of “restricted past rationality” makes players consider the subset of past choices that reach an information set and identify the optimal choices within this subset. In other words, a player at an information set h may still believe that the opponents chose irrationally in the past, but if that is the case, he must believe that the opponents chose the “least irrational” strategies among the strategies that reach h. A key feature of our new concept is that belief in the opponents’ restricted past rationality is always possible, whereas belief in the opponents’ (unrestricted) past rationality can only be sustained at information sets that lie on the path of rational strategies, and is silent about the rest of the game.

Since we are ranking the possible mistakes, one should expect a connection between proper rationalizability, proposed by Schuhmacher (1999) and Asheim (2001), applied to the normal form of a dynamic game, and common belief in future and restricted past rationality in the dynamic game itself. Theorem 1 shows that properly rationalizable strategies in the normal form can rationally be chosen under common belief in future and restricted past rationality. Since properly rationalizable strategies exist for every finite normal form game, it follows that for every finite dynamic game there are strategies that can rationally be chosen under common belief in future and restricted past rationality. This shows that common belief in future and restricted past rationality imposes plausible restrictions which, when studying the normal form, allow players to make reasonable decisions in the dynamic game. In addition we propose an algorithm for this concept, and we show that it delivers exactly the strategies that can rationally be chosen under common belief in future and restricted past rationality.

Note that the example presented in Fig. 1 has unobserved past choices. This is intentional, because it is easy to show that for games with observable past choices, possibly with simultaneous moves, common belief in future rationality and common belief in future and restricted past rationality are equivalent. To see this, note that under observable choices every information set of every player is a singleton, which implies that each player knows exactly what the previous choices were. Consequently, players do not have to reason about what choices may have been made earlier, since such choices are already given and known to everyone, and the reasoning reduces to future choices only. This also shows that the algorithm presented here and the backward dominance procedure proposed in Perea (2014) coincide when the games have observable past choices, as the description of the second set \(\hat{S}_{-i}^{k}(h)\) in Step k of Algorithm 1 reduces to a description that is equivalent to the sets \(\Gamma ^{k}(h)\) in the inductive step of the backward dominance procedure.

For one-shot games, common belief in future rationality and common belief in future and restricted past rationality both coincide with common belief in rationality, and hence are equivalent. Since it is well known that common belief in rationality is weaker than proper rationalizability, there are one-shot games in which a strategy can rationally be chosen under common belief in future and restricted past rationality, but that same strategy is not properly rationalizable in the normal form. Hence, the converse of Theorem 1 does not hold. Such a game is presented in Fig. 2, which does not have generic payoffs. In this game, strategies b and d can rationally be chosen under common belief in future and restricted past rationality, but b and d are not properly rationalizable. We conjecture that for generic payoffs, proper rationalizability is equivalent to common belief in future and restricted past rationality.

Fig. 2 Counterexample to the converse of Theorem 1

It was shown by Asheim (2001) that every choice that has positive probability in some proper equilibrium is optimal for some properly rationalizable type. Therefore proper rationalizability can be seen as the non-equilibrium analogue to proper equilibria. van Damme (1984) proves that for every dynamic game, the proper equilibria of its normal form induce quasi-perfect equilibria of the dynamic game. In this way he shows that it is possible to reason about a dynamic game in terms of the normal form and obtain equilibria of the dynamic game by looking at the normal form only. This is precisely one of the driving ideas behind the present paper, in which the concept of proper rationalizability, which is less restrictive than proper equilibrium, is linked to a concept for dynamic games that, in contrast to common belief in future rationality, takes into account a restricted version of rationality in the past. Also in contrast to strong belief in rationality, it makes players reason about the optimality of choices at every information set, even if an information set can only be reached by past choices that are suboptimal.

Another motivation for proposing a new reasoning concept is that common belief in future rationality only reasons about what can happen from a certain point in time onwards, without considering how the game arrived at that point. On the other hand, strong belief in rationality may impose no restrictions at information sets that can only be reached by irrational past choices. Our concept, in contrast, may impose restrictions even in such situations.

Indeed, if the game reaches a certain information set h, the player who has to choose knows that the game could reach this information set only if his opponents have previously made choices that lead to h. Therefore, according to our concept, this player believes his opponents chose the most plausible among the choices that actually reach h. That is, he concentrates on those opponents’ strategies that reach h, that are optimal at all future information sets, and that are optimal at all past information sets among the strategies that reach h.

The structure of the paper is as follows. In Sect. 2 we discuss a few examples that highlight some properties of our new concept. In Sect. 3 we introduce dynamic games. In Sect. 4 we present the concept of proper rationalizability for the normal form of a dynamic game. In Sect. 5 we introduce the notion of common belief in future and restricted past rationality for a dynamic game. In Sect. 6 both of these rationalizability concepts are connected, by showing that the strategies that are properly rationalizable can also be chosen under common belief in future and restricted past rationality. In Sect. 7 we describe an algorithm and show that it yields precisely those strategies that can be chosen under common belief in future and restricted past rationality. Section 8 offers some concluding remarks and Sect. 9 contains all the proofs of this paper.

2 Examples

We present a few examples that illustrate properties of common belief in future and restricted past rationality which differentiate it from previously known concepts for dynamic games.

The example in Fig. 1 has shown that common belief in future and restricted past rationality can be more restrictive than common belief in future rationality in terms of strategies. The example presented in Fig. 3 shows that it can also be more restrictive in terms of outcomes.

Fig. 3 An example where common belief in future and restricted past rationality is more restrictive, in terms of outcomes, than common belief in future rationality

Under common belief in future rationality, player 2 can rationally choose d or e, and hence player 1 can rationally choose a or c. Therefore, common belief in future rationality allows for the outcomes (a, d), (a, e) and c. However, since a is better than b for player 1, common belief in future and restricted past rationality requires player 2 to believe at \(h_{1}\) that player 1 chose a. Therefore, player 2 must choose d and player 1 must choose a. Hence, common belief in future and restricted past rationality only allows for the outcome (a, d).

Moreover, it is not forward induction that is being used in the concept presented here. To see this, consider the game in Fig. 4, which is the Battle of the sexes with an “Outside” option.

Fig. 4 Battle of the sexes with an “Outside” option

Under forward induction, the only possible outcome is player 1 choosing a and player 2 choosing d: player 2, upon observing that the game has reached \(h_{1}\), reasons that player 1 must have chosen a, since that is the only way player 1 can get more than by choosing c. Therefore, player 2 must choose d.

Under our concept, however, the only choice that is eliminated is b for player 1, because b is strictly dominated by c at \(\varnothing \). Player 2 can still believe at \(h_{1}\) that player 1 chose b: if player 1 believes that player 2 chooses e, strategy b is better than a, and hence player 2 may believe at \(h_{1}\) that player 1 chose b. Consequently, player 2 may choose e under our concept, and player 1 may choose c. However, c is not the forward induction strategy for player 1.

The example shown in Fig. 5 illustrates why the definition of the concept presented here requires players to reason about every past information set, and not just about some of them. Note that a is better than b for player 1 at \(\varnothing \), and that f is better than g for player 1 at \(h_{2}\). Therefore, under our concept, player 2 must believe at \(h_{4}\) that the second node has been reached, so player 2 must choose \(\ell \). To reach this conclusion, it is important that player 2 reasons about both \(\varnothing \) and \(h_{2}\). Reasoning only about \(\varnothing \), or reasoning only about \(h_{2}\) and \(h_{3}\), would not be sufficient to draw this conclusion.

Fig. 5 Game with more than one information set for each player

3 Dynamic games

In this section we define the dynamic games we consider, and some general notions that will be used throughout the paper. In what follows we assume the players have perfect recall.

Definition 1

(Dynamic game) A dynamic game G is a tuple

$$\begin{aligned} G = (I, (C_{i})_{i \in I}, X, Z, (H_{i})_{i \in I}, (C_{i}(h))_{i \in I, h \in H_{i}}, (u_{i})_{i \in I}), \end{aligned}$$

where

  • I is the finite set of players;

  • \(C_{i}\) is the finite set of choices for each player \(i \in I\);

  • X is the set of non-terminal histories, which are sequences of profiles of choices \(x = (x_{1}, \ldots , x_{k})\), with \(x_{m} = (c_{i})_{i \in \hat{I}} \in \mathop {\times }\nolimits _{i \in \hat{I}} C_{i}\) for some non-empty \(\hat{I} \subseteq I\), and for all \(\ell < k\), \((x_{1}, \ldots , x_{\ell })\) is also a history. As \(\hat{I}\) may contain more than one player, simultaneous moves are allowed;

  • Z is the set of terminal histories of the game. In this case, if \(z = (x_{1}, \ldots , x_{k}) \in Z\), then for every \(\ell < k\), \((x_{1}, \ldots , x_{\ell }) \in X\);

  • \(H_{i}\) is a finite collection of information sets for player i. The information sets \(h \in H_{i}\) are non-empty sets of non-terminal histories. If h contains more than one history, then player i does not know with certainty which history was realized to arrive at h. The collections of information sets for each player are not necessarily disjoint since we allow for simultaneous moves, so the same information set might belong to two or more players at the same time. The collection of all information sets for all players in the game is denoted by H;

  • \(C_{i}(h) \subseteq C_{i}\) is the finite set of choices available for player i at the information set \(h \in H_{i}\). We say \(c \in C_{i}(h)\) if there is a history \(x \in X\) and \(x_{m} = (c_{j})_{j \in \hat{I}}\) such that \(x \in h\), \(i \in \hat{I}\), \(c_{i} = c\) and \((x, x_{m}) = x' \in X \cup Z\); and

  • \(u_{i} :Z \rightarrow \mathbb {R}\) is player i’s utility function.

As an example, for the game described in Fig. 1 we have a dynamic game in its extensive form. This two-player game has the sets of histories \(X = \{\varnothing , (a), (b), (a, d)\}\) and \(Z = \{(c), (a, e), (b, d), (b, e), (a, d, f), (a, d, g)\}\); the collections of information sets \(H_{1} = \{\varnothing , h_{2}\}\) and \(H_{2} = \{h_{1}\}\), where \(h_{1} = \{(a), (b)\}\) and \(h_{2} = \{(a, d)\}\); and the sets of choices \(C_{1}(\varnothing ) = \{a, b, c\}\), \(C_{1}(h_{2}) = \{f, g\}\), \(C_{2}(h_{1}) = \{d, e\}\).
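To make the notation concrete, this structure can be written down directly as data. The following Python sketch is purely illustrative: the dictionary layout is our own choice, and the payoffs are omitted since they are only displayed in the figure.

```python
# Structure of the game in Fig. 1, encoded as plain Python data.
# Histories are tuples of choices; an information set is a frozenset of histories.
# Payoffs are omitted here because they appear only in the figure.

root = ()                          # the empty history
h1 = frozenset({("a",), ("b",)})   # player 2 cannot distinguish (a) from (b)
h2 = frozenset({("a", "d")})

game_structure = {
    "players": [1, 2],
    "X": [(), ("a",), ("b",), ("a", "d")],                    # non-terminal histories
    "Z": [("c",), ("a", "e"), ("b", "d"), ("b", "e"),
          ("a", "d", "f"), ("a", "d", "g")],                  # terminal histories
    "info_sets": {1: [frozenset({root}), h2], 2: [h1]},       # H_1 and H_2
    "choices": {(1, frozenset({root})): {"a", "b", "c"},      # C_1 at the root
                (1, h2): {"f", "g"},                          # C_1(h_2)
                (2, h1): {"d", "e"}},                         # C_2(h_1)
}
```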

We define a partial order on the information sets of a game. An information set \(h'\) immediately follows h, or h immediately precedes \(h'\), if there exist a non-empty \(\hat{I} \subseteq I\), \(c_{i} \in C_{i}(h)\) for every \(i \in \hat{I}\), and \(x \in h\) such that \((x, (c_{i})_{i \in \hat{I}}) \in h'\).

An information set \(h'\) weakly follows h, or h weakly precedes \(h'\), if \(h = h'\) or there is a sequence \(h_{0}, h_{1}, h_{2}, \ldots , h_{\ell }\) such that \(h_{t}\) immediately follows \(h_{t - 1}\) for \(t \in \{1, 2, \ldots , \ell \}\), where \(h_{0} = h\) and \(h_{\ell } = h'\). If, in addition, \(h \ne h'\), we say h strictly precedes \(h'\).

During the game, each player makes one or more choices, sometimes depending on his previous choices or on the choices of other players. However, if a player’s choice prevents him from making some other choices, there is no reason to make a plan that includes both the former choice and any of the latter ones. Therefore, we restrict ourselves to studying those plans that only prescribe choices at information sets that are reachable under the earlier choices: a “plan of action”, as described in Rubinstein (1991). We will call these plans strategies. We also identify those strategies that can potentially reach an information set.

Looking at the game shown in Fig. 1, the sets of strategies for each player are \(S_{1} = \{(a, f), (a, g), b, c\}\), and \(S_{2} = \{d, e\}\). In classical game theory, other sequences such as (b, f) would also qualify as strategies; however, by choosing b at an earlier information set, player 1 prevents himself from ever choosing f, rendering the specification of f unnecessary.

Let \(h \in H\), \(h' \in H_{i}\), where \(h'\) strictly precedes h. We say a choice \(c_{i} \in C_{i}(h')\) leads to h if there exist \(x \in h'\), \(\hat{I} \subseteq I\) with \(i \in \hat{I}\), and \(c_{j} \in C_{j}(h')\) for every \(j \in \hat{I} {\setminus } \{i\}\) such that \((x, (c_{j})_{j \in \hat{I}})\) weakly precedes h.

An information set \(h \in H\) is reachable via \(s_{i} :\tilde{H}_{i} \rightarrow \cup _{h \in \tilde{H}_{i}} C_{i}(h)\), with \(\tilde{H}_{i} \subseteq H_{i}\), if at every information set \(h' \in \tilde{H}_{i}\) that strictly precedes h, the choice \(s_{i}(h')\) leads to h. We say \(s_{i}\) is a strategy if \(\tilde{H}_{i}\) contains exactly those information sets in \(H_{i}\) that are reachable via \(s_{i}\). A strategy \(s_{i}\) leads to \(h \in H\) if h is reachable via \(s_{i}\).

The set of strategies for player i is denoted by \(S_{i}\). The set of strategy combinations for the opponents of i is denoted by \(S_{-i} = \mathop {\times }\nolimits _{j \ne i} S_{j}\). A strategy combination for all players is given by \((s_{i}, s_{-i})\) where \(s_{i} \in S_{i}\) and \(s_{-i} \in S_{-i}\).

The set of strategies for player i that lead to h is denoted by \(S_{i}(h)\). For the game in Fig. 1, \(S_{1}(h_{1}) = \{(a, f), (a, g), b\}\), \(S_{1}(h_{2}) = \{(a, f), (a, g)\}\), \(S_{2}(h_{2}) = \{d\}\).

The set of strategy combinations for the opponents of i that lead to h is denoted by \(S_{-i}(h)\). The set of information sets for player i that strategy \(s_{i}\) leads to is denoted by \(H_{i}(s_{i})\).

Finally we identify those strategy combinations that reach a particular information set. Let \((s_{i}, s_{-i}) \in S_{i} \times S_{-i}\) be a strategy combination for all players. We define \(H(s_{i}, s_{-i})\) as the class of information sets h such that \(s_{i} \in S_{i}(h)\) and \(s_{-i} \in S_{-i}(h)\). That is, \(H(s_{i}, s_{-i})\) is the collection of information sets that can be reached with the strategy combination \((s_{i}, s_{-i})\).

4 Proper rationalizability

To connect the rationalizability concepts in dynamic games with related rationalizability concepts in normal form games, we also need to connect a dynamic game with a related game in its normal form.

Definition 2

(Normal form of a dynamic game) Let G be a dynamic game. The normal form of G is the game \(G' = (I, (S_{i})_{i \in I}, (v_{i})_{i \in I})\) in which all players i choose simultaneously a strategy \(s_{i} \in S_{i}\), and each player i receives the utility \(v_{i}(s_{i}, s_{-i}) = u_{i}(z(s_{i}, s_{-i}))\) where \(z(s_{i}, s_{-i})\) is the terminal history reached by \((s_{i}, s_{-i})\).

We define a structure called an epistemic model with types, which serves as a compact way to encode belief hierarchies, so we can derive the various levels of belief for each type in the epistemic model. Then we define strategy-type combinations, which are the objects on which beliefs are constructed, and lexicographic beliefs.

A lexicographic belief \(b_{i}\) for player i on a finite set A is a sequence \((b_{i}^{1}; \ldots ; b_{i}^{m})\) where each \(b_{i}^{k}\) is a probability distribution on A. The belief \(b_{i}^{k}\) is called the level k of the lexicographic belief.

Definition 3

(Epistemic model for a normal form game) An epistemic model \(M = (T_{i}, b_{i})_{i \in I}\) for a normal form game \(G' = (I, (S_{i})_{i \in I}, (v_{i})_{i \in I})\) consists of a finite set of types \(T_{i}\) for each player i, and for each type \(t_{i} \in T_{i}\) we define a lexicographic belief \(b_{i}(t_{i}) = (b_{i}^{1}(t_{i}); \ldots ; b_{i}^{m}(t_{i}))\) on \(S_{-i} \times T_{-i} = \mathop {\times }\nolimits _{k \ne i} (S_{k} \times T_{k})\), which is the set of strategy-type combinations of i’s opponents.

To derive a lexicographic belief hierarchy for every type, consider a type \(t_{i}\) and its lexicographic belief \(b_{i}(t_{i}) = (b_{i}^{1}(t_{i}); \ldots ; b_{i}^{m}(t_{i}))\).

For the first order of the lexicographic belief hierarchy of \(t_{i}\), we have that player i deems the strategies in the support of \(b_{i}^{1}(t_{i})\) infinitely more likely than the strategies that are in the support of \(b_{i}^{2}(t_{i})\) but not in the support of \(b_{i}^{1}(t_{i})\); and deems the strategies in the support of \(b_{i}^{2}(t_{i})\) infinitely more likely than the strategies that are in the support of \(b_{i}^{3}(t_{i})\) but not in the supports of \(b_{i}^{1}(t_{i})\) or \(b_{i}^{2}(t_{i})\); and so on.

For the second order of the lexicographic belief hierarchy of \(t_{i}\), we have that player i deems the lexicographic beliefs of each type that appears in \(b_{i}^{1}(t_{i})\) infinitely more likely than the lexicographic beliefs of each type that appears in \(b_{i}^{2}(t_{i})\) but did not appear in \(b_{i}^{1}(t_{i})\); and deems the lexicographic beliefs of each type that appears in \(b_{i}^{2}(t_{i})\) but did not appear in \(b_{i}^{1}(t_{i})\) infinitely more likely than the lexicographic beliefs of each type that appears in \(b_{i}^{3}(t_{i})\) but did not appear in a previous level; and so on. Continuing this way it is possible to obtain the full lexicographic belief hierarchy.

We say type \(t_{j}\) is deemed possible by type \(t_{i}\) for the lexicographic belief \(b_{i}(t_{i}) = (b_{i}^{1}(t_{i}); \ldots ; b_{i}^{m}(t_{i}))\) if there exists a strategy-type combination \((s_{-i}, t_{-i}) \in (S_{j} \times \{t_{j}\}) \times \mathop {\times }\nolimits _{k \ne i, j} (S_{k} \times T_{k})\) such that \(b_{i}^{\ell }(t_{i})(s_{-i}, t_{-i}) > 0\) for some \(\ell \in \{1, \ldots , m\}\). The set of types for player j deemed possible by \(b_{i}(t_{i})\) is denoted by \(T_{j}(t_{i})\).

Informally, if a strategy-type combination receives positive probability at some level \(\ell \), while another strategy-type combination receives positive probability only at levels \(k > \ell \) (if at all), we say that the first combination is deemed infinitely more likely than the second one. Formally:

Definition 4

(Strategy-type combinations deemed infinitely more likely) Let \(b_{i}(t_{i}) = (b_{i}^{1}(t_{i}); \ldots ; b_{i}^{m}(t_{i}))\) be a lexicographic belief for type \(t_{i}\) for player i. We say \(t_{i}\) deems a strategy-type combination \((s_{-i}, t_{-i})\) infinitely more likely than \((s_{-i}', t_{-i}')\) if there exists \(k \in \{1, \ldots , m\}\) such that

  1. for all \(\ell \le k\), \(b_{i}^{\ell }(t_{i})(s'_{-i}, t_{-i}') = 0\); and

  2. \(b_{i}^{k}(t_{i})(s_{-i}, t_{-i}) > 0\).

We focus on a particular type of lexicographic beliefs, which are such that for every type combination for i’s opponents that is deemed possible in the belief, every strategy combination for i’s opponents must receive positive probability at some level k.

Definition 5

(Cautious lexicographic belief) Consider an epistemic model \(M = (T_{i}, b_{i})_{i \in I}\). Let \(b_{i}(t_{i}) = (b_{i}^{1}(t_{i}); \ldots ; b_{i}^{m}(t_{i}))\) be a lexicographic belief for type \(t_{i} \in T_{i}\) for player i. We say \(b_{i}(t_{i})\) is cautious if for each \((s_{-i}, t_{-i}) \in \mathop {\times }\nolimits _{j \ne i} (S_{j} \times T_{j}(t_{i}))\) there is a \(k \in \{1, \ldots , m\}\) such that

$$\begin{aligned} b_{i}^{k}(t_{i})(s_{-i}, t_{-i}) > 0. \end{aligned}$$

In order to compare strategies for a player we define the expected utility for a given lexicographic belief. Note that it is defined by levels, and the comparison is made at the first level in which two strategies disagree in their expected utility.

Given a type \(t_{i}\) for player i and a lexicographic belief \(b_{i}(t_{i}) = (b_{i}^{1}(t_{i}); \ldots ; b_{i}^{m}(t_{i}))\) we define the expected utility of choosing strategy \(s_{i}\) at level k as

$$\begin{aligned} v_{i}^{k}(s_{i}, b_{i}(t_{i})) = \sum _{(s_{-i}, t_{-i}) \in S_{-i} \times T_{-i}} b_{i}^{k}(t_{i})(s_{-i}, t_{-i}) v_{i}(s_{i}, s_{-i}). \end{aligned}$$

A type \(t_{i}\) with a lexicographic belief \(b_{i}(t_{i}) = (b_{i}^{1}(t_{i}); \ldots ; b_{i}^{m}(t_{i}))\) for player i prefers strategy \(s_{i}\) to \(s_{i}'\) if there exists \(k \in \{1, \ldots , m\}\) such that

  1. for all \(\ell < k\), \(v_{i}^{\ell }(s_{i}, b_{i}(t_{i})) = v_{i}^{\ell }(s'_{i}, b_{i}(t_{i}))\); and

  2. \(v_{i}^{k}(s_{i}, b_{i}(t_{i})) > v_{i}^{k}(s'_{i}, b_{i}(t_{i}))\).

Given a lexicographic belief \(b_{i}(t_{i})\) for type \(t_{i}\), a strategy \(s_{i}\) is optimal for \(t_{i}\) if there is no other \(s_{i}' \in S_{i}\) such that \(t_{i}\) prefers \(s_{i}'\) to \(s_{i}\).
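Under the same illustrative encoding of lexicographic beliefs as above, the preference relation and the induced notion of optimality can be sketched as follows; the normal form payoff function v is an assumed input.

```python
def level_utility(s_i, level, v):
    """Expected utility of s_i at one level of the lexicographic belief.

    `level` maps combinations (s_minus_i, t_minus_i) to probabilities, and
    `v(s_i, s_minus_i)` is the normal-form utility (an assumed input).
    """
    return sum(p * v(s_i, s_minus_i) for (s_minus_i, _t), p in level.items())


def prefers(belief, s_i, s_other, v):
    """True if the type with lexicographic belief `belief` prefers s_i to s_other."""
    for level in belief:
        u1, u2 = level_utility(s_i, level, v), level_utility(s_other, level, v)
        if u1 != u2:
            return u1 > u2    # decided at the first level where the utilities differ
    return False              # equal at every level: no strict preference


def optimal_strategies(belief, strategies, v):
    """Strategies to which no other strategy is preferred."""
    return [s for s in strategies
            if not any(prefers(belief, s2, s, v) for s2 in strategies)]
```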

Now we define the notion of rationalizability that will be used for normal form games: respect of preferences, due to Asheim (2001), which in turn is used to define the concept of proper rationalizability.

Definition 6

(Respect of preferences) Consider an epistemic model \(M = (T_{i}, b_{i})_{i \in I}\). Let \(b_{i}(t_{i}) = (b_{i}^{1}(t_{i}); \ldots ; b_{i}^{m}(t_{i}))\) be a lexicographic belief for type \(t_{i}\) for player i. We say \(t_{i}\) respects j’s preferences if for every type \(t_{j}\) of player j deemed possible by \(t_{i}\), and strategies \(s_{j}, s_{j}' \in S_{j}\) such that \(t_{j}\) prefers \(s_{j}\) to \(s_{j}'\), \(t_{i}\) deems at least one strategy-type combination in \(\mathop {\times }\nolimits _{k \in I {\setminus } \{i, j\}} (S_{k} \times T_{k}(t_{i})) \times \{(s_{j}, t_{j})\}\) infinitely more likely than every strategy-type combination in \(\mathop {\times }\nolimits _{k \in I {\setminus } \{i, j\}} (S_{k} \times T_{k}(t_{i})) \times \{(s_{j}', t_{j})\}\).

We say \(t_{i}\) respects the opponents’ preferences if \(t_{i}\) respects j’s preferences for all \(j \in I {\setminus } \{i\}\).

Definition 7

(k-fold and common full belief in caution)

  1. Type \(t_{i}\) expresses 1-fold full belief in caution if \(t_{i}\) only deems possible opponents’ types that are cautious.

  2. For every \(k > 1\), type \(t_{i}\) expresses k-fold full belief in caution if \(t_{i}\) only deems possible opponents’ types that express \((k - 1)\)-fold full belief in caution.

  3. Type \(t_{i}\) expresses common full belief in caution if \(t_{i}\) expresses k-fold full belief in caution for all \(k \in \mathbb {N}\).

In a similar way we can define k-fold and common full belief in respect of preferences. Now we can define proper rationalizability, which was introduced by Schuhmacher (1999). In this section we use the characterization of this concept given by Asheim (2001), which is based on lexicographic beliefs.

Definition 8

(Proper rationalizability) Type \(t_{i}\) is properly rationalizable if \(t_{i}\) is cautious, respects the opponents’ preferences and expresses common full belief in caution and common full belief in respect of preferences.

A strategy \(s_{i}\) for player i is properly rationalizable if there exists an epistemic model \(M =(T_{i}, b_{i})_{i \in I}\) and some type \(t_{i} \in T_{i}\) such that \(t_{i}\) is properly rationalizable, and strategy \(s_{i}\) is optimal for type \(t_{i}\).

Table 1 An epistemic model for the normal form of Fig. 1

For the game in Fig. 1, consider the epistemic model given in Table 1. The first level of belief \(b_{1}(t_{1})\) is the Dirac measure that assigns probability 1 to the strategy-type pair \((d, t_{2})\), and the second level is the Dirac measure that assigns probability 1 to \((e, t_{2})\). Analogously the belief \(b_{2}(t_{2})\) is also shorthand for a collection of Dirac measures. We shall check that each type is properly rationalizable.

Type \(t_{1}\) only deems possible type \(t_{2}\), and the strategy-type combinations \((d, t_{2})\) and \((e, t_{2})\) appear at some level of \(b_{1}(t_{1})\), so \(t_{1}\) is cautious. Similarly \(t_{2}\) only deems possible type \(t_{1}\), and the strategy-type combinations \(((a, f), t_{1})\), \(((a, g), t_{1})\), \((b, t_{1})\) and \((c, t_{1})\) appear at some level of \(b_{2}(t_{2})\), so \(t_{2}\) is cautious.

Type \(t_{1}\) believes player 2 is of type \(t_{2}\), which believes at the first level of \(b_{2}(t_{2})\) that player 1 will choose c, and at the second level that player 1 will choose (a, f), in which case the order of preference for player 2 is d, then e, so \(t_{1}\) respects the opponent’s preferences.

Type \(t_{2}\) believes player 1 is of type \(t_{1}\), which believes at the first level of \(b_{1}(t_{1})\) that player 2 will choose d, in which case the order of preference for player 1 is c, then (a, f), followed by b and finally (a, g), so \(t_{2}\) respects the opponent’s preferences.

Since all the types in the epistemic model are cautious and respect the opponent’s preferences, all the types are properly rationalizable. For player 1, c is a strategy that is optimal for \(t_{1}\), and for player 2, d is a strategy that is optimal for \(t_{2}\). Therefore c and d are properly rationalizable.

5 Common belief in future and restricted past rationality

Now we turn to dynamic games, and we will define the concept of common belief in future and restricted past rationality. In Sect. 6 we will connect the concept to proper rationalizability of the normal form.

We first define an epistemic model for a dynamic game, which is rather similar to the definition for normal form games, except the beliefs depend on the information set.

Definition 9

(Epistemic model for a dynamic game) An epistemic model \(\hat{M} = (\hat{T}_{i}, \beta _{i})_{i \in I}\) for a dynamic game G consists of a finite set of types \(\hat{T}_{i}\) for each player i, and for each type \(\hat{t}_{i} \in \hat{T}_{i}\) and each information set \(h \in H_{i}\) of player i we define a conditional belief \(\beta _{i}(\hat{t}_{i}, h)\) which is a probability distribution over \(S_{-i}(h) \times \hat{T}_{-i}\), the set of strategy-type combinations of i’s opponents that lead to \(h \in H_{i}\).

Given a type \(\hat{t}_{i}\), an information set h for player i, and a conditional belief \(\beta _{i}(\hat{t}_{i}, h)\) we define the expected utility of choosing strategy \(s_{i} \in S_{i}(h)\) as

$$\begin{aligned} u_{i}(s_{i}, \beta _{i}(\hat{t}_{i}, h)) = \sum _{(s_{-i}, \hat{t}_{-i}) \in S_{-i}(h) \times \hat{T}_{-i}} \beta _{i}(\hat{t}_{i}, h)(s_{-i}, \hat{t}_{-i}) u_{i}(z(s_{i}, s_{-i})), \end{aligned}$$

where \(z(s_{i}, s_{-i})\) is the terminal history reached by \((s_{i}, s_{-i})\).

Given a conditional belief \(\beta _{i}(\hat{t}_{i}, h)\) for type \(\hat{t}_{i}\) at the information set h, a strategy \(s_{i} \in S_{i}(h)\) is optimal for \(\hat{t}_{i}\) at h if for all \(s_{i}' \in S_{i}(h)\)

$$\begin{aligned} u_{i}(s_{i}, \beta _{i}(\hat{t}_{i}, h)) \ge u_{i}(s'_{i}, \beta _{i}(\hat{t}_{i}, h)). \end{aligned}$$
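Optimality at an information set thus reduces to a single expected-utility comparison. The following minimal sketch assumes that a conditional belief is encoded as a dictionary over the opponents’ strategy-type combinations that lead to h, and that u returns the utility of the induced terminal history; both are illustrative assumptions.

```python
def conditional_utility(s_i, belief_at_h, u):
    """Expected utility of s_i under a conditional belief at an information set."""
    return sum(p * u(s_i, s_minus_i) for (s_minus_i, _t), p in belief_at_h.items())


def optimal_at_h(strategies_at_h, belief_at_h, u):
    """Strategies in S_i(h) that maximize expected utility under the conditional belief."""
    best = max(conditional_utility(s, belief_at_h, u) for s in strategies_at_h)
    return [s for s in strategies_at_h
            if conditional_utility(s, belief_at_h, u) == best]
```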

Now we define the key conditions that will be used: belief in future rationality as defined in Perea (2014), Bayesian updating, and a new notion that we propose, which requires players to think about the past rationality of the opponents insofar as it concerns the strategies that reach the information set at which the player currently finds himself. We define all three notions separately, and then define common belief in future rationality and common belief in restricted past rationality in an iterative way, so as to combine them into one concept that refines common belief in future rationality.

We should point out that in Definitions 10 and 13 we consider information sets that weakly follow or weakly precede h, rather than only those that strictly do so. The reason is that, with our definition of a dynamic game, the same information set may belong to two or more players, as we allow for simultaneous moves.

Definition 10

(Belief in the opponents’ future rationality) We say that a type \(\hat{t}_{i}\) believes in j’s future rationality if at every \(h \in H_{i}\), \(\beta _{i}(\hat{t}_{i}, h)(s_{j}, \hat{t}_{j}) > 0\) only if for every \(h' \in H_{j}(s_{j})\) that weakly follows h:

$$\begin{aligned} u_{j}(s_{j}, \beta _{j}(\hat{t}_{j}, h')) \ge u_{j}(s'_{j}, \beta _{j}(\hat{t}_{j}, h')) \end{aligned}$$

for every \(s_{j}' \in S_{j}(h')\).

Type \(\hat{t}_{i}\) believes in the opponents’ future rationality if \(\hat{t}_{i}\) believes in j’s future rationality for all players \(j \in I {\setminus } \{i\}\).

Definition 11

(k-fold and common belief in future rationality)

  1. Type \(\hat{t}_{i}\) expresses 1-fold belief in future rationality if \(\hat{t}_{i}\) believes in the opponents’ future rationality.

  2. For every \(k > 1\), type \(\hat{t}_{i}\) expresses k-fold belief in future rationality if at every information set \(h \in H_{i}\), \(\hat{t}_{i}\) only assigns positive probability to opponents’ types that express \((k - 1)\)-fold belief in future rationality.

  3. Type \(\hat{t}_{i}\) expresses common belief in future rationality if \(\hat{t}_{i}\) expresses k-fold belief in future rationality for every \(k \in \mathbb {N}\).

Definition 12

(Bayesian updating) A type \(\hat{t}_{i}\) satisfies Bayesian updating if for every \(h, h' \in H_{i}\) such that \(h'\) follows h and \(\beta _{i}(\hat{t}_{i}, h)(S_{-i}(h') \times \hat{T}_{-i}) > 0\), it holds that

$$\begin{aligned} \beta _{i}(\hat{t}_{i}, h')(s_{-i}, \hat{t}_{-i}) = \frac{\beta _{i}(\hat{t}_{i}, h)(s_{-i}, \hat{t}_{-i})}{\beta _{i}(\hat{t}_{i}, h)(S_{-i}(h') \times \hat{T}_{-i})} \end{aligned}$$

for every strategy-type combination \((s_{-i}, \hat{t}_{-i}) \in S_{-i}(h') \times \hat{T}_{-i}\) of player i’s opponents.
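Under the same dictionary encoding of conditional beliefs, Bayesian updating amounts to restricting the belief at h to the combinations that lead to \(h'\) and renormalizing, whenever the restricted probability mass is positive. A minimal, purely illustrative sketch:

```python
def bayesian_update(belief_at_h, leads_to_h_prime):
    """Conditional belief at h' obtained from the belief at h by Bayesian updating.

    `belief_at_h` maps (s_minus_i, t_minus_i) to probabilities, and
    `leads_to_h_prime(s_minus_i)` says whether the opponents' combination leads to h'.
    Returns None when the belief at h assigns zero probability to reaching h',
    in which case Definition 12 imposes no restriction.
    """
    restricted = {key: p for key, p in belief_at_h.items()
                  if leads_to_h_prime(key[0]) and p > 0}
    total = sum(restricted.values())
    if total == 0:
        return None
    return {key: p / total for key, p in restricted.items()}
```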

Definition 13

(Belief in the opponents’ restricted past rationality) We say that a type \(\hat{t}_{i}\) believes in j’s restricted past rationality if at every \(h \in H_{i}\), \(\beta _{i}(\hat{t}_{i}, h)(s_{j}, \hat{t}_{j}) > 0\) only if for every \(h' \in H_{j}(s_{j})\) such that \(h'\) weakly precedes h:

$$\begin{aligned} u_{j}(s_{j}, \beta _{j}(\hat{t}_{j}, h')) \ge u_{j}(s'_{j}, \beta _{j}(\hat{t}_{j}, h')) \end{aligned}$$

for every \(s_{j}' \in S_{j}(h) \cap S_{j}(h')\).

Type \(\hat{t}_{i}\) believes in the opponents’ restricted past rationality if \(\hat{t}_{i}\) believes in j’s restricted past rationality for all players \(j \in I {\setminus } \{i\}\).

The previous definition establishes that type \(\hat{t}_{i}\) must reason at h about those strategies of his opponents that can be chosen at a previous information set \(h'\), but only if those strategies can reach the information set h too. That is, i considers at h only those strategies at \(h'\) that give the highest utility to the opponent at \(h'\) among the strategies that actually reach h.

We can define k-fold and common belief in restricted past rationality, and k-fold and common belief in Bayesian updating in an analogous way to the definition of k-fold and common belief in future rationality.

Table 2 An epistemic model for the dynamic game of Fig. 1

A strategy \(s_{i}\) for player i can rationally be chosen under common belief in future and restricted past rationality and common belief in Bayesian updating if there exists an epistemic model \(\hat{M} = (\hat{T}_{i}, \beta _{i})_{i \in I}\) and some type \(\hat{t}_{i} \in \hat{T}_{i}\) such that \(\hat{t}_{i}\) expresses common belief in future and restricted past rationality and common belief in Bayesian updating, and strategy \(s_{i}\) is optimal for type \(\hat{t}_{i}\) at every information set \(h \in H_{i}(s_{i})\).

Returning to the example shown in Fig. 1, consider the epistemic model given in Table 2, for which we check that every type expresses common belief in future and restricted past rationality and satisfies Bayesian updating.

At \(\varnothing \in H_{1}\), \(\hat{t}_{1}\) believes that player 2 chooses d and is of type \(\hat{t}_{2}\). Type \(\hat{t}_{2}\) believes at \(h_{1}\), which weakly follows \(\varnothing \), that player 1 chooses (a, f), so the optimal strategy in \(S_{2}(h_{1}) = \{d, e\}\) for player 2 is d. Therefore \(\hat{t}_{1}\) believes in the opponent’s future rationality at \(\varnothing \). Since there are no information sets for player 2 that weakly precede \(\varnothing \), \(\hat{t}_{1}\) believes in the opponent’s restricted past rationality at \(\varnothing \).

At \(h_{2} \in H_{1}\) there are no information sets for player 2 that weakly follow \(h_{2}\), so \(\hat{t}_{1}\) believes in the opponent’s future rationality at \(h_{2}\). Now, type \(\hat{t}_{1}\) believes at \(h_{2}\) that player 2 chooses d and is of type \(\hat{t}_{2}\); in fact \(S_{2}(h_{1}) \cap S_{2}(h_{2}) = \{d\}\). Therefore \(\hat{t}_{1}\) believes in the opponent’s restricted past rationality at \(h_{2}\). Moreover, \(\hat{t}_{1}\) satisfies Bayesian updating if the game moves from \(\varnothing \) to \(h_{2}\).

At \(h_{1} \in H_{2}\), \(\hat{t}_{2}\) believes that player 1 chooses (a, f) and is of type \(\hat{t}_{1}\). Type \(\hat{t}_{1}\) believes at \(h_{2}\), which weakly follows \(h_{1}\), that player 2 chooses d at \(h_{1}\), so the optimal strategy in \(S_{1}(h_{2})\) for player 1 is (a, f). Therefore \(\hat{t}_{2}\) believes in the opponent’s future rationality. Type \(\hat{t}_{1}\) believes at \(\varnothing \), which weakly precedes \(h_{1}\), that player 2 chooses d at \(h_{1}\), so the optimal strategy in \(S_{1}(\varnothing )\cap S_{1}(h_{1}) = \{(a, f), (a, g), b\}\) for player 1 is (a, f). Therefore \(\hat{t}_{2}\) believes in the opponent’s restricted past rationality. Finally it can easily be seen that \(\hat{t}_{2}\) satisfies Bayesian updating, as \(h_{1}\) is player 2’s only information set. We can see that among all strategies in \(S_{1}(\varnothing )\), (a, f) is not optimal for \(\hat{t}_{1}\) at \(\varnothing \), as c gives a higher utility.

Since all the types in the epistemic model believe in the opponent’s future and restricted past rationality and satisfy Bayesian updating, then all the types express common belief in future and restricted past rationality and common belief in Bayesian updating. For player 1, c is optimal for type \(\hat{t}_{1}\) at information set \(\varnothing \), and for player 2, d is optimal for type \(\hat{t}_{2}\) at information set \(h_{1}\). Therefore c and d can rationally be chosen under common belief in future and restricted past rationality and common belief in Bayesian updating.

6 Connection with proper rationalizability

In this section we prove one of our main theorems, which states that proper rationalizability of a strategy in the normal form implies optimality of the same strategy under common belief in future and restricted past rationality with Bayesian updating in the dynamic game.

In order to do so, we break down the proof into four smaller parts. We start by showing that optimality of a strategy for a cautious type in the normal form of the game implies optimality of the same strategy for the induced type in the dynamic game. Then we go on to show that respect of the opponent’s preferences in the normal form implies belief in the opponent’s future and restricted past rationality and Bayesian updating in the dynamic game. As a consequence, proper rationalizability in the normal form implies common belief in future and restricted past rationality and common belief in Bayesian updating in the dynamic game. This finally implies that every strategy which is properly rationalizable in the normal form can rationally be chosen under common belief in future and restricted past rationality with Bayesian updating in the dynamic game.

Theorem 1

Consider a dynamic game G. If a strategy \(s_{i}\) is properly rationalizable in the normal form of G, then \(s_{i}\) can rationally be chosen under common belief in future and restricted past rationality and common belief in Bayesian updating in the dynamic game G.

This result has a connection with van Damme (1984), who showed that every proper equilibrium in the normal form of a game induces a quasi-perfect equilibrium in the dynamic game, which in turn induces a sequential equilibrium in the dynamic game. The non-equilibrium analogue of proper equilibrium is proper rationalizability. Moreover, every sequential equilibrium is a subgame perfect equilibrium, which, as shown by Perea and Predtetchinski (2019), is the equilibrium counterpart of common belief in future rationality in the case of two-player games. In this way, our theorem may be viewed as a non-equilibrium analogue to van Damme’s result.

As a first step to establishing Theorem 1, we define a way to transform an epistemic model of the normal form into an epistemic model for the dynamic game.

Let \(M = (T_{i}, b_{i})_{i \in I}\) be an epistemic model for the normal form of the game where every type \(t_{i} \in T_{i}\) is cautious for all \(i \in I\). We define the induced epistemic model for the dynamic game \(\hat{M} = (\hat{T}_{i}, \beta _{i})_{i \in I}\) in the following way: for each player i take a bijective mapping \(f_{i} :T_{i} \rightarrow \hat{T}_{i}\), effectively a renaming of the types, and let the conditional belief of type \(f_{i}(t_{i})\) at the information set \(h \in H_{i}\) be defined as

$$\begin{aligned} \beta _{i}(f_{i}(t_{i}), h)(s_{-i}, f_{-i}(t_{-i})) = \frac{b_{i}^{k}(t_{i})(s_{-i}, t_{-i})}{b_{i}^{k}(t_{i})(S_{-i}(h) \times T_{-i})}, \end{aligned}$$

where k is the smallest number for which \(b_{i}^{k}(t_{i})(S_{-i}(h) \times T_{-i}) > 0\). Here,

$$\begin{aligned} b_{i}^{k}(t_{i})(S_{-i}(h) \times T_{-i}) = \sum _{(s_{-i}, t_{-i}) \in S_{-i}(h) \times T_{-i}} b_{i}^{k}(t_{i})(s_{-i}, t_{-i}), \end{aligned}$$

that is, we take the first level k of the lexicographic belief of \(t_{i}\) that assigns positive probability to at least one strategy combination of i’s opponents that reaches h, and normalize the probabilities accordingly. By doing this, the conditional beliefs are such that the types for the dynamic game satisfy Bayesian updating. Although some information that could be useful for tie-breaking is lost when constructing the conditional beliefs for the dynamic game, such information is not required for our model.
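Operationally, the construction selects the first level of the lexicographic belief that assigns positive probability to opponents’ strategy combinations leading to h, restricts to those combinations, and renormalizes. The following minimal sketch uses the same illustrative list-of-dictionaries encoding of lexicographic beliefs as before; cautiousness guarantees that such a level exists.

```python
def induced_conditional_belief(lexicographic_belief, leads_to_h):
    """Conditional belief at an information set h induced by a cautious lexicographic belief.

    `lexicographic_belief` is a list of dicts mapping (s_minus_i, t_minus_i) to
    probabilities; `leads_to_h(s_minus_i)` says whether a combination leads to h.
    """
    for level in lexicographic_belief:
        mass = sum(p for (s_minus_i, _t), p in level.items() if leads_to_h(s_minus_i))
        if mass > 0:          # first level k with positive probability on S_-i(h) x T_-i
            return {key: p / mass for key, p in level.items()
                    if leads_to_h(key[0]) and p > 0}
    raise ValueError("no level reaches h; the lexicographic belief is not cautious")
```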

Table 3 An epistemic model for the normal form
Table 4 The epistemic model of the dynamic game induced by Table 3

To illustrate how to transform cautious lexicographic beliefs into conditional beliefs, we consider the game from Fig. 1. If the epistemic model for its normal form is the one in Table 3, then the epistemic model induced for the dynamic game is the one in Table 4.

Now that we have a way to relate epistemic models of the normal form with those of the dynamic game, we will see how the rationalizability concepts relate to each other. First we show that optimality of a strategy for a cautious type in the normal form of the game implies optimality of the same strategy for the induced type in the dynamic game. This is presented in the following lemma.

Lemma 1

Let M be an epistemic model of the normal form in which all types are cautious, \(h \in H_{i}\), \(h'\) an information set that weakly follows or weakly precedes h, and \(t_{i}\) a type for player i in M. If \(s_{i} \in S_{i}(h) \cap S_{i}(h')\) is not optimal for \(f_{i}(t_{i})\) among strategies in \(S_{i}(h) \cap S_{i}(h')\) at \(h \in H_{i}\), then there exists \(\hat{s}_{i} \in S_{i}(h) \cap S_{i}(h')\) such that \(t_{i}\) prefers \(\hat{s}_{i}\) to \(s_{i}\).

The optimality implication described above will be very useful to show the relations between the rationalizability concepts that we are studying. The next step is to show that respect of preferences in the normal form of the game implies belief in future and restricted past rationality.

Lemma 2

If \(t_{i}\) respects player j’s preferences, then \(f_{i}(t_{i})\) believes in j ’s future and restricted past rationality and \(f_{i}(t_{i})\) satisfies Bayesian updating.

And also, the notion of proper rationalizability implies common belief in future and restricted past rationality.

Lemma 3

If \(t_{i}\) is properly rationalizable, then \(f_{i}(t_{i})\) expresses common belief in future and restricted past rationality and common belief in Bayesian updating.

Since for every normal form game there exists at least one properly rationalizable type for every player (cf. Asheim 2001; Perea 2012), Lemma 3 implies the following result.

Corollary 1

For every dynamic game G there exists for every player i an epistemic model \(\hat{M}\) and a type \(\hat{t}_{i}\) in it that expresses common belief in future and restricted past rationality and common belief in Bayesian updating.

Once we have all of these results, Lemmas 1 and 3 imply Theorem 1. Therefore, if we transform a dynamic game into its normal form and proceed to find an epistemic model in which the types express proper rationalizability, we can find an induced epistemic model for the dynamic game in which the types express common belief in future and restricted past rationality and common belief in Bayesian updating. Moreover, from Theorem 1 we have that the strategies that can be chosen under proper rationalizability can also be chosen under common belief in future and restricted past rationality and common belief in Bayesian updating.

We can check that the epistemic model in Table 2 is induced by the epistemic model in Table 1 via the transformation described before, and we have seen that all types in Table 1 are properly rationalizable. Since strategy c is optimal for type \(t_{1}\) and strategy d is optimal for type \(t_{2}\), both strategies can rationally be chosen under common belief in future and restricted past rationality and common belief in Bayesian updating according to Theorem 1.

As we can see, at information sets \(\varnothing \) and \(h_{2}\), type \(\hat{t}_{1}\) of player 1 believes that type \(\hat{t}_{2}\) of player 2 will be and has been rational. However, if the game reaches information set \(h_{1}\), then this means that player 1 was not rational before. Nevertheless, player 2 believes that if \(h_{1}\) was reached, then player 1 is choosing optimally among the strategies that lead to \(h_{1}\). Therefore, type \(\hat{t}_{2}\) believes that player 1 will choose (a, f). Hence, player 2 can only rationally choose d under common belief in future and restricted past rationality and common belief in Bayesian updating.

Under common strong belief in rationality, if player 2 sees that \(h_{1}\) has been reached, then, if possible, he must believe that player 1 made a choice that is rational at \(\varnothing \). But choosing c at \(\varnothing \) gives the highest utility for player 1, so it is not possible for player 2 to believe that player 1 made a rational choice under common strong belief in rationality. Therefore, player 2 can believe player 1 chose any strategy that leads to \(h_{1}\), so both d and e can rationally be chosen at \(h_{1}\) under common strong belief in rationality.

Under common belief in future rationality, if player 2 sees that \(h_{1}\) was reached, then he may believe that player 1 chose irrationally at \(\varnothing \), but he must believe that from now on, player 1 will choose rationally. Therefore, player 2 can believe player 1 chose a or b at \(\varnothing \), so both d and e can rationally be chosen under common belief in future rationality.

7 Algorithm

In this section, whenever we say common belief in future and restricted past rationality, we actually mean common belief in future and restricted past rationality and common belief in Bayesian updating. Hence, we always assume common belief in Bayesian updating.

In order to find the strategies that can rationally be chosen under common belief in future and restricted past rationality, we propose an algorithm based on the backward dominance procedure in Perea (2014). Then we show that the strategies that survive the algorithm are exactly those strategies that can be chosen under common belief in future and restricted past rationality.

As can be seen from the proof in Sect. 9, the algorithm also characterizes those strategies that can be chosen under common belief in future and restricted past rationality without requiring (common belief in) Bayesian updating. Hence, for the strategies that can rationally be chosen it is not relevant whether we require Bayesian updating or not.

Definition 14

(Full and reduced decision problems at an information set) Let \(h \in H_{i}\) be an information set for player i. The pair \(\Gamma _{i}^{0}(h) = (S_{i}^{0}(h), \hat{S}_{-i}^{0}(h))\) is called the full decision problem for player i at h, where \(S_{i}^{0}(h) = S_{i}(h)\) and \(\hat{S}_{-i}^{0}(h) = S_{-i}(h)\). A pair \(\Gamma _{i}^{k}(h) = (S_{i}^{k}(h), \hat{S}_{-i}^{k}(h))\) is a reduced decision problem for player i at h, with \(S_{i}^{k}(h) \subseteq S_{i}^{0}(h)\) and \(\hat{S}_{-i}^{k}(h) \subseteq \hat{S}_{-i}^{0}(h)\).

Definition 15

(Strict dominance by a randomization) Let \(h \in H_{i}\) be an information set for player i, and \(\Gamma _{i}^{k}(h) = (S_{i}^{k}(h), \hat{S}_{-i}^{k}(h))\) be a reduced decision problem for player i at h. A strategy \(s_{i} \in S_{i}^{k}(h)\) is strictly dominated on \(\hat{S}_{-i}^{k}(h)\) by a randomization on \(A_{i} \subseteq S_{i}(h)\) if there is \(\rho _{i} \in \Delta (A_{i})\) such that

$$\begin{aligned} \sum _{s_{i}' \in A_{i}} \rho _{i}(s_{i}') u_{i}(z(s_{i}', s_{-i})) > u_{i}(z(s_{i}, s_{-i})) \end{aligned}$$

for all \(s_{-i} \in \hat{S}_{-i}^{k}(h)\).
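Checking strict dominance by a randomization amounts to solving a small linear program: \(s_{i}\) is strictly dominated on \(\hat{S}_{-i}^{k}(h)\) if and only if some mixture over \(A_{i}\) beats it by a strictly positive margin against every strategy combination in \(\hat{S}_{-i}^{k}(h)\). The following Python sketch, using scipy, is illustrative only; the function name and the data layout are our own.

```python
import numpy as np
from scipy.optimize import linprog


def strictly_dominated(s, mix_over, opponents, u):
    """Is strategy `s` strictly dominated on `opponents` by a randomization on `mix_over`?

    `u(a, s_opp)` returns the utility u_i(z(a, s_opp)); `mix_over` and `opponents`
    are lists of strategies and opponents' strategy combinations, respectively.
    Solves: maximize eps s.t. sum_a rho_a u(a, s_opp) >= u(s, s_opp) + eps for all s_opp,
    with rho in the simplex; `s` is dominated iff the optimal eps is strictly positive.
    """
    if not opponents:
        return False
    n = len(mix_over)
    c = np.zeros(n + 1)
    c[-1] = -1.0                                 # maximize eps (linprog minimizes)
    A_ub = np.zeros((len(opponents), n + 1))
    b_ub = np.zeros(len(opponents))
    for row, s_opp in enumerate(opponents):      # -sum_a rho_a u(a,s_opp) + eps <= -u(s,s_opp)
        A_ub[row, :n] = [-u(a, s_opp) for a in mix_over]
        A_ub[row, -1] = 1.0
        b_ub[row] = -u(s, s_opp)
    A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)  # probabilities sum to one
    bounds = [(0, 1)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return bool(res.success and res.x[-1] > 1e-9)
```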

Algorithm 1

Set \(S_{i}^{0}(h) = S_{i}(h)\) and \(\hat{S}_{-i}^{0}(h) = S_{-i}(h)\) for all \(i \in I\) and all \(h \in H_{i}\). For every \(k \ge 1\) we have:

Step k: For every player i and every information set \(h \in H_{i}\), we define

$$\begin{aligned} \begin{array}{ll} S_{i}^{k}(h) = \{s_{i} \in S_{i}^{k - 1}(h) \mid s_{i} &{}\text { is not strictly dominated on }\hat{S}_{-i}^{k - 1}(h)\\ &{}\text {by a randomization on }S_{i}(h)\},\\ \hat{S}_{-i}^{k}(h) = \{(s_{j})_{j \ne i} \in \hat{S}_{-i}^{k - 1}(h) \mid &{}\text {for all }j \ne i, s_{j}\hbox { is not strictly dominated}\\ &{}\text {on }\hat{S}_{-j}^{k - 1}(h')\text { by a randomization on }S_{j}(h')\\ &{}\text {for every }h' \in H_{j}(s_{j})\text { weakly following }h,\\ &{}\text {and }s_{j}\text { is not strictly dominated on }\hat{S}_{-j}^{k - 1}(h'')\\ &{}\text {by a randomization on }S_{j}(h) \cap S_{j}(h'')\\ &{}\text {for every }h'' \in H_{j}(s_{j})\text { weakly preceding }h\}. \end{array} \end{aligned}$$

The algorithm ends after K steps if \(S_{i}^{K + 1}(h) = S_{i}^{K}(h)\) and \(\hat{S}_{-i}^{K + 1}(h) = \hat{S}_{-i}^{K}(h)\) for every \(i \in I\) and every \(h \in H_{i}\).
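For concreteness, the procedure can be sketched as an iterated elimination loop. The sketch below is illustrative only: it is written for two-player games (so that \(S_{-i}(h)\) is simply the other player's set of strategies leading to h), it reuses the strictly_dominated check sketched after Definition 15, and the game dictionary is a hypothetical encoding supplying, for every information set, the sets \(S_{j}(h)\), the weak precedence relation, and the normal form payoffs.

```python
def algorithm_1(game, strictly_dominated):
    """Sketch of Algorithm 1 for two-player dynamic games.

    Assumed (hypothetical) game description:
      game["players"]        -- the two players, e.g. (1, 2)
      game["info_sets"][i]   -- list of information sets of player i
      game["S"][j][h]        -- set of strategies of player j leading to h (S_j(h)),
                                defined for every information set h of the game
      game["weakly_follows"] -- weakly_follows(h2, h1): True iff h2 weakly follows h1
      game["payoff"]         -- payoff(i, s_i, s_j): utility of the induced terminal history
    """
    p1, p2 = game["players"]
    other = {p1: p2, p2: p1}
    S, H = game["S"], game["info_sets"]
    follows, payoff = game["weakly_follows"], game["payoff"]

    # Step 0: full decision problems (S_i^0(h), S_-i^0(h)).
    Sk = {(i, h): set(S[i][h]) for i in (p1, p2) for h in H[i]}
    Shat = {(i, h): set(S[other[i]][h]) for i in (p1, p2) for h in H[i]}

    while True:
        new_Sk, new_Shat = {}, {}
        for i in (p1, p2):
            j = other[i]
            u_i = lambda a, s, i=i: payoff(i, a, s)
            u_j = lambda a, s, j=j: payoff(j, a, s)
            for h in H[i]:
                # First set: remove strategies of i dominated on the previous Shat at h.
                new_Sk[(i, h)] = {
                    s for s in Sk[(i, h)]
                    if not strictly_dominated(s, list(S[i][h]), list(Shat[(i, h)]), u_i)
                }
                # Second set: keep s_j only if it passes the future and restricted-past tests.
                keep = set()
                for s_j in Shat[(i, h)]:
                    ok = True
                    for h2 in H[j]:
                        if s_j not in S[j][h2]:       # h2 is not in H_j(s_j)
                            continue
                        if follows(h2, h):            # h2 weakly follows h
                            allowed = S[j][h2]
                        elif follows(h, h2):          # h2 weakly precedes h
                            allowed = S[j][h] & S[j][h2]
                        else:
                            continue
                        if strictly_dominated(s_j, list(allowed), list(Shat[(j, h2)]), u_j):
                            ok = False
                            break
                    if ok:
                        keep.add(s_j)
                new_Shat[(i, h)] = keep
        if new_Sk == Sk and new_Shat == Shat:
            return Sk, Shat
        Sk, Shat = new_Sk, new_Shat
```

The loop stops as soon as one step leaves all decision problems unchanged, matching the termination condition stated above.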

Now we have the following result showing that the algorithm identifies the strategies that can be chosen under k-fold belief in future and restricted past rationality, and those that can be chosen under common belief in future and restricted past rationality.

Theorem 2

For every \(k \ge 1\), the strategies that can rationally be chosen by a type that expresses up to k-fold belief in future and restricted past rationality and up to k-fold belief in Bayesian updating are exactly the strategies \(s_{i}\) such that \(s_{i} \in S_{i}^{k + 1}(h)\) for all \(h \in H_{i}(s_{i})\), that is, the strategies surviving the first \(k + 1\) steps of the algorithm.

The strategies that can rationally be chosen by a type that expresses common belief in future and restricted past rationality and common belief in Bayesian updating are exactly the strategies that survive the full algorithm, that is, the strategies \(s_{i}\) such that \(s_{i} \in S_{i}^{k}(h)\) for all \(k \ge 1\) and all \(h \in H_{i}(s_{i})\).

To illustrate the algorithm, we use the game from Fig. 1. We have that \(H_{1} = \{\varnothing , h_{2}\}\) and \(H_{2} = \{h_{1}\}\) and the initial sets of strategies:

$$\begin{aligned} \begin{array}{lll} S_{1}^{0}(\varnothing ) = \{(a, f), (a, g), b, c\}, &{}&{}\quad \hat{S}_{-1}^{0}(\varnothing ) = \{d, e\},\\ S_{2}^{0}(h_{1}) = \{d, e\}, &{} &{}\quad \hat{S}_{-2}^{0}(h_{1}) = \{(a, f), (a, g), b\},\\ S_{1}^{0}(h_{2}) = \{(a, f), (a, g)\}, &{} &{}\quad \hat{S}_{-1}^{0}(h_{2}) = \{d\}. \end{array} \end{aligned}$$

After the first step is applied, we obtain the following reduced decision problems:

$$\begin{aligned} \begin{array}{lll} S_{1}^{1}(\varnothing ) = \{c\}, &{} &{}\quad \hat{S}_{-1}^{1}(\varnothing ) = \{d, e\},\\ S_{2}^{1}(h_{1}) = \{d, e\}, &{} &{}\quad \hat{S}_{-2}^{1}(h_{1}) = \{(a, f)\},\\ S_{1}^{1}(h_{2}) = \{(a, f)\}, &{} &{}\quad \hat{S}_{-1}^{1}(h_{2}) = \{d\}. \end{array} \end{aligned}$$

Observe that at \(\varnothing \), b is strictly dominated by \((a, f) \in S_{1}^{0}(h_{1}) \cap S_{1}^{0}(\varnothing )\). We also have that at \(h_{2}\), (a, g) is strictly dominated by \((a, f) \in S_{1}^{0}(h_{2})\). Therefore the only strategy that remains in \(\hat{S}_{-2}^{1}(h_{1})\) is (a, f).

At the second iteration of the algorithm we obtain:

$$\begin{aligned} \begin{array}{lll} S_{1}^{2}(\varnothing ) = \{c\}, &{} &{}\quad \hat{S}_{-1}^{2}(\varnothing ) = \{d\},\\ S_{2}^{2}(h_{1}) = \{d\}, &{} &{}\quad \hat{S}_{-2}^{2}(h_{1}) = \{(a, f)\},\\ S_{1}^{2}(h_{2}) = \{(a, f)\}, &{} &{}\quad \hat{S}_{-1}^{2}(h_{2}) = \{d\}. \end{array} \end{aligned}$$

We see that at \(h_{1}\), e is strictly dominated on \(\hat{S}_{-2}^{1}(h_{1})\) by d, so the only strategy in \(\hat{S}_{-1}^{2}(\varnothing )\) and \(S_{2}^{2}(h_{1})\) is d.

Since all the sets are singletons, the algorithm stops. Therefore the surviving strategies are c for player 1 and d for player 2, which are exactly the strategies that we found in Sect. 5 as those that can be chosen under common belief in future and restricted past rationality.

8 Concluding remarks

A new reasoning concept for dynamic games was introduced, which not only assumes rationality of the opponents in the future, but also assumes that players reason about what happened in the past in the following way: if the game reaches an information set, players should consider only those strategies that actually reach that information set and believe that the opponent has chosen rationally in the past among that restricted set of strategies. In this way, players reason at every information set about the past, but only about a restricted part of it. We have also shown that common belief in future and restricted past rationality can be obtained from proper rationalizability in the normal form of the dynamic game, connecting these two concepts. Additionally, we defined a procedure that starts from the decision problems in the dynamic game and, using strict dominance, selects the strategies that can be chosen under common belief in future and restricted past rationality.

An interesting continuation could involve the study of the robustness of the concept presented here to inessential transformations of the dynamic game as defined in Thompson (1952) and Kohlberg and Mertens (1986). As a first impression, we can see that in the game of Fig. 1 it is possible to transform the game by first allowing player 1 to choose between “c” and “not c”. If player 1 chooses not c, then he has the chance to choose between a and b. It is even possible to switch around the order in which decisions are taken after not c, as in Fig. 6, and in spite of that we obtain the same prediction for the game under common belief in future and restricted past rationality, whereas concepts such as common belief in future rationality would fail to remain invariant under these transformations.

Fig. 6
figure 6

Modification of the dynamic game

In the original game player 1 can choose c, whereas player 2 can choose both d and e under common belief in future rationality. However, in the modified game player 1 can choose c, while player 2 can only choose d under common belief in future rationality, since player 2 must believe at \(h_{1}\) that player 1 will rationally choose a and f in the future. Under common belief in future and restricted past rationality, in the modified game we also obtain that player 1 must choose c and player 2 must choose d. A full analysis would require further work, but this example suggests that common belief in future and restricted past rationality is more robust to inessential transformations than common belief in future rationality.

Further future research could include applying this concept to other classes of games, such as infinite games, repeated games and stochastic games, as well as finding, for each class, an algorithm that selects the choices that can be made under common belief in future and restricted past rationality.

Another problem that could be investigated in future work is whether an equilibrium analogue to common belief in future and restricted past rationality exists, and how it would relate to existing equilibrium concepts for dynamic games. Such a search for an equilibrium analogue could build on Perea and Predtetchinski (2019), who have shown that for two-player stochastic dynamic games with perfect information, subgame perfect equilibrium is equivalent to common belief in future rationality with a correct beliefs assumption. Since players have perfect information there, the addition of restricted past rationality does not affect the result, so a natural extension would be to study the case of dynamic games with imperfect information.

Chen and Micali (2013) and Perea (2017) have shown that for finite dynamic games, the outcomes obtained under common strong belief in rationality are also reachable under common belief in future rationality, which makes common strong belief in rationality the more restrictive concept in terms of outcomes. It would be interesting to study the relation, in terms of outcomes, between common strong belief in rationality and common belief in future and restricted past rationality.

9 Proofs

9.1 Proofs for Section 5

Proof

(Lemma 1) Let \(s_{i} \in S_{i}(h) \cap S_{i}(h')\) be a suboptimal choice for \(f_{i}(t_{i})\) among strategies in \(S_{i}(h) \cap S_{i}(h')\) at h. Then there is at least one \(s_{i}' \in S_{i}(h) \cap S_{i}(h')\) such that

$$\begin{aligned} u_{i}(s_{i}', \beta _{i}(f_{i}(t_{i}), h)) > u_{i}(s_{i}, \beta _{i}(f_{i}(t_{i}), h)).\qquad \qquad \qquad \qquad (*) \end{aligned}$$

Define \(\hat{s}_{i}\) as

$$\begin{aligned} \hat{s}_{i}(h'')&= s_{i}(h'') \text { for all } h'' \in H_{i}(s_{i}) \text { if }h''\text { does not weakly follow }h, \end{aligned}$$
(1a)
$$\begin{aligned} \hat{s}_{i}(h'')&= s_{i}'(h'') \text { for all } h'' \in H_{i}(s_{i}') \text { if }h''\text { weakly follows }h. \end{aligned}$$
(1b)

First we show that \(\hat{s}_{i} \in S_{i}(h) \cap S_{i}(h')\).

Since \(s_{i} \in S_{i}(h)\), there is \(s_{-i} \in S_{-i}(h)\) such that \((s_{i}, s_{-i})\) reaches h. Then at every \(h'' \in H(s_{i}, s_{-i})\) such that h follows \(h''\), we have \(\hat{s}_{i}(h'') = s_{i}(h'')\). Hence \(h \in H(\hat{s}_{i}, s_{-i})\) and \(\hat{s}_{i} \in S_{i}(h)\).

To show that \(\hat{s}_{i} \in S_{i}(h')\) we distinguish two cases: whether \(h'\) weakly precedes h or \(h'\) weakly follows h.

If \(h'\) weakly precedes h, then \(\hat{s}_{i} \in S_{i}(h')\) since \(\hat{s}_{i} \in S_{i}(h)\).

Assume now that \(h'\) weakly follows h. Since \(s_{i}' \in S_{i}(h')\), there is \(s_{-i} \in S_{-i}(h')\) such that \((s_{i}', s_{-i})\) reaches \(h'\). Then at every \(h'' \in H(s_{i}', s_{-i})\) weakly following h and weakly followed by \(h'\) we have by definition \(\hat{s}_{i}(h'') = s_{i}'(h'')\), and at every \(h'' \in H(s_{i}', s_{-i})\) such that h follows \(h''\) we know that \(\hat{s}_{i}(h'') = s_{i}(h'')\). But by perfect recall of player i, there exists a unique choice \(c_{i}^{*}(h'')\) at the information set \(h''\) such that h can be reached. Since both \(s_{i}, s_{i}' \in S_{i}(h)\), both strategies must choose \(c_{i}^{*}(h'')\). Therefore \(s_{i}(h'') = s_{i}'(h'')\) for all \(h''\) such that h follows \(h''\).

Hence, \(\hat{s}_{i}(h'') = s_{i}'(h'')\) at every \(h'' \in H(s_{i}', s_{-i})\) such that h weakly follows \(h''\). Since we have seen that \(\hat{s}_{i}(h'') = s_{i}'(h'')\) for all \(h'' \in H(s_{i}', s_{-i})\) weakly following h and weakly preceding \(h'\), the strategy combination \((\hat{s}_{i}, s_{-i})\) reaches \(h'\), and \(\hat{s}_{i} \in S_{i}(h')\).

By the two results above, we have that \(\hat{s}_{i} \in S_{i}(h) \cap S_{i}(h')\).

Now we will show that \(t_{i}\) prefers \(\hat{s}_{i}\) to \(s_{i}\). Let \(b_{i}(t_{i}) = (b_{i}^{1}(t_{i}); b_{i}^{2}(t_{i}); \ldots ; b_{i}^{m}(t_{i}))\) be the cautious lexicographic belief for type \(t_{i}\). Let k be the smallest number such that \(b_{i}^{k}(t_{i})(S_{-i}(h) \times T_{-i}) > 0\).

For \(\ell < k\), \(b_{i}^{\ell }(t_{i})(S_{-i}(h) \times T_{-i}) = 0\). Hence by (1a):

$$\begin{aligned} v_{i}^{\ell }(\hat{s}_{i}, b_{i}(t_{i})) = v_{i}^{\ell }(s_{i}, b_{i}(t_{i})) \end{aligned}$$

for all \(\ell < k\). Moreover

$$\begin{aligned} v_{i}^{k}(\hat{s}_{i}, b_{i}(t_{i}))&= \sum _{(s_{-i}, t_{-i}) \in S_{-i} \times T_{-i}} b_{i}^{k}(t_{i})(s_{-i}, t_{-i}) v_{i}(\hat{s}_{i}, s_{-i})\\&= \sum _{(s_{-i}, t_{-i}) \in S_{-i}(h) \times T_{-i}} b_{i}^{k}(t_{i}) (s_{-i}, t_{-i}) v_{i}(\hat{s}_{i}, s_{-i})\\&\quad + \sum _{(s_{-i}, t_{-i}) \in (S_{-i} {\setminus } S_{-i}(h)) \times T_{-i}} b_{i}^{k}(t_{i})(s_{-i}, t_{-i}) v_{i}(\hat{s}_{i}, s_{-i})\\&= \sum _{(s_{-i}, t_{-i}) \in S_{-i}(h) \times T_{-i}} b_{i}^{k}(t_{i}) (s_{-i}, t_{-i}) v_{i}(s'_{i}, s_{-i})\\&\quad + \sum _{(s_{-i}, t_{-i}) \in (S_{-i} {\setminus } S_{-i}(h)) \times T_{-i}} b_{i}^{k}(t_{i})(s_{-i}, t_{-i}) v_{i}(s_{i}, s_{-i})\\&= b_{i}^{k}(t_{i})(S_{-i}(h) \times T_{-i})\\&\quad \times \sum _{(s_{-i}, t_{-i}) \in S_{-i}(h) \times T_{-i}} \beta _{i}(f_{i}(t_{i}), h)(s_{-i}, f_{-i}(t_{-i})) u_{i}(z(s'_{i}, s_{-i}))\\&\quad + \sum _{(s_{-i}, t_{-i}) \in (S_{-i} {\setminus } S_{-i}(h)) \times T_{-i}} b_{i}^{k}(t_{i})(s_{-i}, t_{-i}) v_{i}(s_{i}, s_{-i})\\&= b_{i}^{k}(t_{i})(S_{-i}(h) \times T_{-i}) u_{i}(s_{i}', \beta _{i}(f_{i}(t_{i}), h))\\&\quad + \sum _{(s_{-i}, t_{-i}) \in (S_{-i} {\setminus } S_{-i}(h)) \times T_{-i}} b_{i}^{k}(t_{i})(s_{-i}, t_{-i}) v_{i}(s_{i}, s_{-i})\\&> b_{i}^{k}(t_{i})(S_{-i}(h) \times T_{-i}) u_{i}(s_{i}, \beta _{i}(f_{i}(t_{i}), h))\\&\quad + \sum _{(s_{-i}, t_{-i}) \in (S_{-i} {\setminus } S_{-i}(h)) \times T_{-i}} b_{i}^{k}(t_{i})(s_{-i}, t_{-i}) v_{i}(s_{i}, s_{-i})\\&= \sum _{(s_{-i}, t_{-i}) \in S_{-i}(h) \times T_{-i}} b_{i}^{k}(t_{i}) (s_{-i}, t_{-i}) v_{i}(s_{i}, s_{-i})\\&\quad + \sum _{(s_{-i}, t_{-i}) \in (S_{-i} {\setminus } S_{-i}(h)) \times T_{-i}} b_{i}^{k}(t_{i})(s_{-i}, t_{-i}) v_{i}(s_{i}, s_{-i})\\&= v_{i}^{k}(s_{i}, b_{i}(t_{i})), \end{aligned}$$

where (1a) and (1b) have been used in the third equality, and the inequality is obtained using (\(*\)) and the fact that \(b_{i}^{k}(t_{i})(S_{-i}(h) \times T_{-i}) > 0\). Hence the beliefs at all levels below k give \(\hat{s}_{i}\) and \(s_{i}\) the same utility, while at level k the utility of \(\hat{s}_{i}\) is strictly higher, so \(t_{i}\) prefers \(\hat{s}_{i}\) to \(s_{i}\). \(\square \)

Proof

(Lemma 2) First we prove that respect of preferences implies belief in future rationality.

Let \(h \in H_{i}\). Suppose \(f_{i}(t_{i})\) does not believe at h in player j’s future rationality. Then

$$\begin{aligned} \beta _{i}(f_{i}(t_{i}), h)(s_{j}, f_{j}(t_{j})) > 0 \end{aligned}$$

for some type \(t_{j}\) and some \(s_{j} \in S_{j}(h')\) that is a suboptimal strategy for \(f_{j}(t_{j})\) at some \(h'\) weakly following h.

By Lemma 1 there exists \(\hat{s}_{j} \in S_{j}(h) \cap S_{j}(h')\) such that \(t_{j}\) prefers \(\hat{s}_{j}\) to \(s_{j}\). By the hypothesis, \(t_{i}\) respects j’s preferences, so it must deem \((\hat{s}_{j}, t_{j})\) infinitely more likely than \((s_{j}, t_{j})\). Hence, there is some k such that \(b_{i}^{k}(t_{i})(\hat{s}_{j}, t_{j}) > 0\) and \(b_{i}^{m}(t_{i})(s_{j}, t_{j}) = 0\) for all \(m \le k\). Since \(\hat{s}_{j} \in S_{j}(h)\), this implies that

$$\begin{aligned} \beta _{i}(f_{i}(t_{i}), h) (s_{j}, f_{j}(t_{j})) = 0 \end{aligned}$$

by construction of the conditional belief at h. But this is a contradiction. Therefore, \(f_{i}(t_{i})\) believes at h in player j’s future rationality for all \(h \in H_{i}\).

Now we prove with a similar argument that respect of preferences implies belief in restricted past rationality.

Let \(h \in H_{i}\). Suppose \(f_{i}(t_{i})\) does not believe at h in player j’s restricted past rationality. Then

$$\begin{aligned} \beta _{i}(f_{i}(t_{i}), h)(s_{j}, f_{j}(t_{j})) > 0 \end{aligned}$$

for some type \(t_{j}\) and some \(s_{j} \in S_{j}(h) \cap S_{j}(h'')\) that is a suboptimal strategy for \(f_{j}(t_{j})\) among strategies in \(S_{j}(h) \cap S_{j}(h'')\) at some \(h''\) weakly preceding h. By Lemma 1 there exists \(\hat{s}_{j} \in S_{j}(h) \cap S_{j}(h'')\) such that \(t_{j}\) prefers \(\hat{s}_{j}\) to \(s_{j}\). By the hypothesis, \(t_{i}\) respects j’s preferences, so it must deem \((\hat{s}_{j}, t_{j})\) infinitely more likely than \((s_{j}, t_{j})\). Since \(\hat{s}_{j} \in S_{j}(h)\), the construction of the conditional belief at h then yields, by an argument analogous to the one above,

$$\begin{aligned} \beta _{i}(f_{i}(t_{i}), h)(s_{j}, f_{j}(t_{j})) = 0 \end{aligned}$$

which is a contradiction. Therefore \(f_{i}(t_{i})\) believes at h in player j’s restricted past rationality. Finally, by construction, \(f_{i}(t_{i})\) satisfies Bayesian updating. \(\square \)

We define the set \(T^{*}(t_{i})\) as the set of types in \(t_{i}\)’s belief hierarchy in the normal form, that is, \(T^{*}(t_{i})\) is the smallest set with the property that \(t_{i} \in T^{*}(t_{i})\), and for every \(t_{j} \in T^{*}(t_{i})\), if \(t_{j}\) deems possible \(t_{k}\), then \(t_{k} \in T^{*}(t_{i})\).

Similarly we define \(\hat{T}^{*}(\hat{t}_{i})\) as the set of types in \(\hat{t}_{i}\)’s belief hierarchy in the dynamic form. More precisely, \(\hat{T}^{*}(\hat{t}_{i})\) is the smallest set such that \(\hat{t}_{i}\in \hat{T}^{*}(\hat{t}_{i})\) and for every \(\hat{t}_{j} \in \hat{T}^{*}(\hat{t}_{i})\), if \(\beta _{j}(\hat{t}_{j}, h)(s_{k}, \hat{t}_{k}) > 0\) for some \(h \in H_{j}\), then \(\hat{t}_{k} \in \hat{T}^{*}(\hat{t}_{i})\).
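Both closures are simply reachability under the “deems possible” relation. As a minimal illustrative sketch (with a hypothetical map `deems_possible` from a type to the set of types it assigns positive probability to somewhere in its beliefs):

```python
from collections import deque


def belief_hierarchy_closure(t_i, deems_possible):
    """Smallest set that contains t_i and is closed under `deems_possible`,
    computed by breadth-first search over the induced directed graph."""
    closure, frontier = {t_i}, deque([t_i])
    while frontier:
        t_j = frontier.popleft()
        for t_k in deems_possible(t_j):
            if t_k not in closure:
                closure.add(t_k)
                frontier.append(t_k)
    return closure
```

The same routine yields \(\hat{T}^{*}(\hat{t}_{i})\) when `deems_possible` collects the types that receive positive probability at some information set \(h \in H_{j}\).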

Proof

(Lemma 3) Let \(t_{i} \in T_{i}\) and construct the set \(T^{*}(t_{i})\). Since \(t_{i}\) is properly rationalizable, every type in \(T^{*}(t_{i})\) is cautious and respects the opponents’ preferences.

By construction, every type in \(T^{*}(t_{i})\) induces a type in \(\hat{T}^{*}(f_{i}(t_{i}))\). It then follows, by Lemma 2, that all types in \(\hat{T}^{*}(f_{i}(t_{i}))\) believe in the opponents’ future and restricted past rationality and believe the opponents satisfy Bayesian updating.

Then, by definition, since all the types in \(\hat{T}^{*}(f_{i}(t_{i}))\) only refer to types in \(\hat{T}^{*}(f_{i}(t_{i}))\), they all express common belief in future and restricted past rationality and common belief in Bayesian updating.

Hence, in particular, \(f_{i}(t_{i})\) expresses common belief in future and restricted past rationality and common belief in Bayesian updating. \(\square \)

Proof

(Theorem 1) Since \(s_{i}\) is properly rationalizable, there is a type \(t_{i}\) that is properly rationalizable such that \(s_{i}\) is optimal for \(t_{i}\). By Lemma 3, \(f_{i}(t_{i})\) expresses common belief in future and restricted past rationality and common belief in Bayesian updating.

Now we show that \(s_{i}\) is also optimal for type \(f_{i}(t_{i})\) at every information set \(h \in H_{i}(s_{i})\).

Suppose that \(s_{i}\) is suboptimal for \(f_{i}(t_{i})\) at some information set \(h \in H_{i}(s_{i})\). By Lemma 1, choosing \(h' = h\), there is a strategy \(\hat{s}_{i} \in S_{i}(h)\) such that \(t_{i}\) prefers \(\hat{s}_{i}\) to \(s_{i}\). Then \(s_{i}\) is not an optimal strategy for \(t_{i}\), which is a contradiction. \(\square \)

9.2 Proofs for Section 6

Before we prove Theorem 2 we require some auxiliary results, and the construction of an epistemic model according to the algorithm, which will have the desired properties.

We state the following result, first proved in Pearce (1984) for games with two players. A general proof can be found in Perea (2012).

Theorem 3

(Pearce’s lemma) Consider a reduced decision problem \(\Gamma _{i}^{k}(h) = (S_{i}^{k}(h), \hat{S}_{-i}^{k}(h))\), \(A_{i} \subseteq S_{i}^{k}(h)\) and \(s_{i} \in A_{i}\). Then \(s_{i}\) is optimal among strategies in \(A_{i}\) for some belief \(b_{i} \in \Delta (\hat{S}_{-i}^{k}(h))\) if and only if \(s_{i}\) is not strictly dominated on \(\hat{S}_{-i}^{k}(h)\) by a randomization on \(A_{i}\).
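Computationally, the dominance side of Pearce’s lemma reduces to a small linear program. The following hedged sketch (not part of the paper) uses scipy; `payoff` is a hypothetical matrix with one row per strategy in \(A_{i}\) and one column per opponent combination in \(\hat{S}_{-i}^{k}(h)\), and the rows play the role of \(A_{i}\).

```python
import numpy as np
from scipy.optimize import linprog


def dominated_by_randomization(payoff, row_s_i):
    """True if row row_s_i is strictly dominated on the columns by some
    randomization over the rows: maximize eps subject to
    sigma . payoff[:, j] >= payoff[row_s_i, j] + eps for every column j,
    with sigma in the simplex; strict dominance holds iff the optimal eps > 0."""
    m, n = payoff.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                      # minimize -eps
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])    # -sigma.payoff[:, j] + eps <= -payoff[row_s_i, j]
    b_ub = -payoff[row_s_i, :]
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)  # mixture weights sum to one
    b_eq = [1.0]
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return bool(res.success and -res.fun > 1e-9)
```

By Theorem 3, `not dominated_by_randomization(payoff, i)` is then equivalent to the i-th strategy being optimal among \(A_{i}\) for some belief on \(\hat{S}_{-i}^{k}(h)\).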

For \(i \in I\), \(h \in H_{i}\) and \(k \ge 1\) let \(B_{-i}^{k}(h)\) be the set of opponents’ strategy combinations \((s_{j})_{j \ne i} \in S_{-i}(h)\) such that there is some type \(t_{i}\) expressing up to k-fold belief in future and restricted past rationality that at h assigns positive probability to \((s_{j})_{j \ne i}\).

Lemma 4

For every player \(i \in I\), every information set \(h \in H_{i}\) and every \(k \ge 1\) we have that \(B_{-i}^{k}(h) \subseteq \hat{S}_{-i}^{k}(h)\).

Proof

We prove this statement by induction on k.

Let \(k = 1\). Consider a player \(i \in I\), an information set \(h \in H_{i}\) and let \(s_{-i} \in B_{-i}^{1}(h)\). Then there is a type \(t_{i}\) expressing up to 1-fold belief in future and restricted past rationality such that \(t_{i}\) assigns positive probability to \(s_{-i}\) at h.

Now consider an opponent \(j \ne i\) and write \(s_{-i} = (s_{j})_{j \ne i}\). Since \(t_{i}\) believes in j’s future and restricted past rationality and assigns positive probability to \(s_{-i}\) at h, there is a type \(t_{j}\) such that for every \(h' \in H_{j}(s_{j})\) weakly following h the strategy \(s_{j}\) is optimal for the conditional belief \(\beta _{j}(t_{j}, h')\) among strategies in \(S_{j}(h')\), and for every \(h'' \in H_{j}(s_{j})\) weakly preceding h the strategy \(s_{j}\) is optimal for \(\beta _{j}(t_{j}, h'')\) among strategies in \(S_{j}(h) \cap S_{j}(h'')\).

Then by Pearce’s lemma, for every \(h' \in H_{j}(s_{j})\) weakly following h, \(s_{j}\) is not strictly dominated on \(\hat{S}_{-j}^{0}(h')\) by a randomization on \(S_{j}(h')\) and for every \(h'' \in H_{j}(s_{j})\) weakly preceding h, \(s_{j}\) is not strictly dominated on \(\hat{S}_{-j}^{0}(h'')\) by a randomization on \(S_{j}(h) \cap S_{j}(h'')\). Therefore \((s_{j})_{j \ne i} \in \hat{S}_{-i}^{1}(h)\). Hence \(B_{-i}^{1}(h) \subseteq \hat{S}_{-i}^{1}(h)\), and this is true for all players \(i \in I\) and every information set \(h \in H_{i}\).

Now we proceed with the induction step. Fix \(k \ge 2\) and assume that for every player \(i \in I\) and every information set \(h \in H_{i}\), \(B_{-i}^{k - 1}(h) \subseteq \hat{S}_{-i}^{k - 1}(h)\).

Consider a player i, and let \((s_{j})_{j \ne i} \in B_{-i}^{k}(h)\). Then there is a type \(t_{i}\) that expresses up to k-fold belief in future and restricted past rationality such that \(t_{i}\) assigns positive probability to \((s_{j})_{j \ne i}\) at h.

Take an opponent \(j \ne i\). Then there must be some type \(t_{j}\) expressing up to \((k - 1)\)-fold belief in future and restricted past rationality such that \(s_{j}\) is optimal for \(t_{j}\) at every \(h' \in H_{j}(s_{j})\) weakly following h among strategies in \(S_{j}(h')\), and at every \(h'' \in H_{j}(s_{j})\) weakly preceding h among strategies in \(S_{j}(h) \cap S_{j}(h'')\).

By the induction assumption, since \(t_{j}\) assigns at every \(h' \in H_{j}\) positive probability only to opponents’ strategies in \(B_{-j}^{k - 1}(h')\), then \(t_{j}\) must assign, at every \(h' \in H_{j}\) positive probability only to opponents’ strategies in \(\hat{S}_{-j}^{k - 1}(h')\). Then \(s_{j}\) is optimal at every \(h' \in H_{j}(s_{j})\) weakly following h among strategies in \(S_{j}(h')\) for some conditional belief \(\beta _{j}(t_{j}, h')\) on \(\hat{S}_{-j}^{k - 1}(h')\), and at every \(h'' \in H_{j}(s_{j})\) weakly preceding h among strategies in \(S_{j}(h) \cap S_{j}(h'')\) for some conditional belief \(\beta _{j}(t_{j}, h'')\) on \(\hat{S}_{-j}^{k - 1}(h'')\). Therefore by Pearce’s lemma, at every \(h' \in H_{j}(s_{j})\) weakly following h, \(s_{j}\) is not strictly dominated on \(\hat{S}_{-j}^{k - 1}(h')\) by a randomization on \(S_{j}(h')\), and at every \(h'' \in H_{j}(s_{j})\) weakly preceding h, \(s_{j}\) is not strictly dominated on \(\hat{S}_{-j}^{k - 1}(h'')\) by a randomization on \(S_{j}(h) \cap S_{j}(h'')\). Hence, \((s_{j})_{j \ne i} \in \hat{S}_{-i}^{k}(h)\). Then \(B_{-i}^{k}(h) \subseteq \hat{S}_{-i}^{k}(h)\) and this is true for every player \(i \in I\) and every information set \(h \in H_{i}\).\(\square \)

For \(i \in I\) and \(k \ge 1\) let \( BR _{i}^{k}\) be the set of strategies \(s_{i}\) for player i such that \(s_{i}\) is optimal, at every \(h \in H_{i}(s_{i})\), for some type that expresses up to k-fold belief in future and restricted past rationality and common belief in Bayesian updating. We also define \(S_{i}^{k} = \{s_{i} \in S_{i} \mid s_{i} \in S_{i}^{k}(h) \text { for all } h \in H_{i}(s_{i})\}\).

Lemma 5

For every player \(i \in I\) and every \(k \ge 1\), \( BR _{i}^{k} \subseteq S_{i}^{k + 1}\).

Proof

Fix \(i \in I\) and \(k \ge 1\). Let \(s_{i} \in BR _{i}^{k}\). Then there is a type \(t_{i}\) that expresses up to k-fold belief in future and restricted past rationality such that \(s_{i}\) is optimal for \(t_{i}\) at every \(h \in H_{i}(s_{i})\). By definition, at every \(h \in H_{i}(s_{i})\), \(t_{i}\) assigns positive probability to \((s_{j})_{j \ne i}\) only if \((s_{j})_{j \ne i} \in B_{-i}^{k}(h)\). By Lemma 4, at every \(h \in H_{i}(s_{i})\), \(t_{i}\) assigns positive probability to \((s_{j})_{j \ne i}\) only if \((s_{j})_{j \ne i} \in \hat{S}_{-i}^{k}(h)\). Therefore, \(s_{i}\) is optimal at every \(h \in H_{i}(s_{i})\) for some conditional belief \(\beta _{i}(t_{i}, h)\) on \(\hat{S}_{-i}^{k}(h)\). Hence by Pearce’s lemma, \(s_{i}\) is not strictly dominated at any \(h \in H_{i}(s_{i})\) on \(\hat{S}_{-i}^{k}(h)\) by a randomization on \(S_{i}(h)\). This implies that \(s_{i}\) survives step \(k + 1\) of the algorithm, that is, \(s_{i} \in S_{i}^{k + 1}\). Then \( BR _{i}^{k} \subseteq S_{i}^{k + 1}\) and this holds for every player \(i \in I\) and every \(k \ge 1\).\(\square \)

Lemma 6

Let \(\beta _{i}=(\beta _{i}(h))_{h \in H_{i}}\) be a conditional belief vector, where \(\beta _{i}(h)\in \Delta (S_{-i}(h))\) for every \(h \in H_{i}\). Let \(h', h'' \in H_{i}\) be such that \(h'\) precedes \(h''\), \(\beta _{i}(h')(S_{-i}(h'')) > 0,\) and

$$\begin{aligned} \beta _{i}(h'')(s_{-i})=\dfrac{\beta _{i}(h')(s_{-i})}{\beta _{i}(h')(S_{-i}(h''))} \end{aligned}$$

for all \(s_{-i} \in S_{-i}(h'').\) Consider some \(h \in H\) and \(s_{i} \in S_{i}(h'') \cap S_{i}(h),\) and suppose that \(s_{i}\) is optimal for \(\beta _{i}(h')\) among strategies in \(S_{i}(h') \cap S_{i}(h)\). Then, \(s_{i}\) is optimal for \(\beta _{i}(h'')\) among strategies in \(S_{i}(h'') \cap S_{i}(h)\).

Proof

Suppose that \(s_{i}\) is optimal for \(\beta _{i}(h')\) among strategies in \(S_{i}(h') \cap S_{i}(h)\). Then,

$$\begin{aligned} u_{i}(s_{i}, \beta _{i}(h'))&= \sum _{s_{-i}\in S_{-i}(h')}\beta _{i}(h')(s_{-i}) u_{i}(z(s_{i},s_{-i}))\\&= \sum _{s_{-i} \in S_{-i}(h'')} \beta _{i}(h')(s_{-i}) u_{i}(z(s_{i}, s_{-i})) \\&\quad + \sum _{s_{-i} \in S_{-i}(h') {\setminus } S_{-i}(h'')}\beta _{i}(h')(s_{-i}) u_{i}(z(s_{i}, s_{-i})) \\&= \beta _{i}(h')(S_{-i}(h'')) \sum _{s_{-i}\in S_{-i}(h'')} \beta _{i}(h'')(s_{-i}) u_{i}(z(s_{i},s_{-i})) \\&\quad + \sum _{s_{-i}\in S_{-i}(h') {\setminus } S_{-i}(h'')} \beta _{i}(h')(s_{-i}) u_{i}(z(s_{i},s_{-i})) \\&= \beta _{i}(h')(S_{-i}(h'')) u_{i}(s_{i},\beta _{i}(h''))\\&\quad + \sum _{s_{-i}\in S_{-i}(h') {\setminus } S_{-i}(h'')}\beta _{i}(h')(s_{-i}) u_{i}(z(s_{i},s_{-i})). \end{aligned}$$

Here, the third equality follows from the assumption that

$$\begin{aligned} \beta _{i}(h'')(s_{-i})=\dfrac{\beta _{i}(h')(s_{-i})}{\beta _{i}(h')(S_{-i}(h''))} \end{aligned}$$

for all \(s_{-i}\in S_{-i}(h'')\).

Assume now, contrary to what we want to prove, that \(s_{i}\) is not optimal for \(\beta _{i}(h'')\) among strategies in \(S_{i}(h'')\cap S_{i}(h).\) That is, there is some strategy \(s_{i}' \in S_{i}(h'') \cap S_{i}(h)\) for which

$$\begin{aligned} u_{i}(s_{i},\beta _{i}(h'')) < u_{i}(s_{i}',\beta _{i}(h'')). \end{aligned}$$

Let \(s_{i}''\) be the strategy that coincides with \(s_{i}'\) at all information sets in \(H_{i}\) weakly following \(h''\), and that coincides with \(s_{i}\) at all other information sets in \(H_{i}\). As \(s_{i}, s_{i}' \in S_{i}(h'')\cap S_{i}(h)\), it follows that \(s_{i}''\in S_{i}(h'')\cap S_{i}(h)\). Since \(h'\) precedes \(h''\), it follows that \(s_{i}'' \in S_{i}(h')\cap S_{i}(h)\). Moreover, by construction of \(s_{i}''\),

$$\begin{aligned} u_{i}(s_{i}'',\beta _{i}(h'))&=\beta _{i}(h')(S_{-i}(h'')) u_{i}(s_{i}'', \beta _{i}(h''))\\&\quad + \sum _{s_{-i} \in S_{-i}(h') {\setminus } S_{-i}(h'')}\beta _{i}(h')(s_{-i}) u_{i}(z(s_{i}'', s_{-i}))\\&= \beta _{i}(h')(S_{-i}(h'')) u_{i}(s_{i}', \beta _{i}(h''))\\&\quad + \sum _{s_{-i} \in S_{-i}(h') {\setminus } S_{-i}(h'')} \beta _{i}(h')(s_{-i}) u_{i}(z(s_{i}, s_{-i}))\\&> \beta _{i}(h')(S_{-i}(h'')) u_{i}(s_{i},\beta _{i}(h''))\\&\quad + \sum _{s_{-i} \in S_{-i}(h') {\setminus } S_{-i}(h'')} \beta _{i}(h')(s_{-i}) u_{i}(z(s_{i}, s_{-i}))\\&= u_{i}(s_{i}, \beta _{i}(h')). \end{aligned}$$

The inequality uses the assumption that \(\beta _{i}(h')(S_{-i}(h''))>0\). Hence \(u_{i}(s_{i}'', \beta _{i}(h')) > u_{i}(s_{i}, \beta _{i}(h'))\). As \(s_{i}'' \in S_{i}(h') \cap S_{i}(h)\), this contradicts our assumption that \(s_{i}\) is optimal for \(\beta _{i}(h')\) among strategies in \(S_{i}(h') \cap S_{i}(h).\) Hence, \(s_{i}\) must be optimal for \(\beta _{i}(h'')\) among strategies in \(S_{i}(h'')\cap S_{i}(h)\). This completes the proof. \(\square \)

Lemma 7

Let \(h, h' \in H_{i}\) be such that \(h'\) precedes h. Then if \(s_{-i} \in \hat{S}_{-i}^{k}(h') \cap S_{-i}(h)\) we have that \(s_{-i} \in \hat{S}_{-i}^{k}(h)\).

Proof

Let \(s_{-i} \in \hat{S}_{-i}^{k}(h') \cap S_{-i}(h)\), with \(s_{-i} = (s_{j})_{j \ne i}\). Then, for every player \(j \ne i\), we have for every information set \(h'' \in H_{j}(s_{j})\) weakly following \(h'\) that \(s_{j}\) is not strictly dominated on \(\hat{S}_{-j}^{k - 1}(h'')\) by a randomization on \(S_{j}(h'')\), and for every information set \(h''' \in H_{j}(s_{j})\) weakly preceding \(h'\) that \(s_{j}\) is not strictly dominated on \(\hat{S}_{-j}^{k - 1}(h''')\) by a randomization on \(S_{j}(h') \cap S_{j}(h''')\).

Take an information set \(h'' \in H_{j}(s_{j})\) that weakly follows h. Then \(h''\) weakly follows \(h'\), and we know from above that \(s_{j}\) is not strictly dominated on \(\hat{S}_{-j}^{k - 1}(h'')\) by a randomization on \(S_{j}(h'')\).

Now take an information set \(h''' \in H_{j}(s_{j})\) that weakly precedes h. Then either \(h'''\) weakly precedes \(h'\), or \(h'''\) weakly follows \(h'\).

If \(h'''\) weakly precedes \(h'\), then we know from above that \(s_{j}\) is not strictly dominated on \(\hat{S}_{-j}^{k - 1}(h''')\) by a randomization on \(S_{j}(h') \cap S_{j}(h''')\). As \(S_{j}(h) \subseteq S_{j}(h')\), we conclude that \(s_{j}\) is not strictly dominated on \(\hat{S}_{-j}^{k - 1}(h''')\) by a randomization on \(S_{j}(h) \cap S_{j}(h''')\).

On the other hand, if \(h'''\) weakly follows \(h'\), then we know from above that \(s_{j}\) is not strictly dominated on \(\hat{S}_{-j}^{k - 1}(h''')\) by a randomization on \(S_{j}(h''')\). Hence, in particular, \(s_{j}\) is not strictly dominated on \(\hat{S}_{-j}^{k - 1}(h''')\) by a randomization on \(S_{j}(h) \cap S_{j}(h''')\).

All this implies that \(s_{-i} \in \hat{S}_{-i}^{k}(h).\)\(\square \)

For every \(i \in I \), \(h \in H\) and \(k \ge 1\) we define \(R_{i}^{k}(h)\) as the set of strategies \(s_{i} \in S_{i}(h)\) such that \(s_{i}\) is not strictly dominated on \(\hat{S}_{-i}^{k - 1}(h')\) at every \(h' \in H_{i}(s_{i})\) weakly following h among strategies in \(S_{i}(h')\), and \(s_{i}\) is not strictly dominated on \(\hat{S}_{-i}^{k - 1}(h'')\) at every \(h'' \in H_{i}(s_{i})\) weakly preceding h among strategies in \(S_{i}(h) \cap S_{i}(h'')\). Notice that \(R_{i}^{k}(h) \subseteq S_{i}^{k}(h)\) for all \(i\in I\), \(h \in H_{i}\) and \(k \ge 1\).

Suppose that the algorithm ends after K steps, that is \(S_{i}^{K + 1}(h) = S_{i}^{K}(h)\) and \(\hat{S}_{-i}^{K + 1}(h) = \hat{S}_{-i}^{K}(h)\) for every player \(i \in I\) and every information set \(h \in H_{i}\). In order to prove that \(S_{i}^{k + 1} \subseteq BR _{i}^{k}\) we construct an epistemic model with the following characteristics:

  1. For every information set h, every player i and every strategy \(s_{i} \in R_{i}^{1}(h)\) there is a type \(t_{i}^{s_{i}, h}\) such that \(s_{i}\) is optimal for \(t_{i}^{s_{i}, h}\) at every \(h' \in H_{i}(s_{i})\) weakly following h among strategies in \(S_{i}(h')\), and at every \(h'' \in H_{i}(s_{i})\) weakly preceding h among strategies in \(S_{i}(h) \cap S_{i}(h'')\).

  2. For every \(k \ge 2\), if \(s_{i} \in R_{i}^{k}(h)\) then the associated type \(t_{i}^{s_{i}, h}\) expresses up to \((k - 1)\)-fold belief in future and restricted past rationality.

  3. If \(s_{i} \in R_{i}^{K}(h)\) then the associated type \(t_{i}^{s_{i}, h}\) expresses common belief in future and restricted past rationality.

  4. Every type satisfies Bayesian updating.

Construction of the epistemic model


We start with the construction of beliefs for the model. For \(i \in I\) take an information set \(h \in H\) and let \(D_{i}^{k}(h) = R_{i}^{k}(h) {\setminus } R_{i}^{k + 1}(h)\) for all \(k \ge 1\).

Consider \(k \in \{1, 2, \ldots , K - 1\}\) and \(s_{i} \in D_{i}^{k}(h)\). By definition and Pearce’s lemma, for every \(h' \in H_{i}(s_{i})\) weakly following h there is a conditional belief \(\hat{\beta }_{i}^{s_{i}, h}(h')\) on \(\hat{S}_{-i}^{k - 1}(h')\) such that \(s_{i}\) is optimal for \(\hat{\beta }_{i}^{s_{i}, h}(h')\) among strategies in \(S_{i}(h')\), and for every \(h'' \in H_{i}(s_{i})\) weakly preceding h there is a conditional belief \(\hat{\beta }_{i}^{s_{i}, h}(h'')\) on \(\hat{S}_{-i}^{k - 1}(h'')\) such that \(s_{i}\) is optimal for \(\hat{\beta }_{i}^{s_{i}, h}(h'')\) among strategies in \(S_{i}(h) \cap S_{i}(h'')\). For every other \(h''' \in H_{i}\), define \(\hat{\beta }_{i}^{s_{i}, h}(h''')\) on \(\hat{S}_{-i}^{k - 1}(h''')\) arbitrarily.

Consider \(s_{i} \in R_{i}^{K}(h)\). Then \(s_{i}\in R_{i}^{K + 1}(h)\) as well. By the definition of \(R_{i}^{K + 1}(h)\) and Pearce’s lemma, for every \(h' \in H_{i}(s_{i})\) weakly following h there is a conditional belief \(\hat{\beta }_{i}^{s_{i}, h}(h')\) on \(\hat{S}_{-i}^{K}(h')\) such that \(s_{i}\) is optimal for \(\hat{\beta }_{i}^{s_{i}, h}(h')\) among strategies in \(S_{i}(h')\), and for every \(h'' \in H_{i}(s_{i})\) weakly preceding h there is a conditional belief \(\hat{\beta }_{i}^{s_{i}, h}(h'')\) on \(\hat{S}_{-i}^{K}(h'')\) such that \(s_{i}\) is optimal for \(\hat{\beta }_{i}^{s_{i},h}(h'')\) among strategies in \(S_{i}(h)\cap S_{i}(h'')\). For every other \(h''' \in H_{i}\), define \(\hat{\beta }_{i}^{s_{i}, h}(h''')\) on \(\hat{S}_{-i}^{K}(h''')\) arbitrarily.

For every \(k \ge 1\) and every \(h' \in H_{i}\), we define \(R_{-i}^{k}(h') = \mathop {\times }\nolimits _{j \ne i} R_{j}^{k}(h').\) Then, by construction, \(\hat{S}_{-i}^{k}(h') = R_{-i}^{k}(h')\). Hence, for every \(s_{i} \in D_{i}^{k}(h)\) with \(k \in \{1,2, \ldots , K - 1\}\), we have that \(\hat{\beta }_{i}^{s_{i}, h}(h') \in \Delta (R_{-i}^{k - 1}(h'))\) for all \(h' \in H_{i}\). Moreover, for every \(s_{i} \in R_{i}^{K}(h)\) we have that \(\hat{\beta }_{i}^{s_{i}, h}(h') \in \Delta (R_{-i}^{K}(h'))\) for all \(h' \in H_{i}\).

Take some \(s_{i} \in R_{i}^{1}(h)\). We now transform the conditional belief vector \(\hat{\beta }_{i}^{s_{i}, h} = (\hat{\beta }_{i}^{s_{i}, h}(h'))_{h' \in H_{i}}\) into a conditional belief vector \(\beta _{i}^{s_{i}, h}\) that satisfies Bayesian updating, as follows. Consider an information set \(h' \in H_{i}\). Suppose first that there is some \(h'' \in H_{i}\) preceding \(h'\) with \(\hat{\beta }_{i}^{s_{i}, h}(h'')(S_{-i}(h')) > 0\). Then, let \(h''\) be the unique information set in \(H_{i}\) such that \(h''\) precedes \(h'\), \(\hat{\beta }_{i}^{s_{i}, h}(h'')(S_{-i}(h')) > 0\), and there is no \(h''' \in H_{i}\) preceding \(h''\) with \(\hat{\beta }_{i}^{s_{i}, h}(h''')(S_{-i}(h')) > 0\). In that case, define

$$\begin{aligned} \beta _{i}^{s_{i}, h}(h')(s_{-i}) := \frac{\hat{\beta }_{i}^{s_{i}, h}(h'')(s_{-i})}{\hat{\beta }_{i}^{s_{i}, h}(h'')(S_{-i}(h'))} \end{aligned}$$
(1)

for every \(s_{-i} \in S_{-i}(h')\). If, on the other hand, there is no \(h'' \in H_{i}\) preceding \(h'\) with \(\hat{\beta }_{i}^{s_{i}, h}(h'')(S_{-i}(h')) > 0\), then we let

$$\begin{aligned} \beta _{i}^{s_{i}, h}(h')(s_{-i}) := \hat{\beta }_{i}^{s_{i}, h}(h')(s_{-i}) \end{aligned}$$
(2)

for every \(s_{-i} \in S_{-i}(h')\).
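For concreteness, (1) is ordinary conditioning of the belief held at the earliest preceding information set that assigns positive probability to reaching \(h'\), while (2) keeps the unmodified belief when no such information set exists. A minimal sketch of the conditioning step, with a belief represented as a hypothetical dictionary from opponents’ strategy combinations to probabilities:

```python
def bayesian_update(belief, reachable):
    """Condition `belief` (dict: strategy combination -> probability) on the
    set `reachable` of combinations that reach the later information set,
    as in (1); returns None when the event has probability zero, the case
    in which (2) keeps the original belief instead."""
    mass = sum(p for s, p in belief.items() if s in reachable)
    if mass == 0:
        return None
    return {s: p / mass for s, p in belief.items() if s in reachable}
```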

By construction, the conditional belief vector \(\beta _{i}^{s_{i}, h}\) satisfies Bayesian updating. We now show that, for every \(h' \in H_{i}(s_{i})\) weakly preceding or weakly following h, the strategy \(s_{i}\) is optimal for \(\beta _{i}^{s_{i}, h}(h')\) among strategies in \(S_{i}(h')\cap S_{i}(h)\). Consider some \(h' \in H_{i}(s_{i})\) weakly preceding or weakly following h. If there is no \(h'' \in H_{i}\) preceding \(h'\) with \(\hat{\beta }_{i}^{s_{i}, h}(h'')(S_{-i}(h')) > 0\) then by (2), \(\beta _{i}^{s_{i},h}(h')=\hat{\beta }_{i}^{s_{i}, h}(h')\). Since, by construction, \(s_{i}\) is optimal for \(\hat{\beta }_{i}^{s_{i}, h}(h')\) among strategies in \(S_{i}(h') \cap S_{i}(h)\), it follows that \(s_{i}\) is optimal for \(\beta _{i}^{s_{i}, h}(h')\) among strategies in \(S_{i}(h')\cap S_{i}(h)\). Suppose now that there is some \(h'' \in H_{i}\) preceding \(h'\) with \(\hat{\beta }_{i}^{s_{i}, h}(h'')(S_{-i}(h')) > 0\). Then, \(\beta _{i}^{s_{i}, h}(h')\) is obtained from \(\hat{\beta }_{i}^{s_{i}, h}(h'')\) by (1), where \(h'' \in H_{i}\) precedes \(h'\), \(\hat{\beta }_{i}^{s_{i}, h}(h'')(S_{-i}(h')) > 0\), and there is no \(h''' \in H_{i}\) preceding \(h''\) with \(\hat{\beta }_{i}^{s_{i}, h}(h''')(S_{-i}(h')) > 0\). By construction, \(s_{i}\) is optimal for \(\hat{\beta }_{i}^{s_{i}, h}(h'')\) among strategies in \(S_{i}(h'') \cap S_{i}(h)\). As \( h'' \in H_{i}\) precedes \(h' \in H_{i}\), it follows by (1) and Lemma 6 that \(s_{i}\) is optimal for \(\beta _{i}^{s_{i},h}(h')\) among strategies in \(S_{i}(h') \cap S_{i}(h)\). Hence, we conclude that for every \(h' \in H_{i}(s_{i})\) weakly preceding or weakly following h, the strategy \(s_{i}\) is optimal for \(\beta _{i}^{s_{i}, h}(h')\) among strategies in \(S_{i}(h') \cap S_{i}(h)\).

Suppose that \(s_{i} \in D_{i}^{k}(h)\) for some \(k \in \{1, 2, \ldots , K - 1\}\). We prove that \(\beta _{i}^{s_{i}, h}(h') \in \Delta (R_{-i}^{k - 1}(h'))\) for all \(h' \in H_{i}\). Take some \( h' \in H_{i}\), and suppose there is no \(h'' \in H_{i}\) preceding \(h'\) with \(\hat{\beta }_{i}^{s_{i}, h}(h'')(S_{-i}(h')) > 0\). Then by (2), \(\beta _{i}^{s_{i}, h}(h') = \hat{\beta }_{i}^{s_{i}, h}(h')\). As \(\hat{\beta }_{i}^{s_{i}, h}(h') \in \Delta (R_{-i}^{k - 1}(h'))\), it follows that \(\beta _{i}^{s_{i}, h}(h') \in \Delta (R_{-i}^{k - 1}(h'))\). Assume next that there is some \(h'' \in H_{i}\) preceding \(h'\) with \(\hat{\beta }_{i}^{s_{i}, h}(h'')(S_{-i}(h'))>0\). Then, \(\beta _{i}^{s_{i}, h}(h')\) is obtained from \(\hat{\beta }_{i}^{s_{i}, h}(h'')\) by (1), where \(h'' \in H_{i}\) precedes \(h'\), \(\hat{\beta }_{i}^{s_{i}, h}(h'')(S_{-i}(h')) > 0\), and there is no \(h''' \in H_{i}\) preceding \(h''\) with \(\hat{\beta }_{i}^{s_{i}, h}(h''')(S_{-i}(h')) > 0\). Take some \(s_{-i} \in S_{-i}(h')\) with \(\beta _{i}^{s_{i}, h}(h')(s_{-i}) > 0\). By (1) we then must have that \(\hat{\beta }_{i}^{s_{i}, h}(h'')(s_{-i}) > 0\). Since we have seen that \(\hat{\beta }_{i}^{s_{i}, h}(h'') \in \Delta (R_{-i}^{k - 1}(h''))\), it must be that \(s_{-i} \in R_{-i}^{k - 1}(h'')\). Hence, \(s_{-i} \in R_{-i}^{k - 1}(h'') \cap S_{-i}(h')\). Since \(R_{-i}^{k - 1}(h'') = \hat{S}_{-i}^{k - 1}(h'')\), we know that \(s_{-i} \in \hat{S}_{-i}^{k - 1}(h'') \cap S_{-i}(h')\). As \(h''\) precedes \(h'\), by Lemma 7, \(s_{-i}\in \hat{S}_{-i}^{k - 1}(h') = R_{-i}^{k - 1}(h')\). Since this holds for every \(s_{-i}\in S_{-i}(h')\) with \(\beta _{i}^{s_{i}, h}(h')(s_{-i}) > 0\), we conclude that \(\beta _{i}^{s_{i}, h}(h')\in \Delta (R_{-i}^{k - 1}(h'))\). In the same fashion, it can be shown that for every \(s_{i}\in R_{i}^{K}(h)\), we have that \(\beta _{i}^{s_{i}, h}(h')\in \Delta (R_{-i}^{K}(h'))\) for all \(h' \in H_{i}\).

Now we proceed with the construction of types for the epistemic model. For player \(i \in I\) we define the set of types \(T_{i} = \{t_{i}^{s_{i}, h} \mid h \in H \text { and } s_{i} \in R_{i}^{1}(h)\}\). For every player \(i \in I\), every information set \(h \in H\) and every \(k \in \{1,\ldots , K\}\) let \(T_{i}^{k}(h) = \{t_{i}^{s_{i}, h} \mid s_{i}\in R_{i}^{k}(h)\}\). Since \(R_{i}^{K}(h) \subseteq R_{i}^{K - 1}(h) \subseteq \cdots \subseteq R_{i}^{2}(h)\subseteq R_{i}^{1}(h)\), then \(T_{i}^{K}(h) \subseteq T_{i}^{K - 1}(h) \subseteq \cdots \subseteq T_{i}^{2}(h) \subseteq T_{i}^{1}(h)\) for every player \(i \in I\) and every information set \(h \in H\).

For every player i, every \(k \ge 1\), and every two information sets \(h, h'\) where h precedes \(h'\), we know by Lemma 7 that \(R_{i}^{k}(h) \cap S_{i}(h') \subseteq R_{i}^{k}(h')\). Hence, if \(t_{i}^{s_{i}, h}\in T_{i}^{k}(h)\) with \(s_{i}\in R_{i}^{k}(h) \cap S_{i}(h')\), then \(s_{i} \in R_{i}^{k}(h')\). In that case, we identify the types \(t_{i}^{s_{i}, h}\) and \(t_{i}^{s_{i}, h'}\). Hence, formally, \(t_{i}^{s_{i}, h}=t_{i}^{s_{i}, h'}\) whenever h precedes \(h'\) and \(s_{i} \in S_{i}(h')\).

For every player \(i \in I\) and every information set \(h \in H\) we now construct the beliefs for each type in \(T_{i}^{1}(h)\).

Consider \(t_{i}^{s_{i}, h}\) with \(s_{i} \in D_{i}^{1}(h)\), that is, \( t_{i}^{s_{i}, h} \in T_{i}^{1}(h) {\setminus } T_{i}^{2}(h)\). We define the conditional belief vector \(\beta _{i}(t_{i}^{s_{i}, h})\) in the following way: For each \(j \ne i\) take an arbitrary type \(\hat{t}_{j}\) and consider an information set \(h' \in H_{i}\). Let

$$\begin{aligned} \beta _{i}(t_{i}^{s_{i}, h}, h')((s_{j}, t_{j})_{j \ne i}) = {\left\{ \begin{array}{ll} \beta _{i}^{s_{i}, h}(h')((s_{j})_{j \ne i}) &{} \text {if }t_{j} = \hat{t}_{j}\text { for every }j \ne i, \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Then at every \(h' \in H_{i}\), type \(t_{i}^{s_{i}, h}\) holds the same belief about the opponents’ strategy choices as \(\beta _{i}^{s_{i}, h}\). Moreover, \(t_{i}^{s_{i}, h}\) satisfies Bayesian updating, as \(\beta _{i}^{s_{i}, h}\) satisfies Bayesian updating. By construction of the beliefs, \(s_{i}\) is optimal for \(\beta _{i}^{s_{i}, h}(h')\) at every \(h' \in H_{i}(s_{i})\) weakly following h among strategies in \(S_{i}(h')\) and \(s_{i}\) is optimal for \(\beta _{i}^{s_{i}, h}(h'')\) at every \(h'' \in H_{i}(s_{i})\) weakly preceding h among strategies in \(S_{i}(h)\cap S_{i}(h'')\).

Therefore \(s_{i}\) is optimal for type \(t_{i}^{s_{i}, h}\) at every \(h' \in H_{i}(s_{i})\) weakly following h among strategies in \(S_{i}(h')\) and at every \(h'' \in H_{i}(s_{i})\) weakly preceding h among strategies in \(S_{i}(h) \cap S_{i}(h'')\).

Now consider \(t_{i}^{s_{i}, h}\) with \(s_{i}\in D_{i}^{k}(h)\) for some \(k \in \{2, 3, \ldots , K - 1\}\). Hence \(t_{i}^{s_{i}, h} \in T_{i}^{k}(h) {\setminus } T_{i}^{k + 1}(h)\). We define the conditional belief vector \(\beta _{i}(t_{i}^{s_{i}, h})\) as follows: For every information set \(h' \in H_{i}\) let \(\beta _{i}(t_{i}^{s_{i}, h}, h')\) be the conditional belief at \(h'\) about the opponents’ strategy-type pairs given by:

$$\begin{aligned} \beta _{i}(t_{i}^{s_{i}, h}, h')((s_{j},t_{j})_{j\ne i})= {\left\{ \begin{array}{ll} \beta _{i}^{s_{i}, h}(h')((s_{j})_{j\ne i}) &{} \text {if } t_{j}=t_{j}^{s_{j},h'}\text { for every }j\ne i,\\ 0&{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(3)

We first show that type \(t_{i}^{s_{i}, h}\) satisfies Bayesian updating. Suppose that \(h', h'' \in H_{i}\) are such that \(h'\) precedes \(h''\) and \(\beta _{i}(t_{i}^{s_{i}, h}, h')(S_{-i}(h'') \times T_{-i}) > 0\). Then by (3), \(\beta _{i}^{s_{i}, h}(h')(S_{-i}(h'')) > 0\). As \(\beta _{i}^{s_{i}, h}\) satisfies Bayesian updating, we must have that

$$\begin{aligned} \beta _{i}^{s_{i}, h}(h'')((s_{j})_{j\ne i}) = \frac{\beta _{i}^{s_{i}, h}(h')((s_{j})_{j\ne i})}{\beta _{i}^{s_{i}, h}(h')(S_{-i}(h''))} \end{aligned}$$
(4)

for all \((s_{j})_{j\ne i} \in S_{-i}(h'')\). Now, let \(\beta _{i}(t_{i}^{s_{i}, h}, h'')((s_{j},t_{j})_{j\ne i}) > 0\). Then by (3), \(t_{j} = t_{j}^{s_{j}, h''}\) for every \(j\ne i\). Since \(s_{j}\in S_{j}(h'')\) and \(h'\) precedes \(h''\), we know by construction that \(t_{j}^{s_{j}, h'} = t_{j}^{s_{j},h''}\) for every \(j\ne i\). This implies that

$$\begin{aligned} \beta _{i}(t_{i}^{s_{i}, h}, h'')((s_{j}, t_{j})_{j\ne i})&= \beta _{i}(t_{i}^{s_{i}, h}, h'')((s_{j}, t_{j}^{s_{j}, h''})_{j \ne i})\\&= \beta _{i}^{s_{i}, h}(h'')((s_{j})_{j \ne i})\\&= \frac{\beta _{i}^{s_{i}, h}(h')((s_{j})_{j \ne i})}{\beta _{i}^{s_{i}, h}(h')(S_{-i}(h''))}\\&= \frac{\beta _{i}(t_{i}^{s_{i}, h}, h')((s_{j}, t_{j}^{s_{j}, h'})_{j \ne i})}{\beta _{i}(t_{i}^{s_{i}, h}, h')(S_{-i}(h'') \times T_{-i})}\\&= \frac{\beta _{i}(t_{i}^{s_{i}, h}, h')((s_{j}, t_{j}^{s_{j}, h''})_{j \ne i})}{\beta _{i}(t_{i}^{s_{i}, h}, h')(S_{-i}(h'')\times T_{-i})}\\&= \frac{\beta _{i}(t_{i}^{s_{i}, h}, h')((s_{j}, t_{j})_{j \ne i})}{\beta _{i}(t_{i}^{s_{i}, h}, h')(S_{-i}(h'') \times T_{-i})}. \end{aligned}$$

Here, the first and the last equality follow from the fact that \(t_{j}=t_{j}^{s_{j}, h''}\) for every \(j \ne i\), the second equality from (3) applied to \(h''\), the third equality from (4), the fourth equality from (3) applied to \(h'\), and the fifth equality from the fact that \(t_{j}^{s_{j}, h'} = t_{j}^{s_{j}, h''}\) for every \(j \ne i\). Hence, we conclude that \(t_{i}^{s_{i} ,h}\) satisfies Bayesian updating.

By construction of the beliefs, strategy \(s_{i}\) is optimal for \(\beta _{i}^{s_{i}, h}(h')\) at every \(h' \in H_{i}(s_{i})\) weakly following h among strategies in \(S_{i}(h')\) and \(s_{i}\) is optimal for \(\beta _{i}^{s_{i}, h}(h'')\) at every \(h'' \in H_{i}(s_{i})\) weakly preceding h among strategies in \(S_{i}(h) \cap S_{i}(h'')\). Therefore \(s_{i}\) is optimal for type \(t_{i}^{s_{i}, h}\) at every \(h' \in H_{i}(s_{i})\) weakly following h among strategies in \(S_{i}(h')\) and at every \(h'' \in H_{i}(s_{i})\) weakly preceding h among strategies in \(S_{i}(h) \cap S_{i}(h'')\).

Recall that at every \(h' \in H_{i},\) the belief \(\beta _{i}^{s_{i}, h}(h') \in \Delta (R_{-i}^{k - 1}(h'))\) assigns positive probability only to opponents’ strategies in \(R_{j}^{k - 1}(h')\). Hence type \(t_{i}^{s_{i}, h}\) assigns at every \(h' \in H_{i}\) positive probability only to opponents’ types \(t_{j}^{s_{j}, h'}\) where \(s_{j}\in R_{j}^{k - 1}(h')\). That is, type \(t_{i}^{s_{i}, h}\) assigns at every \(h' \in H_{i}\) positive probability only to opponents’ types in \(T_{j}^{k - 1}(h')\).

Finally, consider types \(t_{i}^{s_{i}, h}\) with \(s_{i} \in R_{i}^{K}(h)\), that is, \(t_{i}^{s_{i}, h} \in T_{i}^{K}(h)\). We define the conditional belief vector \(\beta _{i}(t_{i}^{s_{i}, h})\) as follows: For every \(h' \in H_{i}\) let \(\beta _{i}(t_{i}^{s_{i}, h}, h')\) be the conditional belief at \(h'\) about the opponents’ strategy-type pairs given by:

$$\begin{aligned} \beta _{i}(t_{i}^{s_{i}, h}, h')((s_{j}, t_{j})_{j \ne i}) = {\left\{ \begin{array}{ll} \beta _{i}^{s_{i}, h}(h')((s_{j})_{j \ne i}) &{} \text {if }t_{j} = t_{j}^{s_{j}, h'}\text { for every }j \ne i, \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

In the same way as above, it can be concluded that \(t_{i}^{s_{i}, h}\) satisfies Bayesian updating, that strategy \(s_{i}\) is optimal for type \(t_{i}^{s_{i}, h}\) at every \(h' \in H_{i}(s_{i})\) weakly following h among strategies in \(S_{i}(h')\) and at every \(h'' \in H_{i}(s_{i})\) weakly preceding h among strategies in \(S_{i}(h) \cap S_{i}(h'')\), and that type \(t_{i}^{s_{i}, h}\) assigns at every \(h' \in H_{i}\) positive probability only to opponents’ types in \(T_{j}^{K}(h').\)\(\blacklozenge \)

Now we proceed to prove some properties of this epistemic model.

Lemma 8

For the epistemic model constructed above, every type \(t_{i} \in T_{i}^{k}(h)\) expresses up to \((k - 1)\)-fold belief in future and restricted past rationality and common belief in Bayesian updating for \(k \ge 2\).

Proof

Since all types in the epistemic model satisfy Bayesian updating, all types express common belief in Bayesian updating. We prove the result by induction on k.

Let \(k = 2\), and consider a player \(i \in I\) and an information set \(h \in H\). Take \(t_{i} \in T_{i}^{2}(h)\), then \(t_{i} = t_{i}^{s_{i}, h}\) for some \(s_{i} \in R_{i}^{2}(h)\). By construction, type \(t_{i}^{s_{i}, h}\) assigns at every \(h' \in H_{i}\) positive probability only to opponents’ strategy-type pairs \((s_{j}, t_{j}^{s_{j}, h'})\) where \(s_{j} \in R_{j}^{1}(h')\) and \(t_{j}^{s_{j}, h'} \in T_{j}^{1}(h')\).

For every such strategy-type pair \((s_{j}, t_{j}^{s_{j}, h'})\), strategy \(s_{j}\) is optimal for type \(t_{j}^{s_{j}, h'}\) at every \(h'' \in H_{j}(s_{j})\) weakly following \(h'\) among strategies in \(S_{j}(h'')\), and at every \(h''' \in H_{j}(s_{j})\) weakly preceding \(h'\) among strategies in \(S_{j}(h') \cap S_{j}(h''')\). Therefore type \(t_{i}^{s_{i}, h}\) assigns at every \(h' \in H_{i}\) positive probability only to opponents’ strategy-type pairs \((s_{j}, t_{j}^{s_{j}, h'})\) where \(s_{j}\) is optimal for type \(t_{j}^{s_{j}, h'}\) at every \(h'' \in H_{j}(s_{j})\) weakly following \(h'\) among strategies in \(S_{j}(h'')\), and at every \(h''' \in H_{j}(s_{j})\) weakly preceding \(h'\) among strategies in \(S_{j}(h') \cap S_{j}(h''')\). This means that \(t_{i}^{s_{i}, h}\) believes in the opponents’ future and restricted past rationality. Then \(t_{i}^{s_{i}, h}\) expresses up to 1-fold belief in future and restricted past rationality.

Now the induction step. Fix \(k \ge 3\) and assume that for every player \(i \in I\) and every information set \(h \in H\), every type \(t_{i} \in T_{i}^{k - 1}(h)\) expresses up to \((k - 2)\)-fold belief in future and restricted past rationality.

Consider a player \(i \in I\) and an information set \(h \in H\). Take \(t_{i} \in T_{i}^{k}(h)\), which means \(t_{i} = t_{i}^{s_{i}, h}\) for some \(s_{i} \in R_{i}^{k}(h)\). Type \(t_{i}^{s_{i}, h}\) assigns at every \(h' \in H_{i}\) positive probability only to opponents’ strategy-type pairs \((s_{j}, t_{j}^{s_{j}, h'})\) where \(s_{j} \in R_{j}^{k - 1}(h')\) and \(t_{j}^{s_{j}, h'} \in T_{j}^{k - 1}(h')\). For every such strategy-type pair, \(s_{j}\) is optimal for type \(t_{j}^{s_{j}, h'}\) at every \(h'' \in H_{j}(s_{j})\) weakly following \(h'\) among strategies in \(S_{j}(h'')\), and at every \(h''' \in H_{j}(s_{j})\) weakly preceding \(h'\) among strategies in \(S_{j}(h') \cap S_{j}(h''')\).

By the induction assumption, since type \(t_{j}^{s_{j}, h'} \in T_{j}^{k - 1}(h')\) then \(t_{j}^{s_{j}, h'}\) expresses up to \((k - 2)\)-fold belief in future and restricted past rationality. Then type \(t_{i}^{s_{i}, h}\) assigns at every \(h' \in H_{i}\) positive probability only to opponents’ strategy-type pairs \((s_{j}, t_{j}^{s_{j}, h'})\) where \(s_{j}\) is optimal for type \(t_{j}^{s_{j}, h'}\) at every \(h'' \in H_{j}(s_{j})\) weakly following \(h'\) among strategies in \(S_{j}(h'')\), and at every \(h''' \in H_{j}(s_{j})\) weakly preceding \(h'\) among strategies in \(S_{j}(h') \cap S_{j}(h''')\), and type \(t_{j}^{s_{j}, h'} \in T_{j}^{k - 1}(h')\) expresses up to \((k - 2)\)-fold belief in future and restricted past rationality. Hence, \(t_{i}^{s_{i}, h}\) expresses up to \((k - 1)\)-fold belief in future and restricted past rationality. This holds for all players \(i \in I\) and all information sets \(h \in H\), so every type \(t_{i} \in T_{i}^{k}(h)\) expresses up to \((k - 1)\)-fold belief in future and restricted past rationality. By induction, the result is true for every \(k \ge 2\).\(\square \)

Lemma 9

Given the epistemic model constructed above, for every \(k \ge K - 1\), every type \(t_{i} \in T_{i}^{K}(h)\) expresses up to \(k\)-fold belief in future and restricted past rationality and expresses common belief in Bayesian updating.

Proof

As before, all types in the epistemic model satisfy Bayesian updating so all types express common belief in Bayesian updating. The result is proven by induction on k.

Let \(k = K - 1\). By Lemma 8 we know that every type \(t_{i} \in T_{i}^{K}(h)\) expresses up to \((K - 1)\)-fold belief in future and restricted past rationality, so the result is true for \(k = K - 1\).

Now we do the induction step. Fix \(k \ge K\) and assume that for every player \(i \in I\) and every information set \(h \in H\), every type \(t_{i} \in T_{i}^{K}(h)\) expresses up to \((k - 1)\)-fold belief in future and restricted past rationality. Consider a player \(i \in I\), an information set \(h \in H\) and a type \(t_{i} \in T_{i}^{K}(h)\), that is \(t_{i} = t_{i}^{s_{i}, h}\) for some \(s_{i} \in R_{i}^{K}(h)\). By construction, \(t_{i}^{s_{i}, h}\) assigns at every \(h' \in H_{i}\) positive probability only to opponents’ strategy-type pairs \((s_{j}, t_{j}^{s_{j}, h'})\) where \(s_{j} \in R_{j}^{K}(h')\) and \(t_{j}^{s_{j}, h'} \in T_{j}^{K}(h')\). Then for every such pair \((s_{j}, t_{j}^{s_{j}, h'})\) the strategy \(s_{j}\) is optimal for type \(t_{j}^{s_{j}, h'}\) at every \(h'' \in H_{j}(s_{j})\) weakly following \(h'\) among strategies in \(S_{j}(h'')\) and at every \(h''' \in H_{j}(s_{j})\) weakly preceding \(h'\) among strategies in \(S_{j}(h') \cap S_{j}(h''')\).

By the induction assumption, every type \(t_{j}^{s_{j}, h'} \in T_{j}^{K}(h')\) expresses up to \((k - 1)\)-fold belief in future and restricted past rationality. Therefore, type \(t_{i}^{s_{i}, h}\) assigns at every \(h' \in H_{i}\) positive probability only to opponents’ strategy-type pairs \((s_{j}, t_{j}^{s_{j}, h'})\) where \(s_{j}\) is optimal for type \(t_{j}^{s_{j}, h'}\) at every \(h'' \in H_{j}(s_{j})\) weakly following \(h'\) among strategies in \(S_{j}(h'')\), and at every \(h''' \in H_{j}(s_{j})\) weakly preceding \(h'\) among strategies in \(S_{j}(h') \cap S_{j}(h''')\), and type \(t_{j}^{s_{j}, h'} \in T_{j}^{K}(h')\) expresses up to \((k - 1)\)-fold belief in future and restricted past rationality. Then type \(t_{i}^{s_{i}, h}\) expresses up to k-fold belief in future and restricted past rationality, and this holds for every player \(i \in I\) and every information set \(h \in H\). Hence, every type \(t_{i} \in T_{i}^{K}(h)\) expresses up to k-fold belief in future and restricted past rationality. By induction, the result holds for every \(k \ge K - 1\). \(\square \)

The next result follows from Lemma 9 and the definition of common belief in future and restricted past rationality.

Corollary 2

Given the epistemic model constructed above, every type \(t_{i} \in T_{i}^{K}(h)\) expresses common belief in future and restricted past rationality and expresses common belief in Bayesian updating.

Now we proceed with the proof for Theorem 2.

Proof

(Theorem 2) The first part of the theorem can be stated as \( BR _{i}^{k} = S_{i}^{k + 1}\) for every player i and every k. We show this by dividing the proof into two parts.

First we prove that \(S_{i}^{k + 1} \subseteq BR _{i}^{k}\) for every player i and every k. Consider a player \(i \in I\) and \(k \ge 1\). Take some \(s_{i} \in S_{i}^{k + 1}\). Then \(s_{i} \in S_{i}^{k + 1}(h)\) for all \(h \in H_{i}(s_{i})\). This implies that \(s_{i} \in R_{i}^{k + 1}(\varnothing )\). Hence, type \(t_{i}^{s_{i}, \varnothing }\) is in \(T_{i}^{k + 1}(\varnothing )\), so by Lemma 8, \(t_{i}^{s_{i}, \varnothing }\) expresses up to k-fold belief in future and restricted past rationality and expresses common belief in Bayesian updating. Moreover, \(s_{i}\) is optimal for \(t_{i}^{s_{i}, \varnothing }\) at every \(h \in H_{i}(s_{i})\) weakly following \(\varnothing \) among strategies in \(S_{i}(h)\). Therefore \(s_{i} \in BR _{i}^{k}\). So every strategy \(s_{i} \in S_{i}^{k + 1}\) is also in \( BR _{i}^{k}\), that is \(S_{i}^{k + 1} \subseteq BR _{i}^{k}\), and this holds for all players \(i \in I\) and \(k \ge 1\). Moreover, from Lemma 5 we know that \( BR _{i}^{k} \subseteq S_{i}^{k + 1}\). Hence \( BR _{i}^{k} = S_{i}^{k + 1}\).

For the second part of the theorem, consider a strategy \(s_{i}\) that can rationally be chosen by a type that expresses common belief in future and restricted past rationality and common belief in Bayesian updating. Then \(s_{i} \in BR _{i}^{k} = S_{i}^{k + 1}\) for all k, so \(s_{i}\) survives the full algorithm.

Now, take a strategy \(s_{i}\) that survives the full algorithm. Hence, \(s_{i} \in S_{i}^{K}(h)\) for all \(h \in H_{i}(s_{i})\). Then \(s_{i} \in R_{i}^{K}(\varnothing )\), and by Corollary 2 we know type \(t_{i}^{s_{i}, \varnothing } \in T_{i}^{K}(\varnothing )\) expresses common belief in future and restricted past rationality and expresses common belief in Bayesian updating. Moreover, by the construction of the epistemic model, the strategy \(s_{i}\) is optimal for the type \(t_{i}^{s_{i}, \varnothing }\) at every \(h \in H_{i}(s_{i})\) weakly following \(\varnothing \) among strategies in \(S_{i}(h)\). Therefore, every strategy \(s_{i}\) that survives the full algorithm can rationally be chosen by a type that expresses common belief in future and restricted past rationality and common belief in Bayesian updating. \(\square \)