1 Introduction

Nash equilibrium is the only concept of a solution which can be sustained in a game where rational players, besides knowing their own strategy set and payoff as a function of their own strategy, have complete information about the game they are playing. A player has complete information when they know the following: They are participating in a game (i.e., interacting with other players, conscious decision makers with their own goals, not random “nature”), the number of those players, their strategy sets and payoff functions, the influence of the choices of the others on payoffs, and that the other players are rational. In dynamic games, it is assumed that players can either directly observe the choices of the other players, or their influence on the state variable.

Actually, in the majority of real-life decision-making problems of a game-theoretic nature, these assumptions are not fulfilled. Usually, the other players’ payoff functions and sets of strategies are not known exactly.

This fact results in a need to introduce various concepts of equilibria with imperfect, incomplete, or distorted information. This branch of game theory is developing rapidly, with numerous concepts based on various assumptions on what kind of imperfection is allowed: Bayesian equilibria, introduced by Harsanyi [1], \(\varDelta \)-rationalizability by Battigalli and Siniscalchi [2], conjectural equilibria by Battigalli and Guaitoli [3], cursed equilibrium considered by Eyster and Rabin [4], self-confirming equilibria by Fudenberg and Levine [5], and studied, among others, by Azrieli [6], conjectural categorical equilibria introduced by Azrieli [7], stereotypical beliefs by Cartwright and Wooders [8], subjective equilibria by Kalai and Lehrer [9, 10], rationalizable conjectural equilibria by Rubinstein and Wolinsky [11], correlated equilibria by Aumann [12, 13] (to some extent) and belief distorted Nash equilibria for set-valued beliefs introduced by Wiszniewska-Matyszkiel ([14, 15], with prerequisites in [16]). Most of them assume that players are rational. A detailed review of these concepts can be found in Wiszniewska-Matyszkiel [14].

Only two of the aforementioned concepts can deal with information which is not only incomplete, but can be seriously distorted: the subjective equilibria of Kalai and Lehrer [9, 10] and belief distorted Nash equilibria (BDNE) for set-valued beliefs of Wiszniewska-Matyszkiel [14, 15]. Only BDNE is applicable in dynamic games, including those with an infinite time horizon, in which information is gradually disclosed during play.

In the approaches presented in a previous, theoretical paper of the author on this subject [14], and the paper applying these concepts to environmental economics [15], beliefs take the form of a multivalued correspondence. Such a form of beliefs suggests a way of defining the “anticipated” future payoff of a player as the payoff which can be obtained given the worst realization under this belief, assuming that the player will choose optimally in the future. We refer to this approach as the “inf-approach” (due to the infimum used in assessing the future payoff), while the approach used in this paper is referred to as the “exp-approach.” Let us emphasize that the inf-approach, with its pessimistic attitude to the future, is not very realistic.

To address this issue, in this paper beliefs are assumed to take the form of probability distributions over the set of possible future trajectories of states and statistics. If players are able to estimate the probability distribution of these parameters in the future, it is natural for them to take into account the expected future payoff, and the verification of beliefs can be assessed quantitatively using this probability distribution.

We introduce the concepts of pre-belief distorted Nash equilibrium (pre-BDNE), \(\varepsilon \)-belief distorted Nash equilibrium (\(\varepsilon \)-BDNE), belief distorted Nash equilibrium (BDNE), and various concepts of the self-verification of beliefs of the form considered. Existence and equivalence theorems are proven, and the notion of BDNE is compared to the notions of Nash equilibrium, subjective equilibrium and BDNE for set-valued beliefs. These concepts are illustrated by several examples: extracting a common renewable resource, a large minority game, and a repeated Prisoner’s Dilemma.

It is worth emphasizing that the concept of BDNE, both for set-valued and probabilistic beliefs, is not a concept of bounded rationality. In our approach, we assume that players are rational, although they may have false information about the game they are playing. This false information is such that it cannot be falsified during subsequent play.

The paper is composed as follows. The problem is defined in Sect. 2, where the formal definition in Sect. 2.2 is preceded by a brief introduction in Sect. 2.1. The concepts of pre-BDNE, notions of the self-verification of beliefs, and finally, \(\varepsilon \)-BDNE and BDNE are defined in Sect. 3, where theoretical results on existence and equivalence are also stated. In Sect. 4, some examples are studied. Appendix A is devoted to large games and B contains a very general existence result, using a higher level of mathematical abstraction than the main part of the paper.

2 Formulation of the Problem

This section defines a dynamic game, as well as a derived game with distorted information and probabilistic beliefs. The dynamic game is identical to that considered in Wiszniewska-Matyszkiel [14] based on the inf-approach, while the game with distorted information substantially differs, particularly in the structure of beliefs and expected payoffs.

2.1 Brief Introduction of the Problem and Concepts

Before giving a detailed introduction of the problem, we briefly describe it, without full mathematical precision.

We consider a discrete time dynamic game with the set of players \(\mathbb {I}\), where the payoff of player i under strategy profile S can be written as \(\varPi _{i}(S):=\sum _{t=t_0}^T\frac{P_i(S_i(t),u^S(t),X^S(t))}{(1+r_i)^{t-t_0}}+\frac{G_i(X^S(T+1))}{(1+r_i)^{T+1-t_0}}\) (or only the first part in the case of an infinite time horizon), where \(u^S\) denotes a statistic describing the players’ behavior under S (e.g., the aggregate of all the strategies used by the players), observable ex post, while \(X^S\) denotes the trajectory of the state variable resulting from choosing S, which is defined by \(X^S(t+1)=\phi (X^S(t),u^S(t))\) with \(X^S(t_0)=\bar{x}.\) All past and current states are observable.

At a Nash equilibrium, the basic concept of a solution to a noncooperative game, each player maximizes their payoff given the strategies of the remaining players.

We assume that players do not have complete information about the game they are playing. Therefore, at each stage of the game, they formulate beliefs about the future path of \(X^S\) and \(u^S\). The beliefs formulated at time t, \(B_i(t,a,H^S)\), are based on the current decision a and the already observed part of the history \(H^S=(X^S,u^S)\), denoted \(H^S|_t\). They take the form of a probability distribution on the set of future paths of \((X^S,u^S)\).

These beliefs define the expected payoff of player i, \(\varPi _i^e(t,H^S|_t,S(t))\), as the sum of the actual current payoff at time t, \(P_i(S_i(t),u^S(t),X^S(t))\), and the expected (with respect to beliefs) value of the future optimal payoff along \(H^S\).

According to the preliminary concept of a pre-belief distorted Nash equilibrium (pre-BDNE), a profile S is a pre-BDNE iff at each stage each player maximizes \(\varPi _i^e(t,H^S|_t,S(t))\). A belief distorted Nash equilibrium (BDNE) is a pre-BDNE S for which \((X^S,u^S)\) is the most likely path (with maximum likelihood normalized to 1), while an \(\varepsilon \)-belief distorted Nash equilibrium (\(\varepsilon \)-BDNE) is a pre-BDNE S for which \((X^S,u^S)\) has likelihood at least \(1-\varepsilon \).

2.2 Formal Introduction

A game with distorted information is a tuple of the following objects:

\(((\mathbb {I},\mathfrak {I},\lambda ), \mathbb {T}, \mathbb {X}, \{D_{i}\}_{i\in \mathbb {I}}, U, \phi , \{P_{i}\}_{i\in \mathbb {I}}, \{G_{i}\}_{i\in \mathbb {I}},\{B_{i}\}_{i\in \mathbb {I}}, \{r_{i}\}_{i\in \mathbb {I}}, L)\), i.e., the space of players, set of time points, set of states, sets of the players’ possible decisions, statistic, the system’s reaction function, current payoffs, terminal payoffs, beliefs, discount rates, and likelihood, respectively, briefly described in Sect. 2.1, and in detail in the sequel.

The set of players is denoted by \(\mathbb {I}\). In order that the definitions encompass both games with finitely many players and large games, we introduce a structure on \(\mathbb {I}\) consisting of a \(\sigma \)-field \(\mathfrak {I}\) of its subsets and a measure \(\lambda \) on it (in standard games with finitely many players, \(\mathfrak {I}\) is the whole power set, while \(\lambda \equiv 1\)). For readers who are not familiar with games involving a measure space of players, there is a short introduction in Appendix A.

The game is dynamic, played over a discrete set of times \(\mathbb {T}=\{t_{0},t_{0}+1,\ldots ,T\}\) or \(\mathbb {T}=\{t_{0},t_{0}+1,\ldots \}\). We also introduce the symbol \(\overline{\mathbb {T}}\) denoting \(\{t_{0},t_{0}+1,\ldots ,T+1\}\) for finite T and equal to \(\mathbb {T}\) in the opposite case.

At each moment, player i chooses a decision from their decision set \(D_{i}\). We also denote the common superset of these sets by \(\mathbb {D}\), the set of the (combined) decisions of the players, with a chosen \(\sigma \)-field of its subsets denoted by \(\mathcal {D}\).

We call any measurable function \(\delta :\mathbb {I\rightarrow D}\) with \(\delta (i)\in D_i\), a static profile. The set of all static profiles is denoted by \(\Sigma ^{static }\). We assume that it is non-empty.

The next important object is a finite, m-dimensional, statistic of the whole profile, which influences players’ payoffs. Such statistics might be, e.g., aggregate extraction in models of the exploitation of renewable resources, or prices in models of markets. Such a definition does not reduce generality, since in games with finitely many players this statistic may be the whole profile. Formally, a statistic of a static profile is a function \(U:\Sigma ^{static }\mathop {\rightarrow }\limits ^{onto }\mathbb {U}\subset \mathbb {R}^{m}\) defined by

$$\begin{aligned} U(\delta ):=\left[ \int _{\mathbb {I}}g_{k}(i,\delta (i))d \lambda (i)\right] _{k=1}^{m} \end{aligned}$$
(1)

for measurable functions \(g_{k}:\mathbb {I}\times \mathbb {D} \rightarrow \mathbb {R}\). The resultant set \(\mathbb {U}\) is called the set of profile statistics.
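For a game with finitely many players, and taking \(\lambda \) to be the counting measure (an assumption made only for this illustration), the integral in Eq. (1) reduces to a finite sum. The following minimal sketch shows this special case; the name `statistic` and the example functions `g_examples` are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence


def statistic(delta: Sequence[float],
              g: Sequence[Callable[[int, float], float]]) -> List[float]:
    """Eq. (1) for finitely many players and counting measure (assumed):
    the k-th coordinate is sum_i g_k(i, delta[i])."""
    return [sum(g_k(i, a) for i, a in enumerate(delta)) for g_k in g]


# Hypothetical choice with m = 2: the aggregate decision and the sum of squares.
g_examples = [lambda i, a: a, lambda i, a: a * a]
u_example = statistic([1.0, 2.0, 3.0], g_examples)
```

Here `u_example` is the two-dimensional statistic of the static profile in which the three players choose 1, 2, and 3, respectively.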

If \(\varDelta :\mathbb {T}\rightarrow \Sigma ^{static } \) represents the choices resulting from static profiles at various moments, then we denote by \(u^{\varDelta }\) the function \(u^{\varDelta }:\mathbb {T}\rightarrow \mathbb {U}\) such that \(u^{\varDelta }(t)=U(\varDelta (t))\).

The game is played in an environment (or system) with set of states \(\mathbb {X}\).

Given a function \(u^{\varDelta }:\mathbb {T}\rightarrow \mathbb {U}\), the state variable evolves according to the equation

$$\begin{aligned} X^{\varDelta }(t+1)=\phi (X^{\varDelta }(t),u^{\varDelta }(t)) \text { with the initial condition } X^{\varDelta }(t_{0})=\bar{x}, \end{aligned}$$
(2)

where \(\phi : \mathbb X \times \mathbb U \rightarrow \mathbb X\) is called the reaction function of the system.

At each moment, player i obtains current payoff, \(P_{i}:D_i\times \mathbb {U}\times \mathbb {X} \rightarrow \mathbb {R\cup \{-\infty \}}\). In the case of a finite time horizon, player i also obtains a terminal payoff (at the end of the game) defined by the function \(G_{i}:\mathbb {X}\rightarrow \mathbb {R\cup \{-\infty \}}\).

Players sequentially observe the history of the game: At time t, they know the states X(s) for \(s\le t\) and the statistics u(s) for the chosen static profiles at moments \(s<t\). In order to simplify the notation, we introduce the set of histories of the game \(\mathbb {H}:=\mathbb {X}^{T-t_{0}+2}\mathbb {\times }\mathbb {U}^{T-t_{0}+1}\) and for such a history \(H\in \mathbb {H}\), we denote the history observed at time t by \(H|_{t}\).

Given the history observed at time t, \(H|_{t}\), players formulate their suppositions about future values of u and X, depending on their decision a made at time t. This is formalized as a function describing the beliefs of player i, \(B_{i}:\mathbb {T}\times D_i\times \mathbb {H}\rightarrow \mathcal {M}_{1}\left( \mathbb {H}\right) \), where \(\mathcal {M}_{1}\left( \mathbb {H}\right) \) denotes the set of all probability measures on \(\mathbb {H}\). We assume that beliefs \(B_{i}(t,a,H)\) only depend on H through \(H|_{t}\), and that for every \(H^{\prime } \) in the support of \(B_{i}(t,a,H)\), we have \(H^{\prime }|_{t}=H|_{t}\).

Players have compound strategies dependent on time and the history of the game observed at this time. The strategy of player i is a function \(S_{i}:\mathbb {T}\times \mathbb {H} \rightarrow D_i\) such that \(S_i(t,H)\) only depends on H through \(H|_t\). Combining the players’ strategies, we obtain the function \(S:\mathbb {I}\times \mathbb {T}\times \mathbb {H}\rightarrow \mathbb {D}\).

A profile (of strategies) is a combination of strategies such that for each t and H, the function \(S_{\bullet }(t,H)\) is a static profile. The set of all profiles is denoted by \(\Sigma \). Since the choice of a profile S determines the history of the game, we denote this history, consisting of trajectory \(X^S\) and statistic \(u^S\) (defined by Eqs. (2) and (1), respectively), by \(H^{S}\).

To simplify the notation, we consider the open-loop form of profile S, \(S^{OL}:\mathbb {T}\rightarrow \Sigma ^{static } \), defined by

$$\begin{aligned} S_{i}^{OL}(t)=S_{i}(t,H^S). \end{aligned}$$
(3)

If the players choose a profile S, then the discounted payoff of player i, \(\varPi _{i}:\Sigma \rightarrow \overline{\mathbb {R}}\), depends only on the open-loop form of the profile and is equal to

$$\begin{aligned} \varPi _{i}(S)=\sum _{t=t_{0}}^{T}\frac{P_{i}\left( S_{i}^{OL}(t),u^S(t), X^{S}(t) \right) }{ \left( {1+r_{i}}\right) ^{t-t_{0}}}+\frac{G_{i}\left( X^{S}(T+1) \right) }{\left( {1+r_{i}}\right) ^{T+1-t_{0}}}, \end{aligned}$$
(4)

where \(r_{i}>0\) is the discount rate of player i. For infinite T, we set \(G_i\equiv 0\). We assume that the \(\varPi _{i}(S)\) are well defined.
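The state equation (2) and the discounted payoff (4) can be illustrated for a finite horizon as follows. This is only a sketch under stated assumptions: time is re-indexed so that \(t_0=0\), the path of the statistic is given as a list, and the names `trajectory`, `discounted_payoff`, and all example functions are hypothetical.

```python
def trajectory(phi, x0, u):
    """Eq. (2): X(t+1) = phi(X(t), u(t)) with X(t0) = x0.
    Returns the list X(t0), ..., X(T+1), one entry longer than u."""
    X = [x0]
    for u_t in u:
        X.append(phi(X[-1], u_t))
    return X


def discounted_payoff(P_i, G_i, r_i, own, u, X):
    """Eq. (4) for a finite horizon; the index k stands for t - t0."""
    n = len(u)  # number of stages, T - t0 + 1
    running = sum(P_i(own[k], u[k], X[k]) / (1 + r_i) ** k for k in range(n))
    return running + G_i(X[n]) / (1 + r_i) ** n


# Hypothetical example: a resource depleted by aggregate extraction u(t).
X_example = trajectory(lambda x, u_t: x - u_t, 10, [2, 3])
payoff_example = discounted_payoff(
    P_i=lambda a, u_t, x: a,   # current payoff: own extraction
    G_i=lambda x: x,           # terminal payoff: remaining stock
    r_i=1.0,                   # discount rate chosen for easy arithmetic
    own=[1, 2], u=[2, 3], X=X_example)
```

In this example the stock evolves as 10, 8, 5 and the payoff is \(1+2/2+5/4\).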

This ends the definition of the dynamic game.

However, the players do not know the profile. Therefore, in their calculations, they can only use the expected payoff functions, \(\varPi _{i}^{e}:\mathbb {T}\times \mathbb {H} \times {\varSigma }^{{{static }}}\rightarrow \overline{\mathbb {R}}\), corresponding to their beliefs.

$$\begin{aligned} \varPi _{i}^{e}(t,H^S,\delta ):=P_{i}\left( \delta _{i},U(\delta ) ,X^{S^{OL}}(t) \right) + \frac{V_{i}(t+1,B_{i}(t,\delta _{i},H^{S}))}{1+r_{i}}, \end{aligned}$$
(5)

where \(V_{i}:\overline{\mathbb {T}}\times \mathcal {M}_{1}(\mathbb {H})\rightarrow \overline{\mathbb {R}}\) (the function defining the expected future payoff) represents the discounted value of player i’s expected future payoff assuming that they act optimally in the future under their beliefs, i.e.,

$$\begin{aligned} V_{i}(t,\beta )=E_{\beta }v_{i}(t,H)=\int _{\mathbb {H}} v_{i}(t,H)d \beta (H), \end{aligned}$$
(6)

where the function \(v_{i}:\overline{\mathbb {T}} \times \mathbb {H} \rightarrow \overline{\mathbb {R}}\) is the present value of the future payoff of player i under the assumption that they act optimally in the future, given u and X:

$$\begin{aligned} v_{i}(t,(X,u))=\sup _{d:\mathbb {T}\rightarrow D_{i}} \left[ \sum _{s=t}^{T} \frac{P_{i}(d(s),u(s),X(s))}{\left( 1+r_{i}\right) ^{s-t}}+\frac{G_{i}\left( X(T+1) \right) }{\left( 1+r_{i}\right) ^{T+1-t}} \right] . \end{aligned}$$
(7)

Note that this definition of expected payoff mimics, to some extent, the Bellman equation for calculating the best responses of players to the strategies of the others, used to derive Nash equilibria.
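Equations (5)–(7) can be made concrete in a simplified setting. The sketch below assumes a finite decision set \(D_i\), a belief with finite support given as a list of (probability, history) pairs, and time re-indexed so that \(t_0=0\); all function names are hypothetical. Since the history (X, u) is fixed inside Eq. (7), each summand depends only on d(s), so the supremum over d can be taken period by period.

```python
def v(P_i, G_i, r_i, D_i, t, X, u):
    """Eq. (7) for a finite D_i: with (X, u) fixed, the sup over d
    separates into one maximization per period s = t, ..., T."""
    T = len(u) - 1  # X has one more entry than u
    running = sum(max(P_i(a, u[s], X[s]) for a in D_i) / (1 + r_i) ** (s - t)
                  for s in range(t, T + 1))
    return running + G_i(X[T + 1]) / (1 + r_i) ** (T + 1 - t)


def V(P_i, G_i, r_i, D_i, t, belief):
    """Eq. (6) for a finitely supported belief: list of (prob, (X, u))."""
    return sum(p * v(P_i, G_i, r_i, D_i, t, X, u) for p, (X, u) in belief)


def expected_payoff(P_i, G_i, r_i, D_i, t, a, u_t, x_t, belief):
    """Eq. (5): current payoff plus the discounted expected future value,
    where `belief` is the belief formed at time t given decision a."""
    return P_i(a, u_t, x_t) + V(P_i, G_i, r_i, D_i, t + 1, belief) / (1 + r_i)


# Hypothetical two-stage example (T = 1) with two equally likely histories.
P_ex = lambda a, u_t, x: a * x - u_t
G_ex = lambda x: 0.0
belief_ex = [(0.5, ([1, 2, 0], [0, 1])), (0.5, ([1, 0, 0], [0, 0]))]
pi_e = expected_payoff(P_ex, G_ex, 1.0, [0, 1], 0, 1, 0, 1, belief_ex)
```

In the example, the current payoff is 1, the future value is 1 under the first history and 0 under the second, so the expected payoff is \(1+0.5/2\).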

3 Nash Equilibria and Belief Distorted Nash Equilibria

One of the basic concepts in game theory, Nash equilibrium, assumes that every player (almost every in the case of games with a continuum of players) chooses a strategy which maximizes their payoff given the strategies of the remaining players.

Notational convention. For any profile S and strategy d (both static and dynamic) of player i, \(S^{i,d}\) represents the modification of the profile S where the strategy of player i is replaced by d.

Definition 3.1

A profile S is a Nash equilibrium iff for a.e. \(i\in \mathbb {I}\) and for every strategy \(d \in D_i\), \(\varPi _{i}(S)\ge \varPi _{i}(S^{i,d})\).

The abbreviation “a.e.” (almost every) in games with finitely many players reduces to “every.”

3.1 Toward Belief Distorted Nash Equilibria: Pre-Belief Distorted Nash Equilibria and their Properties

The assumption that a player knows the strategies of the remaining players, or at least the statistic for these strategies which influences their payoff, is not usually fulfilled in real-life situations. Moreover, the details of the other players’ payoff functions or available strategy sets are sometimes not known precisely. Therefore, given their beliefs, players maximize their expected payoffs.

Definition 3.2

A profile S is a pre-belief distorted Nash equilibrium (pre-BDNE for short) for belief B iff for a.e. \(i\in \mathbb {I}\), for every decision a of player i and every \(t\in \mathbb {T}\), we have \(\varPi _{i}^{e}(t,H^S,S^{OL}(t))\ge \varPi _{i}^{e}(t,H^S, (S^{OL}(t))^{i,a})\).

In other words, a profile S is a pre-BDNE iff almost every player maximizes their expected payoff given the current values of \(X^S\) and \(u^S\) and beliefs about their future values.

Remark 3.1

In one-shot games (i.e., for \(T=t_{0}\) and \(G\equiv 0\)), a profile is a pre-BDNE iff it is a Nash equilibrium.

\(\square \)
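In the one-shot case of Remark 3.1, checking the equilibrium condition reduces to checking that no player gains from a unilateral deviation. The sketch below does this for finitely many players with finite decision sets, assuming (for illustration only) that the statistic is the sum of all decisions; the name `is_static_nash` and the example payoff are hypothetical.

```python
def is_static_nash(profile, D, P, x):
    """Check Definition 3.1 in a one-shot game with finitely many players.
    Assumption: the statistic u is the sum of all players' decisions."""
    for i, a_i in enumerate(profile):
        def payoff(a):
            deviated = list(profile)
            deviated[i] = a  # the static profile with player i's decision replaced
            return P[i](a, sum(deviated), x)
        if any(payoff(a) > payoff(a_i) for a in D[i]):
            return False  # player i has a profitable deviation
    return True


# Hypothetical Cournot-like payoff a * (4 - u) for two symmetric players.
P_two = [lambda a, u, x: a * (4 - u)] * 2
D_two = [[0, 1, 2]] * 2
nash_11 = is_static_nash((1, 1), D_two, P_two, x=None)
nash_22 = is_static_nash((2, 2), D_two, P_two, x=None)
```

With these payoffs, the profile (1, 1) passes the check, while (2, 2) fails because either player gains by reducing their decision to 1.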

Next, we state an existence result for games with a continuum of players.

Theorem 3.1

Let \((\mathbb {I},\mathfrak {I},\lambda )\) be an atomless measure space and let \(D_i\subseteq \mathbb {R}^{n}\), together with the \(\sigma \)-field of Borel subsets. Assume that for every t, x, H and for almost every i, the following continuity assumptions hold: The functions \(P_{i}(a,u,x)\) and \(V_{i}(t,B_{i}(t,a,H))\) are upper semicontinuous in (a, u) jointly, while for every a, they are continuous in u and for all k, the functions \(g_{k}(i,a)\) are continuous in a for \(a\in D_{i}\).

Assume also that the sets \(D_{i}\) are compact and the following measurability assumptions hold: The graph of \(D_{\bullet }\) is measurable, and for every t, x, u, k, and H, the \(P_{i}(a,u,x)\), \(r_{i}\), \(V_{i}(t,B_{i}(t,a,H))\), and \(g_{k}(i,a)\) are measurable in (i, a). Moreover, assume that for each k, \(g_k\) is integrably bounded, i.e., there exists an integrable function \(\Gamma :\mathbb {I\rightarrow R}\) such that for every \(a\in D_{i}\), \(\left| g_{k}(i,a)\right| \le \Gamma (i)\).

Under these assumptions, there exists a pre-BDNE for B.

Theorem 3.1 states that under some measurability, compactness, and continuity assumptions, there exists a pre-BDNE. It follows from a more general existence result, Theorem B.2, which is proved using a general Nash equilibrium result from Wiszniewska-Matyszkiel [17] and the concept of analyticity of sets. Since this requires introducing specific terminology, Theorem B.2 is stated and proven in Appendix B, both for the sake of coherence and for readers who are less interested in nonstandard mathematics. The proof of Theorem 3.1 is also given in Appendix B, after the formulation and proof of Theorem B.2.

Now we turn to show some properties of pre-BDNE for a special kind of belief.

Definition 3.3

A belief \(B_{i}\) has perfect foresight for a profile S, iff for all t, \(B_{i}(t,S_{i}^{OL}(t),H^{S})\) is concentrated at \({\{H^{S}\}}\).

For perfect foresight, we state the equivalence between Nash equilibria and pre-BDNE for a continuum of players.

Theorem 3.2

Let \((\mathbb {I},\mathfrak {I},\lambda )\) be an atomless measure space. Assume that for all i, x, and u, the \(P_i(\bullet ,u,x)\) are upper semicontinuous, \(D_i\) are compact, \(\sup _{S\in \Sigma }\varPi _i(S)<+\infty \) and for every \(S\in \Sigma \), \(\max _{d:\mathbb {T} \rightarrow D_i}\varPi _i(S^{i,d})\) is attained.

  1. (a)

    Any Nash equilibrium profile \(\bar{S}\) is a pre-BDNE for any belief corresponding to perfect foresight at \(\bar{S}\) and all profiles \(\bar{S}^{i,d}\).

  2. (b)

    If a profile \(\bar{S}\) is a pre-BDNE for a belief B with perfect foresight at both \(\bar{S}\) and \(\bar{S}^{i,d}\) for a.e. player i and any of their strategies d, then it is a Nash equilibrium.

Proof

In all the subsequent reasoning, we consider a player i who is not a member of the set of players of measure 0 for whom the condition of payoff maximization (actual for Nash equilibrium, expected for pre-BDNE) does not hold.

In the case of a continuum of players, the statistics for the profiles, and consequently the trajectories corresponding to them, are identical for \(\bar{S}\) and all \(\bar{S}^{i,d}\). We denote this statistic by u and the corresponding trajectory by X.

To continue, we need the next lemma, which states that along the path of perfect foresight, the equation for the expected payoff of player i becomes the Bellman equation for optimizing their actual payoff, while \(V_{i}\) is the value function.

Lemma 3.1

Let i be a player maximizing their payoff at a Nash equilibrium \(\bar{S}\), while \(\tilde{V}_i\) is the value function for this maximization. If B has perfect foresight for both profile \(\bar{S}\) and profiles \(\bar{S}^{i,d}\) for any \(d\in D_i\), then for all t, the values of \(V_i\) and \(\tilde{V_i}\) coincide and \(\bar{S}_i^{OL}(t)\in Argmax _{a\in D_{i}}\varPi _{i}^{e}(t,H^{\bar{S}},(\bar{S}^{OL}(t))^{i,a})\).

Proof

(of Lemma 3.1) Note that, since the strategies of the remaining players coincide with \(\bar{S}\) and every single player is negligible, the trajectory X is fixed. Hence, unlike in standard dynamic optimization problems, the value function for the decision problem of player i can be written as a function of time only, \(\widetilde{V}_{i}:\mathbb {T}\rightarrow \overline{\mathbb {R}}\):

$$\begin{aligned} \widetilde{V}_{i}(t)=\sup _{d:\mathbb {T}\rightarrow D_i}\left[ \sum _{s=t}^{T}P_{i}(d(s),u(s),X(s))\cdot \left( \frac{1}{1+r_{i}}\right) ^{s-t}\right. \nonumber \\ \left. +G_{i}\left( X(T+1)\right) \cdot \left( \frac{1}{1+r_{i}}\right) ^{T+1-t} \right] \end{aligned}$$
(8)

(recall that for infinite T, we take \(G\equiv 0\)).

Since the payoff is well defined and the maximum is attained at \(\bar{S}_i\), the value function fulfills the Bellman equation

$$\begin{aligned} \widetilde{V}_{i}(t)=\sup _{a\in D_{i}}\left[ P_{i}(a,u(t),X(t))+ \widetilde{V}_{i}(t+1)\cdot \left( \frac{1}{1+r_{i}}\right) \right] \end{aligned}$$
(9)

and

$$\begin{aligned} \bar{S}_{i}(t)\in \mathrm {Argmax}_{a\in D_{i}} \left[ P_{i}(a,u(t),X(t))+ \widetilde{V}_{i}(t+1)\cdot \left( \frac{1}{1+r_{i}}\right) \right] . \end{aligned}$$
(10)

Using Eq. (8) to substitute an expression for \(\widetilde{V}_{i}(t+1)\) on the r.h.s. of the Bellman equation, Eq. (9), we obtain

$$\begin{aligned} \widetilde{V}_{i}(t)= & {} \sup _{a\in D_{i}} \left[ P_{i}(a,u(t),X(t)) \right. \\&\quad \left. + \frac{1}{1+r_i}\left( \sup _{d:\mathbb {T}\rightarrow D_{i}} \left\{ \sum _{s=t+1}^{T}\frac{P_{i}(d(s),u(s),X(s))}{ \left( {1+r_{i}} \right) ^{s-(t+1)}} +\frac{G_{i}(X(T+1))}{ \left( {1+r_{i}} \right) ^{T+1-(t+1)}} \right\} \right) \right] . \end{aligned}$$

Note that the last supremum is equal not only to \(\widetilde{V}_{i}(t+1)\), but also to \(v_{i}(t+1,(X,u))\). Since the belief assigns probability one to the history (X, u) for all profiles \(\bar{S}^{i,d}\), this supremum is also equal to \(V_{i}(t+1,B_{i}(t,a,H^{\bar{S}^{i,d}}))\). Therefore,

$$\begin{aligned} \widetilde{V}_{i}(t)= & {} \sup _{a\in D_{i}} \left[ P_{i}(a,u(t),X(t))+ \frac{V_{i}(t+1,B_{i}(t,a,H^{\bar{S}^{i,d}}))}{1+r_{i}} \right] \\= & {} \sup _{a\in D_{i}}\varPi _{i}^{e}\left( t,H^{\bar{S}^{i,d}},\left( \bar{S}^{OL }(t)\right) ^{i,a}\right) . \end{aligned}$$

We only have to show that \(\bar{S}_{i}(t)\in Argmax _{a\in D_{i}}\varPi _{i}^{e}(t,H^{\bar{S}},\left( \bar{S}^{OL}(t)\right) ^{i,a}).\)

From the definition of \(\varPi _i^e\), Eq. (5), this set is equal to

$$\begin{aligned} Argmax _{a\in D_{i}} \left[ P_{i}(a,u(t),X(t))+\frac{V_{i}(t+1,B_{i}(t,a,H^{\bar{S}^{i,d}}))}{1+r_{i}} \right] \\ =Argmax _{a\in D_{i}} \left[ P_{i}(a,u(t),X(t))+\frac{\widetilde{V}_{i}(t+1)}{1+r_{i}} \right] . \end{aligned}$$

Hence Eqs. (8) and (10) are satisfied, which ends the Proof of Lemma 3.1. \(\square \)
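The relation between the direct definition (8) and the Bellman equation (9) with the terminal condition \(\widetilde{V}_{i}(T+1)=G_{i}(X(T+1))\) can be checked numerically in a small finite-horizon example. The sketch below assumes a finite decision set, a fixed path (X, u) (as for a negligible player), and time re-indexed so that \(t_0=0\); the function names and the example payoff are hypothetical.

```python
from itertools import product


def value_backward(P_i, G_i, r_i, D_i, X, u):
    """Backward induction on Eq. (9), starting from V(T+1) = G_i(X(T+1)).
    Returns the values V(0..T+1) and one maximizer per period, cf. (10)."""
    T = len(u) - 1
    V = [0.0] * (T + 2)
    V[T + 1] = G_i(X[T + 1])
    best = [None] * (T + 1)
    for t in range(T, -1, -1):
        scores = {a: P_i(a, u[t], X[t]) + V[t + 1] / (1 + r_i) for a in D_i}
        best[t] = max(scores, key=scores.get)  # first maximizer on ties
        V[t] = scores[best[t]]
    return V, best


def value_brute(P_i, G_i, r_i, D_i, X, u, t):
    """Eq. (8): maximum over all open-loop plans d(t), ..., d(T)."""
    T = len(u) - 1
    tail = G_i(X[T + 1]) / (1 + r_i) ** (T + 1 - t)
    return max(sum(P_i(d[s - t], u[s], X[s]) / (1 + r_i) ** (s - t)
                   for s in range(t, T + 1)) + tail
               for d in product(D_i, repeat=T + 1 - t))


# Hypothetical data: two stages, three feasible decisions, discount rate 1.
P_ex = lambda a, u_t, x: a * (x - u_t) - a * a
G_ex = lambda x: x
X_ex, u_ex, D_ex = [3, 1, 4], [0, 0], [0, 1, 2]
V_back, best_ex = value_backward(P_ex, G_ex, 1.0, D_ex, X_ex, u_ex)
V_direct = [value_brute(P_ex, G_ex, 1.0, D_ex, X_ex, u_ex, t) for t in range(3)]
```

In this example both computations give the same values at every t, which is the finite-horizon instance of the coincidence used in the proof.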

Statement (a) from Theorem 3.2 is an immediate conclusion from Lemma 3.1.

Statement (b): Let \(\bar{S}\) be a pre-BDNE for B which has perfect foresight at \(\bar{S}\) and all \(\bar{S}^{i,d}\). We consider \(\widetilde{V}_{i}\) as in Lemma 3.1, Eq. (8). From the definition of pre-BDNE and perfect foresight, \(\bar{S}_{i}(t)\in Argmax _{a\in D_{i}}\varPi _i^{e}\left( t,H^{\bar{S}},\left( \bar{S}^{OL}(t)\right) ^{i,a}\right) \), which is equal to \(Argmax _{a\in D_{i}}\left[ P_{i}(a,u(t),X(t))+\frac{1}{1+r_{i}}\max _{d:\mathbb {T}\rightarrow D_{i}}\left( \sum _{s=t+1}^{T}\frac{P_{i}(d(s),u(s),X(s))}{\left( 1+r_{i}\right) ^{s-(t+1)}}+\frac{G_{i}\left( X(T+1)\right) }{\left( 1+r_{i}\right) ^{T-t}}\right) \right] \). From Eq. (8), this set is equal to \(Argmax _{a\in D_{i}}\left[ P_{i}(a,u(t),X(t))+\left( \frac{1}{1+r_{i}}\right) \cdot \widetilde{V}_{i}(t+1)\right] \), the set in Expression (10).

At this stage, we need to show the sufficiency of the Bellman equation with the appropriate terminal condition. For the finite time horizon case, Eq. (9) and Expression (10) (with d instead of \(\bar{S}_i\)), together with \(\widetilde{V}_{i}(T+1)=G_{i}(X(T+1))\), are sufficient conditions for the function \(\widetilde{V}_{i}\) and strategy d to be the value function and optimal strategy, respectively.

In the infinite horizon case, the standard form of the terminal condition does not work in the case of unbounded payoffs, so we use a weaker version from Wiszniewska-Matyszkiel [18], Theorem 1. The required conditions for our problem are

  1. (i)

    \(\mathrm {limsup}_{t\rightarrow \infty }\widetilde{V}_{i}(t)\cdot \left( \frac{1}{1+r_{i}}\right) ^{t-t_{0}}\le 0\) and

  2. (ii)

    if \(\mathrm {limsup}_{t\rightarrow \infty }\widetilde{V}_{i}(t)\cdot \left( \frac{1}{1+r_{i}}\right) ^{t-t_{0}}< 0\), then for every \(d:\mathbb {T}\rightarrow D_i\), \(\varPi _i(\bar{S}^{i,d})=-\infty \).

Condition (i) follows from the assumption that the \(\varPi _i\) are bounded from above, while (ii) holds, since the existence of a sequence \(t_k\rightarrow \infty \) such that \(\lim _{k\rightarrow \infty }\widetilde{V}_{i}(t_k)\cdot \left( \frac{1}{1+r_{i}}\right) ^{t_k-t_{0}}<0\) when at least one of \(\varPi _i(\bar{S}^{i,d})\) is greater than \(-\infty \) contradicts the convergence of the series in the definition of \(\varPi _i\), see Eq. (4).

Since \(\widetilde{V}_{i}\) fulfills (9), the set \(Argmax _{a\in D_{i}}\) \(\left[ P_{i}(a,u(t),X(t))+\left( \frac{1}{1+r_{i}}\right) \cdot \widetilde{V}_{i}(t+1) \right] \) is the set of optimal actions of player i at time t, given u and X. Since we have this property for a.e. i, \(\bar{S}\) is a Nash equilibrium.

\(\square \)

The next equivalence result holds for repeated games.

Theorem 3.3

Consider a repeated game where players’ belief functions are independent of their strategies, such that for every player i, \(\sup _{d,u} |P_i(d,u,\bar{x})|<+\infty \).

  1. (a)

    If \((\mathbb {I},\mathfrak {I},\lambda )\) is an atomless measure space, then a profile S is a pre-BDNE for B, iff it is a Nash equilibrium, iff it is a sequence of Nash equilibria in static one-stage games.

  2. (b)

    Any profile S where the strategies of a.e. player are independent of the observed history is a pre-BDNE for B, iff it is a Nash equilibrium, iff it is a sequence of Nash equilibria in static one-stage games.

Proof

In repeated games, the only variable influencing future payoffs (via the dependence of the strategies of the remaining players on the history) is the statistic of the profile.

  1. (a)

    In games with an atomless space of players, the decision of a single player does not influence the statistic. Therefore, the optimization problem faced by player i can be decomposed into the optimization of \(P_{i}(a,u(t),\bar{x})\) at each separate moment (the discounted payoffs obtained in the future are finite, since the current payoffs are bounded).

  2. (b)

    If the strategies of the remaining players do not depend on the history of the game, then the current decision of a player does not influence their future payoffs, actual or expected. Therefore, the optimization problem faced by player i can be decomposed into the optimization of \(P_{i}(a,u(t),\bar{x})\) at each separate moment (again, the discounted payoffs obtained in the future are finite, since the current payoffs are bounded).

\(\square \)

3.2 Toward Belief Distorted Nash Equilibrium: Self-Verification

In this subsection, we concentrate on the problem of the consistency of a game’s history with players’ beliefs.

In dynamic games with many stages, especially games with an infinite time horizon, we cannot check whether beliefs are consistent with reality by assuming that the game is repeated many times.

Using the inf-approach, where beliefs are given by the sets of histories regarded as possible, the method of verification is obvious. If a history regarded as impossible happens, it means that beliefs have been falsified. Otherwise, there is no need to update beliefs and we can regard them as being consistent with reality. Without any ranking of trajectories regarded as being possible, this is the only reasonable method of verification. In the case of probabilistic beliefs, the method of verification is not so obvious. It could be adapted from the inf-approach, where the support of a distribution is treated as the set of possible histories, but this leads to a loss of the information introduced by the probability distribution. Therefore, we introduce a function measuring the consistency of beliefs with a game’s history.

First, given a probability distribution \(\beta \) on \(\mathbb {H}\), we introduce a function, called the likelihood function, that measures to what extent the histories corroborate \(\beta \). It assigns to each probability distribution on \(\mathbb {H}\) a function on the set of infinite histories corresponding to the belief.

Definition 3.4

A function \(L:\mathcal {M}_{1}(\mathbb {H})\rightarrow [0,1]^{\mathbb {H}}\) is called a likelihood function iff

  1. (a)

    If \(\bar{H}\) is an atom of \(\beta \), then \(L(\beta )(\bar{H}):=\frac{\beta (\{\bar{H}\})}{\max _{H \in \mathbb {H}}\beta (\{H\})}\).

  2. (b)

    If \(\beta \) is a continuous probability distribution with density \(\mu \), then \(L(\beta )(\bar{H}) :=\frac{\mu (\bar{H})}{\max _{H \in \mathbb {H}}\mu (H)}\) if the maximum is attained.

  3. (c)

    Otherwise, the function L satisfies

    1. (i)

      if \(\beta (\{H_{1}\})>\beta (\{H_{2}\})\), then \(L\left( \beta \right) (H_{1}) > L\left( \beta \right) (H_{2})\);

    2. (ii)

      for each \( \beta \), there exists H with \(L(\beta )(H)=1\) (the “most likely history” is always of likelihood 1).

This definition gives a unique function in the case of discrete distributions. In the case of continuous distributions, we can take any density function, which leads to certain non-uniqueness. In the case of mixed distributions, we can choose any function satisfying (a)–(c), since the relation between atoms and the atomless part is not predefined.
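For a discrete belief with finite support, case (a) of Definition 3.4 amounts to dividing each probability by the largest one. A minimal sketch, in which the dictionary representation of \(\beta \) and the name `likelihood_discrete` are assumptions made for illustration:

```python
def likelihood_discrete(beta):
    """Definition 3.4(a) for a finitely supported belief.
    `beta` maps (hashable) histories to their probabilities."""
    top = max(beta.values())
    return {H: p / top for H, p in beta.items()}


# Hypothetical belief over three labelled histories.
L_example = likelihood_discrete({"H1": 0.5, "H2": 0.25, "H3": 0.25})
```

The most likely history "H1" receives likelihood 1, and the two remaining histories receive likelihood 0.5 each.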

From this moment on, we fix a likelihood function L, which is used in further definitions.

The first thing that we consider is verification of beliefs.

Given a likelihood function, we can define a measure of the consistency of beliefs along a profile \(\bar{S}\) as the minimum likelihood of \(H^{\bar{S}}\), taken over time, according to that belief. However, we have to solve a technical problem resulting from the notational convention of using elements from the set \(\mathbb {H}\) to denote both the observed history \(H|_t\) and predictions of the future for all t. In fact, given the beliefs at time t, we only want to measure the likelihood of the predictions: X(s) and u(s) for \(s>t\). The observed history, \(H|_t\), i.e., X(s) for \(s\le t\) and u(s) for \(s<t\), does not cause any problem, since with probability one, \(H|_t=H^{\bar{S}}|_t\). So, only u(t) may cause problems. Note that u(t) has no effect on \(B_i(t,\bullet ,\bullet )\). Hence, if we replace it by something else, we do not change any of the previously defined concepts. Therefore, to define the method of verifying beliefs, we slightly modify this irrelevant part of the history:

$$\begin{aligned} \bar{B}^t_{i}(t,a,H)(A):=B_{i}(t,a,H)\left( \left\{ (X,u)\in \mathbb {H}:\exists u^{\prime },\ u^{\prime }(s)=u(s)\text { for }s\ne t,\ (X,u^{\prime })\in A\right\} \right) . \end{aligned}$$

Definition 3.5

A function \(l^{\bar{S}}:\mathbb {I}\times \mathcal {M}_{1}(\mathbb {H})\rightarrow \mathbb {R}_{+}\) is called a measure of the ex post consistency of beliefs \(\{B_{i}\}_{i\in \mathbb {I}}\) with reality for profile \(\bar{S}\) iff \(l_{i}^{\bar{S}}(B_{i}):=\inf _{t\in \mathbb {T}}L(\bar{B}^t_{i}(t,\bar{S}_{i}^{OL}(t),H^{\bar{S}}))(H^{\bar{S}})\).

Given \(\varepsilon \ge 0\), we can define the following properties of \(\varepsilon \)-self-verification of beliefs.

Definition 3.6

  1. (a)

    A collection of beliefs \(B=\{B_{i}\}_{i\in \mathbb {I}}\) is perfectly \(\varepsilon \)-self-verifying iff for every pre-BDNE \(\bar{S}\) for B and for a.e. \(i\in \mathbb {I}\), we have \(l_{i}^{\bar{S}}(B_{i})\ge 1-\varepsilon \).

  2. (b)

    A collection of beliefs \(B=\{B_{i}\}_{i\in \mathbb {I}}\) is potentially \(\varepsilon \)-self-verifying iff there exists a pre-BDNE \(\bar{S}\) for B such that for a.e. \(i\in \mathbb {I}\), we have \(l_{i}^{\bar{S}}(B_{i})\ge 1-\varepsilon \).

In order to interpret these concepts, let us assume that players, who respond best to their beliefs, have an incentive to change their beliefs only if the measure of their ex post consistency is less than \(1-\varepsilon \). In this case, perfect \(\varepsilon \)-self-verification of beliefs means that players never have any incentive to change their beliefs, while potential \(\varepsilon \)-self-verification of beliefs means that there is a possibility that they will have no incentive to change their beliefs.
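The threshold rule in this interpretation can be sketched in a few lines. We assume a finite truncation of the time set, so that the infimum in Definition 3.5 becomes a minimum over a list of likelihood values, one for each observed time; the function names and the sample values are hypothetical.

```python
def ex_post_consistency(likelihoods_over_time):
    """Minimum over the observed times of the likelihood of the realised history.

    This is a finite-horizon stand-in for the infimum over t in Definition 3.5.
    """
    return min(likelihoods_over_time)

def keeps_beliefs(likelihoods_over_time, eps):
    """True if the player has no incentive to change beliefs at tolerance eps."""
    return ex_post_consistency(likelihoods_over_time) >= 1 - eps

# Hypothetical likelihoods of the realised history at four consecutive times.
path = [1.0, 0.9, 0.75, 1.0]
```

With these values, the measure of consistency is 0.75, so the beliefs are kept for \(\varepsilon =0.25\) but not for \(\varepsilon =0.1\).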

3.3 Belief Distorted Nash Equilibrium

Definition 3.7

A profile S is an \(\varepsilon \)-belief distorted Nash equilibrium for a collection of beliefs \(B=\{B_{i}\}_{i\in \mathbb {I}}\) (\(\varepsilon \)-BDNE for short) iff it is a pre-BDNE for B and \(l^S(B)(H^S)\ge 1- \varepsilon \).

A 0-BDNE is called a BDNE.

If we assume that players feel an incentive to change their beliefs only if the measure of the beliefs’ ex post consistency is less than \(1-\varepsilon \), then at an \(\varepsilon \)-BDNE, beliefs are never changed.

Proposition 3.1

Theorems 3.2 and 3.3 and Remark 3.1 still hold when pre-BDNE for specific beliefs is replaced by BDNE for those beliefs. \(\square \)

This means that we have equivalence between BDNE and Nash equilibria for those classes of games for which equivalence results hold for pre-BDNE and Nash equilibria: under assumptions of boundedness, when beliefs are independent of a player’s own decision, in games with a continuum of players, one-shot games and repeated games.

3.4 Comparison of BDNE and \(\varepsilon \)-BDNE to Similar Concepts

We can compare the notions of BDNE and \(\varepsilon \)-BDNE introduced in this paper with Nash equilibria, subjective equilibria, as well as BDNE for set-valued beliefs.

First, we compare our concept to Nash equilibria. From Proposition 3.1, Nash equilibria and BDNE coincide, for example, in games with a continuum of players or repeated games with bounded payoffs and when beliefs are independent of a player’s decisions. However, in general, the concept of BDNE is neither equivalent to nor an extension of the concept of Nash equilibrium. In the examples considered in Sect. 4, we compare Nash equilibria to BDNE for specific models.

Next, we compare BDNE to the most related concepts of equilibrium in games with incomplete information. When distorted information is considered, as mentioned before, only two of the concepts of equilibrium without complete information can deal with it.

The subjective equilibria of Kalai and Lehrer [9, 10] are used in the environment of repeated games or games that can be repeated. Decisions are taken at each moment without foreseeing the future, and beliefs—a stochastic environmental response function—are based on history and the decision applied at the present stage of the game. It is assumed that the current decision does not influence future play. Hence, players just optimize given their beliefs at each stage separately. The condition applied in subjective equilibrium theory is that beliefs are not contradicted by observations, i.e., that the frequencies of various results correspond to the assumed probability distributions.

When we compare the concept of BDNE for probabilistic beliefs to subjective equilibria, there is an apparent similarity: Beliefs are probabilistic, players optimize the expected value of their payoff given those beliefs, and the condition that beliefs are not contradicted by observations is added. However, subjective equilibria are adapted to repeated games, and their extension to multistage games is not obvious. Moreover, using the subjective equilibrium approach, a player’s beliefs, based on the history observed, describe the probability distribution of reactions to the decision of a player by the unknown system (which plays the role of a statistic in our formulation) at this stage only. Given such beliefs, players optimize their expected payoffs. No equilibrium condition is added, only the condition that the frequencies of various reactions correspond to the assumed belief.

Belief distorted Nash equilibria (BDNE) for set-valued beliefs (the inf-approach), introduced by the author in [14], like our current concept of BDNE, apply to multistage games. At each stage, players choose decisions maximizing, given their belief correspondences, their guaranteed payoffs (for the realization regarded as being the worst possible) from that moment on. In order for such a profile of decisions to be a pre-BDNE, we add the condition that the value of the statistic of the profile which influences players’ payoffs and the behavior of the state variable is foreseen correctly at each stage, as considered in this paper. Under the assumption that beliefs have perfect foresight, in games with a continuum of players, this notion coincides with the concept of Nash equilibrium (a result analogous to Theorem 3.2 of this paper). Finally, a profile is a BDNE iff it is a pre-BDNE and the actual trajectory of the game is in the belief correspondence. In this paper, beliefs are modelled using a set of probability measures instead of a multivalued correspondence, and the optimal expected payoff replaces the optimal guaranteed payoff. A likelihood function is introduced to verify the consistency of beliefs.

It is worth emphasizing that, if we compare set-valued beliefs and probabilistic beliefs with a uniform distribution on the same set, then the concepts of self-verification in both approaches are exactly the same. However, the BDNE are different, since using the inf-approach, the guaranteed future payoff is considered instead of the expected payoff, which leads to more risk averse behavior.

4 Examples

As the first example of a game with distorted information, we consider a model of a renewable resource which is the common property of all its users. This model, in a slightly different formulation, was first defined in Wiszniewska-Matyszkiel [19] and afterward examined in Wiszniewska-Matyszkiel [14] as an example showing some interesting properties of belief distorted Nash equilibria under the inf-approach. Here, we use it to illustrate the exp-approach.

4.1 A Common Ecosystem

Let us consider two versions of a game of exploiting a common ecosystem: either with n players (\(\{1,\dots ,n\}\) with the normalized counting measure) or using the unit interval [0, 1] with the Lebesgue measure to describe the set of players. The statistic is the aggregate of the profile, i.e., \(g(i,a)=a\). The reaction function is \(\phi (x,u)=x(1-\max (0,u-\zeta ))\), where \(\zeta >0\) is the regeneration rate, and the initial state is \(\bar{x}>0\). The sets of available strategies are given by \(D_{i}=[0,( 1+\zeta )]\). The current payoff functions are \(P_{i}(a,u,x)=\ln (ax)\), where \(\ln 0\) is understood as \(-\infty \). The discount rate for all players is \(r>0\). The time horizon is \(+\infty \). In this example, the so-called tragedy of the commons is present in a very drastic form: in the continuum of players case, the players deplete the resource in a finite time at every Nash equilibrium.
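The state dynamics of this model can be sketched as follows. The function names `next_state` and `trajectory` are hypothetical, and the regeneration rate \(\zeta =0.5\) in the usage note is chosen only for illustration.

```python
def next_state(x, u, zeta):
    """One step of the reaction function phi(x, u) = x * (1 - max(0, u - zeta))."""
    return x * (1 - max(0.0, u - zeta))

def trajectory(x0, u_path, zeta):
    """States X(0), X(1), ... generated by a given path of the aggregate u(t)."""
    states = [x0]
    for u in u_path:
        states.append(next_state(states[-1], u, zeta))
    return states
```

For \(\zeta =0.5\) and \(\bar{x}=1\), aggregate extraction at the maximal level \(1+\zeta \) depletes the resource in one step, `trajectory(1.0, [1.5, 1.5], 0.5)` gives `[1.0, 0.0, 0.0]`, while extraction at the regeneration rate keeps the state constant.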

The fundamental results from Wiszniewska-Matyszkiel [19] regard the Nash equilibria of this game. We need them as the starting point for analysis, since we want to compare pre-BDNE and \(\varepsilon \)-BDNE to Nash equilibria. Rewritten to fit the formulation of this paper, the results regarding Nash equilibria are as follows.

Proposition 4.1

Let \(\mathbb {I}=[0,1]\). No dynamic profile such that any set of players of positive measure get finite payoffs is an equilibrium, and every dynamic profile yielding depletion of the system at any finite time (i.e., \(\exists \bar{t} \text { s.t. } X(\bar{t})=0\)) is a Nash equilibrium. At every Nash equilibrium, for every player, the payoff is \(-\infty \).

Proposition 4.2

Let \(\mathbb {I}=\{1,\dots ,n\}\). \(\bar{S}\equiv \max \left( \frac{nr(1+\zeta )}{1+nr},\zeta \right) \) is a Nash equilibrium, and at every Nash equilibrium, the payoffs are finite.

The proof of Proposition 4.2 uses a standard technique for solving the Bellman equation, while the proof of Proposition 4.1 applies a decomposition method from Wiszniewska-Matyszkiel [20].
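As a numerical sanity check for the n-player case (not a substitute for the Bellman argument), the following sketch verifies that a constant symmetric extraction level is a best response to itself. It rests on an assumption that we state explicitly: with logarithmic payoffs and multiplicative dynamics, the continuation value has the form \(\frac{1+r}{r}\ln x+\text{const}\), so a player's one-stage problem reduces to maximizing \(\ln a+\frac{1}{r}\ln \left( 1-\max (0,u-\zeta )\right) \), where u is the average of all decisions. The candidate level \(\frac{nr(1+\zeta )}{1+nr}\) is the fixed point of the resulting first-order condition; all function names and parameter values are hypothetical.

```python
import math

def symmetric_candidate(n, r, zeta):
    """Fixed point of the first-order condition, bounded below by zeta."""
    return max(n * r * (1 + zeta) / (1 + n * r), zeta)

def best_response(a_others, n, r, zeta, grid=20000):
    """Grid search over (0, 1 + zeta] for the maximiser of ln(a) + ln(growth) / r."""
    best_a, best_val = None, -math.inf
    for k in range(1, grid + 1):
        a = (1 + zeta) * k / grid
        u = (a + (n - 1) * a_others) / n  # aggregate under the normalized counting measure
        growth = 1 - max(0.0, u - zeta)
        if growth <= 0:
            continue  # depletion yields payoff -infinity, never optimal here
        val = math.log(a) + math.log(growth) / r
        if val > best_val:
            best_a, best_val = a, val
    return best_a

# Hypothetical parameters: two players, r = 0.5, zeta = 0.5.
a_star = symmetric_candidate(n=2, r=0.5, zeta=0.5)
reply = best_response(a_star, n=2, r=0.5, zeta=0.5)
```

For these parameters, the candidate level is 0.75 and the grid search returns a best response within grid accuracy of the same value.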

By Theorem 3.2 and Proposition 3.1, in the case of a continuum of players, any Nash equilibrium is a BDNE for perfect foresight.

We are interested in pre-BDNE that are not Nash equilibria. One interesting problem is to find a belief for which the resource is not depleted at any pre-BDNE in the continuum of players case. Moreover, we want to design a belief such that it is enough to “teach” a relatively small set of players, while the others still hold their original beliefs. The belief we are going to consider is of the form—“it is me who can save the system: if I restrict my exploitation to some level, then with probability one the system will not be destroyed within a finite time, while if I exceed this limit, the system will be destroyed in a finite time with positive probability.” Formally, we state the following proposition for a general class of such beliefs.

Proposition 4.3

Let \(\mathbb {I}=[0,1]\). Consider any belief correspondence such that for every \(i\in \mathbb {J}\subset \mathbb {I}\), where \(\mathbb {J}\) is of positive measure, \(t\in \mathbb {T}\), \(H\in \mathbb {H}\), there exist \(\varepsilon _{1},\varepsilon _{2},\varepsilon _{3}>0\) and constants \(\left( 1+\zeta \right)>\varepsilon _{1}>\delta (i,t,H)>0\) such that \(B_{i}(t,a,H)\) assigns a positive measure to the set of histories \((X,u)\) such that for every \(s>t\), \(X(s)=0\) if \(a> \left( 1+\zeta \right) -\delta (i,t,H)\), while if \(a\le (1+\zeta )-\delta (i,t,H)\), then for every \(s>t\), we have \(X(t+1)\ge \varepsilon _{2}\cdot e^{-\varepsilon _{3}t}\) with probability 1. For every profile S which is a pre-BDNE for this belief, for a.e. \(i\in \mathbb {J}\), we have \(S_{i}^{OL}(t)\le (1+\zeta )-\delta (i,t,H)\), and \(X(t)>0\) for every t.

Proof

Obviously, for every player \(i\in \mathbb {J}\), the decision at time t maximizing \({\varPi }_{i}^{e}\) for any strategy profile of the remaining players is not greater than \( (1+\zeta )-\delta (i,t,H)\), the maximal level of extraction such that \(V_{i}(t+1,B_{i}(t,a,H^{S}))\ne -\infty \), since \(\varPi _{i}^{e}(t,H^{S},S^{OL}(t))\ge \sum _{s=t}^{T}\ln \left( \left( (1+\zeta )-\delta (i,t,H)\right) \cdot \varepsilon _{2}\cdot e^{-\varepsilon _{3}s}\right) \cdot (1+r)^{-s}\ge \sum _{s=t}^{T}\left( \ln \left( \left( (1+\zeta )-\varepsilon _{1}\right) \cdot \varepsilon _{2}\right) -\varepsilon _{3}\cdot s\right) \cdot (1+r)^{-s}> -\infty \).

We have \(X(t_{0})>0.\) If \(\nu :=\int _{\mathbb {J}}\delta (i,t,H)d\lambda (i)\), then \(X(t+1)\ge X(t)\cdot \left( 1-\left( \left( (1+\zeta )-\nu \right) -\zeta \right) \right) =X(t)\cdot \nu >0\), so \(X(t)>0\) implies \(X(t+1)>0\). \(\square \)

This result has an obvious interpretation: Ecological education can make people sacrifice their current utility in order to protect the system even if they, in fact, constitute a continuum. It is sufficient that they believe their decisions really have an influence on the system. The opposite situation is also possible: If people believe that they individually have no influence on the system, then they behave like a continuum. Depletion of the resource, which is impossible at a Nash equilibrium from Proposition 4.2, may happen at a pre-BDNE, which we prove as the next result.

Proposition 4.4

Let \(\mathbb {I}=\{1,\dots ,n\}\). Consider a belief correspondence such that there exists t such that for every i and H, \(B_{i}(t,a,H)\) assigns a positive probability to the set of \((X,u)\) for which for some \(s>t\), \(X(s)=0\). Then any dynamic profile, including profiles S such that for some \(\bar{t}\), \(X^{S}(\bar{t})=0\), is a pre-BDNE.

Proof

For every i, t, and a, \(V_{i}(t+1,B_{i}(t,a,H))=-\infty \). Therefore, each choice of the players is in the set of best responses to such a belief. \(\square \)

Now let us consider the problem of \(\varepsilon \)-self-verification of such beliefs and check whether pre-BDNE are \(\varepsilon \)-BDNE.

Proposition 4.5

  1. (a)

    Let \(\mathbb {J}\) be a set of players of positive measure. Assume that the beliefs of the remaining players, \(\mathbb {I}\backslash \mathbb {J}\), are independent of their own decisions and they assign probability 0 to the set of histories for which \(X(t)=0\) for some t. There exists a belief that is perfectly \(\varepsilon \)-self-verifying for some \(\varepsilon <1\) such that for each player from \(\mathbb {J}\), the assumptions of Proposition 4.3 are fulfilled.

  2. (b)

    A profile \(\bar{S}\) for which players from \(\mathbb {J}\) choose \((1+\zeta )-\delta (i,t,H)\), while the remaining players choose \((1+\zeta )\) is an \(\varepsilon \)-BDNE for these beliefs.

  3. (c)

    Consider a belief correspondence such that there exists t such that for a.e. i and every H, \(B_{i}(t,a,H)\) assigns a positive probability to the set of histories \(H^{\prime }\) which are admissible (i.e., there exists a profile S such that \(H^{\prime }=H^{S}\)) and, given this, there exists a time moment \(s_t>t\) for which \(X(s_t)=0\). Any such belief correspondence is potentially \(\varepsilon \)-self-verifying for some \(\varepsilon <1\).

  4. (d)

    Every profile \(\bar{S}\) resulting in depletion of the resource in a finite time is an \(\varepsilon \)-BDNE for some beliefs defined in (c).

  5. (e)

    Points (a)–(d) hold for \(\varepsilon =0\).

Proof

(a) and (b) We construct such a belief. For the players from \(\mathbb {I}\backslash \mathbb {J}\), this belief does not depend on a and is concentrated on the set \(\left\{ (X,u):\forall t\ X(t)\ne 0\right\} \). We specify this belief after some calculations.

Let \(\nu :=\lambda (\mathbb {J})\). Consider a strategy profile \(\bar{S}\) such that the players \(i\in \mathbb {J}\) choose \(\bar{S}_{i}(t,H)=\alpha \) for some \( \alpha \in [\zeta , 1+ \zeta ] \), while for \(i\notin \mathbb {J}\), \(\bar{S}_{i}(t,H)=(1+\zeta )\). Then the statistic for this profile at time t is equal to \(u(t)= (1+\zeta )(1-\nu )+\nu \alpha \), while the trajectory corresponding to it fulfills \(X(t+1)=X(t)\left( 1-\max (0,u(t)-\zeta )\right) =X(t)\cdot \nu \cdot \left( (1+\zeta )-\alpha \right) \).
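The identity \(1-\max (0,u(t)-\zeta )=\nu \cdot \left( (1+\zeta )-\alpha \right) \) used in this step can be checked numerically. The sketch below assumes hypothetical parameter values and function names; it only evaluates both sides of the identity.

```python
import math

def statistic(nu, alpha, zeta):
    """Aggregate when a fraction nu extracts alpha and the rest extract 1 + zeta."""
    return (1 + zeta) * (1 - nu) + nu * alpha

def growth_factor(nu, alpha, zeta):
    """Factor X(t+1) / X(t) implied by the reaction function for this aggregate."""
    return 1 - max(0.0, statistic(nu, alpha, zeta) - zeta)

# Hypothetical values: a quarter of the players restrict extraction to alpha = 1.
nu, alpha, zeta = 0.25, 1.0, 0.5
lhs = growth_factor(nu, alpha, zeta)
rhs = nu * ((1 + zeta) - alpha)
```

For these values, both sides are equal to 0.125, so the state shrinks geometrically but never reaches zero.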

We consider a belief \(B_{i}\) such that for \(s>t\):

  1. (i)

    every history in its support fulfills \(u(s)=(1+\zeta )(1-\nu )+\nu \alpha \),

  2. (ii)

    \(X(s+1)\ge X(s)\cdot \nu \cdot \left( (1+\zeta )-\alpha \right) \) for all \(i\notin \mathbb {J}\) whatever a is and for \(i\in \mathbb {J}\) only for \(a\le \alpha \),

  3. (iii)

    for \(i\in \mathbb {J}\) and \(a>\alpha \), \(B_{i}(t,a,H)\) assigns a positive probability to the set of histories with \(X(s)=0\) for some \(s>t\),

  4. (iv)

    \(L(B_{i}(t,a,H))\ge 1-\varepsilon \) on the set of histories fulfilling (i)–(iii).

For such a belief, the decision maximizing \(\varPi _{i}^{e}\) for every player i from \(\mathbb {J}\) is \(\alpha \), while for the players from \(\mathbb {I}\backslash \mathbb {J}\) the optimal choice is \((1+\zeta )\). Therefore, all the pre-BDNE for this belief fulfill the above assumptions, which implies perfect \(\varepsilon \)-self-verification.

(c) and (d) Since from Proposition 4.4, every profile is a pre-BDNE for such a belief correspondence, \(\bar{S}\) is a pre-BDNE. Hence, if \(L(B_{i}(t,a,H))\ge 1-\varepsilon \) on a set of admissible trajectories such that for some \(s>t\), \(X(s)=0\), and this set contains \(\bar{S}\), we have potential \(\varepsilon \)-self-verification.

(e) First, rewrite the proof of (a) with the additional assumption in the definition of the beliefs that \(H^{\bar{S}}\) for \(\bar{S}\) from (b) is of maximal likelihood for beliefs along \(H^{\bar{S}}\). Analogously, do the same for \(\bar{S}\) from (d) while rewriting (c). \(\square \)

4.2 The El Farol Bar Problem with a Continuum of Players or a Public Good with Congestion

Here we present an extension of the model presented by Brian Arthur [21] as the El Farol bar problem to a large game. There are players who choose at each time whether to stay at home, represented by 0, or to go to the bar, represented by 1. If the bar is overcrowded, then it is better to stay at home. The less it is crowded, the better it is to go.

Consider the space of players represented by the unit interval with the Lebesgue measure. The game is repeated. Hence, the state variable is trivial and is omitted in the notation. The statistic of a static profile is \(U(\delta ):=\int _{\mathbb {I}}\delta (i)d\lambda (i)\).

In our model, the effects of congestion are reflected by the current payoff function, \(P_{i}(d,u):=d\cdot \left( \frac{1}{2}-u\right) \).
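The congestion payoff and the resulting one-stage best responses can be sketched as follows. Since a single player in a continuum has no influence on u, the best response depends only on the value of the statistic; the function names are hypothetical.

```python
def payoff(d, u):
    """Current payoff P_i(d, u) = d * (1/2 - u) for decision d in {0, 1}."""
    return d * (0.5 - u)

def best_responses(u):
    """Set of decisions maximising the current payoff for a given statistic u."""
    if u < 0.5:
        return {1}      # the bar is not crowded: going is strictly better
    if u > 0.5:
        return {0}      # the bar is overcrowded: staying home is strictly better
    return {0, 1}       # at u = 1/2 every player is indifferent
```

Only at \(u=\frac{1}{2}\) are both decisions optimal, which is why a statistic different from \(\frac{1}{2}\) cannot be sustained when beliefs about u are correct.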

First, we state the equivalence between Nash equilibria, pre-BDNE, BDNE, and subjective equilibria for this model.

Proposition 4.6

  1. (a)

    If the \(B_{i}\) are independent of a, then the set of pre-BDNE coincides with the set of Nash equilibria, which is equal to the set of profiles such that for every t, \(u(t)=\frac{1}{2}\).

  2. (b)

    The union of the sets of BDNE over beliefs that are independent of a player’s own actions coincides with the set of Nash equilibria and pure strategy subjective equilibria. For every profile in this set, for every t, \(u(t)=\frac{1}{2}\).

Proof

  1. (a)

    The former equivalence is implied by Theorem 3.2 or 3.3. The latter one is trivial.

  2. (b)

    We know that the set of pre-BDNE for beliefs independent of a player’s own choice coincides with the set of Nash equilibria. So, the set of BDNE is a subset of the set of Nash equilibria. What remains to be proved is the fact that each Nash equilibrium is a BDNE for some beliefs from this class.

To prove this, let us take a profile S whose statistic, u, is equal to \(\frac{1}{2}\) for all t and beliefs B having perfect foresight for S (so, they are concentrated on this u) and all \(S^{i,d}\). This profile is a pre-BDNE and BDNE for B.

Since every Nash equilibrium is a subjective equilibrium, the only fact that remains to be proved is that at every subjective equilibrium, \(u(t)=\frac{1}{2}\). The environmental response function assigns a probability distribution describing a player’s beliefs about u(t). All the players who believe that \(P[u(t)>\frac{1}{2}]\) is greater than \(P[u(t)<\frac{1}{2}]\) choose 0, while those who believe the opposite choose 1; the remaining players may choose either of the two strategies. If the number of players choosing 0 is greater than the number of those choosing 1 with positive probability, then the event \(u(t)<\frac{1}{2}\) happens more frequently than the event \(u(t)>\frac{1}{2}\), which contradicts the beliefs of the players who choose 0. \(\square \)

Next, let us state some self-verification results.

Proposition 4.7

Consider a belief independent of the players’ own decisions.

  1. (a)

    Assume \(B=\{B_{i}\}_{i\in \mathbb {I}}\) is such that for every profile S which is a pre-BDNE for B, for a.e. i, every \(a\in \{0,1\}\), and every time t, \(L\left( B_{i}(t,a,H^{S})\right) (\frac{1}{2},\frac{1}{2},\ldots )\ge 1-\varepsilon \) for some \(\varepsilon <1\). Then B is perfectly \(\varepsilon \)-self-verifying and S is an \(\varepsilon \)-BDNE.

  2. (b)

    Assume \(B=\{B_{i}\}_{i\in \mathbb {I}}\) is such that for every profile S which is a pre-BDNE for B, there exists \(\bar{t}\) such that for a.e. i, every \(a\in \{0,1\}\), \(B_{i}(\bar{t},a,H^{S})(\{u:\exists t>\bar{t}, u(t)\ne \frac{1}{2}\})=1\) with \(L\left( B_{i}(t,a,H^{S})\right) \) equal to 0 outside this set. For every profile \(\bar{S}\) which is a pre-BDNE for B, for a.e. i, we do not have potential \(\varepsilon \)-self-verification for any \(\varepsilon <1\).

\(\square \)

Proposition 4.7 states that every pre-BDNE for beliefs assigning a sufficiently large likelihood to \(u\equiv \frac{1}{2}\) is an \(\varepsilon \)-BDNE and such beliefs are perfectly \(\varepsilon \)-self-verifying, while beliefs which for every pre-BDNE have \(u(t)\ne \frac{1}{2}\) for some t with probability one are not even potentially \(\varepsilon \)-self-verifying.

4.3 Repeated Prisoner’s Dilemma

Although the concepts of BDNE are better adapted to games with many players, we present a simple example of a two-player game—the Prisoner’s Dilemma—repeated infinitely many times.

There are two players who have two available strategies at each stage: cooperate, coded as 1, and defect, coded as 0. The decisions are made simultaneously. Therefore, a player does not know the decision of their opponent. We assume that the statistic is the whole profile. If both players cooperate, then they get a payoff of C. If they both defect, they get a payoff of N. If only one of the players cooperates, then the cooperator gets a payoff of A, while the defector gets R. These payoffs are ranked as follows \(A<N<C<R\).

Using the notation of this paper, the payoff function can be written as

$$\begin{aligned} P_{i}(a,u)= \left\{ \begin{array}{lll} C &{}&{} \text { for } a,u_{-i}=1;\\ A &{}&{} \text { for } a=1, \ u_{-i}=0;\\ R &{}&{} \text { for } a=0, \ u_{-i}=1;\\ N &{}&{} \text { for } a,u_{-i}=0. \end{array} \right. \end{aligned}$$

Obviously, the strictly dominant pair of defecting strategies (0, 0) is the only Nash equilibrium in the one-stage game, while a sequence of such decisions also constitutes a Nash equilibrium in the repeated game, as well as a BDNE, which we can easily prove by considering beliefs that are independent of a player’s current decision. We check whether a pair of cooperative strategies can also constitute a BDNE. In order to do this, let us consider any beliefs \(\bar{B}\) of the form “if I defect now, then the other player will always defect, while if I cooperate now, then the other player will always cooperate” with \(B_i(t,1,H)\) assigning maximal probability to the history \(H'\) with \(H'|_t=H|_t\) and \(H'(s)=(1,1)\) for \(s \ge t\).

The pair of grim trigger \(GT \) strategies “cooperate until the first defection of the other player, then defect” constitutes a Nash equilibrium if the discount rates are small. This does not hold for a pair of “cooperate” \(CE \) strategies.

Proposition 4.8

If both \(r_{i}\) are small enough, then:

  1. (a)

    The pair \((GT ,GT )\) is a BDNE for \(\bar{B}\) and a Nash equilibrium; moreover, the interval of \(r_{i}\) for which \(GT \) is a BDNE is larger than the interval for which it is a Nash equilibrium;

  2. (b)

    All profiles of the same open-loop form as \((GT ,GT )\), including the pairs \((CE ,CE )\) and \((GT ,CE )\), are also BDNE for \(\bar{B}\), while profiles of any other open-loop form are not pre-BDNE for \(\bar{B}\);

  3. (c)

    There exist beliefs \(\bar{B}\) fulfilling (a) which are perfectly self-verifying.

Proof

We consider \(\bar{B}\) such that for every \(H \in \mathbb {H}_{\infty }\), \(\bar{B}_{i}(t,0,H)\) is concentrated on \(\{H' \in \mathbb {H}_{\infty }: \forall s>t \ (H'(s))_{-i}=0\}\) and \(\bar{B}_{i}(t,1,H)\) is concentrated on \(\{H' \in \mathbb {H}_{\infty }: \forall s>t \ (H'(s))_{-i}=1\}\) with \(B_i(t,1,H)( \{ H': H'|_t=H|_t, \ \forall s \ge t \ H'(s)=(1,1) \} )\) being maximal (over the set of all histories \( H'\) with \(H'|_t=H|_t\)).

(a) We start by proving that the profile \((GT ,GT )\) is a Nash equilibrium. To do this, consider a player’s best response to \(GT \) from moment t onwards. Assume that at time t this player chooses to defect. Then their maximal payoff for such a profile from time t on is \(R+\sum _{s=t+1}^{\infty } \frac{N}{(1+r_{i})^{(s-t)}}\), while by playing \(GT \), their payoff is \(C+\sum _{s=t+1}^{\infty } \frac{C}{(1+r_{i})^{(s-t)}}\). The condition for \(GT \) to be optimal is \((R-C)\cdot r_{i}<C-N\), which holds for small \(r_{i}\).

Next, we prove that \(GT \) is also a BDNE for \(\bar{B}\), i.e., that it is a pre-BDNE and that the actual history is of likelihood 1 at every moment t. Consider moment t and history H.

We have \(V_{i}(t,B_{i}(t,0,H))=\sum _{s=t}^{\infty } \frac{N}{(1+r_{i})^{(s-t)}}=\frac{(1+r_{i})\cdot N}{r_{i}}\) and \(V_{i}(t,B_{i}(t,1,H))=\sum _{s=t}^{\infty } \frac{R}{(1+r_{i})^{(s-t)}}=\frac{(1+r_{i})\cdot R}{r_{i}}\).

Therefore, for player i, without loss of generality player 1, \(\varPi _{1}^{e}(t,H,(0,1))=R+\frac{N}{r_{1}}\), while \(\varPi _{1}^{e}(t,H,(1,1))=C+\frac{R}{r_{1}}\).

Hence, cooperation is better than defection when \((R-C)\cdot r_{1}<R-N\). For these values of \(r_{1}\), \(GT \) is a pre-BDNE for \(\bar{B}\) and no profile with a different open-loop form can be a pre-BDNE for \(\bar{B}\). Since the statistic for \((GT ,GT )\) is equal to (1, 1) at every moment, the likelihood of the resulting history is equal to one at every moment t. Therefore, the measure of the consistency of beliefs is one and the profile \((GT ,GT )\) is a BDNE.

(b) This follows from (a) and the fact that both strategies \(GT \) and \(CE \) behave in the same way if the other player does not defect, which leads to the same open-loop form as \((GT ,GT )\).

(c) The perfect self-verification of \(\bar{B}\) is a consequence of this and the fact that at every pre-BDNE for \(\bar{B}\), the history is \(H^{(GT ,GT )}\equiv (1,1)\), which is of maximal probability and therefore, of likelihood 1 at every moment t. \(\square \)
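The two conditions on \(r_{i}\) obtained in the proof of (a) can be compared numerically. The sketch below uses the discounted sums from the proof, namely \(C+\frac{C}{r}\) against \(R+\frac{N}{r}\) for the Nash condition and \(C+\frac{R}{r}\) against \(R+\frac{N}{r}\) for the pre-BDNE condition. The payoff values \(N=1\), \(C=3\), \(R=5\) and the function names are hypothetical.

```python
def discounted_stream(first, later, r):
    """first + sum over k >= 1 of later / (1 + r)**k, which equals first + later / r."""
    return first + later / r

def gt_is_nash(N, C, R, r):
    """Playing GT against GT beats a one-shot defection followed by mutual defection."""
    return discounted_stream(C, C, r) > discounted_stream(R, N, r)

def cooperation_is_belief_optimal(N, C, R, r):
    """Expected payoff of cooperating beats that of defecting under the belief B-bar."""
    return discounted_stream(C, R, r) > discounted_stream(R, N, r)

# Hypothetical payoffs satisfying N < C < R.
N, C, R = 1.0, 3.0, 5.0
nash_bound = (C - N) / (R - C)   # GT is Nash for r below this bound
bdne_bound = (R - N) / (R - C)   # cooperation is belief-optimal for r below this bound
```

For these payoffs, the bounds are 1 and 2, respectively, so for \(r_{i}=1.5\) the pair \((GT ,GT )\) satisfies the pre-BDNE condition although it is no longer a Nash equilibrium, in line with part (a).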

Since this game is repeated, we can compare the concept of BDNE with subjective equilibria. At a subjective equilibrium, players maximize their expected payoff at each stage given their beliefs about the current decision of the opponent. Since defection dominates cooperation, players should defect at each stage, regardless of their beliefs. It should also be noted that under the concept of subjective equilibrium, punishment is impossible.

5 Conclusions

This paper introduces a new notion of equilibrium—Belief Distorted Nash Equilibrium (BDNE) for probabilistic beliefs. The notion of BDNE is especially applicable in dynamic games and repeated games. Existence and equivalence theorems are proved and concepts of self-verification are introduced. These theoretical results are illustrated by examples: extraction of a common renewable resource, a large minority game, and a repeated Prisoner’s Dilemma. The self-verification of various beliefs is analyzed for these examples. In the case of the model of extracting a common resource, the results suggest that appropriate ecological education is of great importance, since, in some cases, it can be the only way to guarantee sustainability. This paper shows that we have to be conscious of the existence of beliefs, which, although often inconsistent with reality, can be regarded as rational if they have the property of self-verification. If we replace the word “beliefs” by “academic models of dynamic decision-making problems of a game-theoretic nature, used by their participants,” then our results indicate the danger that models inconsistent with reality may be regarded as scientifically valid, since they have the property of self-verification: They suggest behavior which results in the confirmation of the theories assumed.