Dynamic Games and Applications, Volume 3, Issue 2, pp 187–206

Markov Stationary Equilibria in Stochastic Supermodular Games with Imperfect Private and Public Information

  • Łukasz Balbus
  • Kevin Reffett
  • Łukasz Woźny
Open Access


We study a class of discounted, infinite horizon stochastic games with public and private signals and strategic complementarities. Using monotone operators defined on the function space of values and strategies (equipped with a product order), we prove existence of a stationary Markov–Nash equilibrium via constructive methods. In addition, we provide monotone comparative statics results for ordered perturbations of our space of games. We present examples from industrial organization literature and discuss possible extensions of our techniques for studying principal-agent models.


Keywords: Stochastic games · Supermodular games · Incomplete information · Short memory (Markov) equilibria · Constructive methods

1 Introduction and Related Literature

Since the class of discounted infinite horizon stochastic games was first introduced by Shapley [50], the question of existence and characterization of equilibrium has been the object of extensive study in game theory.1 In addition, more recently, stochastic games have become a fundamental tool for studying dynamic equilibrium in economic models where there is repeated strategic interaction among agents with limited commitment. In many such economic applications, the stochastic games studied feature limited commitment between agents, as well as both public and private information. When private information is introduced into stochastic games, the structure of equilibrium becomes more difficult to analyze, as one must keep track of how each player’s beliefs over the private histories of all the other players evolve over time. Of course, private information can be introduced into the structure of a game in various forms, including private types and/or private monitoring (see, e.g., [32]). In the former case of private types, progress has been made recently by focusing on public strategies and equilibria (see [24] or applications in [8]). In the latter case of private monitoring, authors have often assumed that private monitoring is almost perfect (see [28]), or that sequential equilibrium strategies are belief-free.2

An additional related issue in dynamic games that has received a great deal of attention concerns the assumption of players’ infinite memory. In recent work, economists have begun to analyze situations where players do not have arbitrarily long memory of their own and/or others’ past moves or states. Given this assumption, the players cannot condition their future actions on arbitrarily long histories.3 Even in this case, the characterization of short-memory or bounded-recall equilibria is somewhat problematic, as the punishment schemes needed to sustain equilibrium are imposed in a somewhat ad hoc manner, and can depend on the particulars of the game at hand. Further, because of structure imposed on the game in the name of analytic tractability, restrictive assumptions are often placed on players’ action spaces, as well as the space of private signals/distributions, not to mention public randomization devices or the necessity to use mixed strategies.

In this paper, we propose a new approach to analyze games with both public and private information (types). Our motivation is to resolve the aforementioned predicaments in the context of an important class of games, namely games of strategic complementarities. We do this by introducing a simple strategy space, as well as imposing rational expectations concerning the opponent’s private information. Importantly, relative to the existing literature, we also allow for uncountable multidimensional state and action spaces, and we assume that players follow Markovian stationary pure strategies. In particular, such Markov stationary Nash equilibria (MSNE, henceforth) imply a few important characteristics: (i) the imposition of sequential rationality, (ii) the use of minimal state spaces, where the introduction of sunspots or public randomization is not necessary for the existence of equilibrium, and (iii) a relatively direct method to compute both equilibrium values and strategies. It bears mentioning that the resulting MSNE remains an equilibrium in any wider class of strategies including stationary Markov ones.

To obtain our results, our work focuses on stochastic games with strategic complementarities (GSC). It is well known that GSC have proven very useful in applications in economics and operations research in a static context,4 but it turns out to be difficult to adapt the existing toolkit to the study of dynamic equilibrium.5 One recent attempt to analyze dynamic supermodular (extensive form) games was undertaken by Balbus et al. [12] in the context of stochastic games with public signals. Here, we focus on stochastic supermodular games with both public and private shocks, and with our new results, we are able to link the literature on dynamic supermodular games with that on Bayesian supermodular games [55, 57].

Our paper contributes also to the literature on existence of equilibrium in stochastic games with uncountable state and action spaces without private types. Recall that Mertens and Parthasarathy [38] and Maitra and Sudderth [37] prove existence of subgame perfect Nash equilibrium in a class of such games. It is worth mentioning, however, that existence of MSNE cannot be proved in a general case even if randomization is applied (for an extensive discussion of this fact; see [35]).6 In the class of correlated strategies involving i.i.d. “public randomization,” MSNE have been shown to exist under different assumptions in various papers including [21, 26, 46] and [30]. Recently, Duggan [22] extended the paper by Nowak and Raghavan [46] by expanding the state space where MSNE exist by appealing to the additional “noisy variables.”

In the literature pertaining to economic applications of dynamic/stochastic games, however, the central concern has not been exclusively on the question of weakening conditions under which the existence of equilibrium can be established or various forms of folk theorems (see, e.g., [25]). Rather, the emphasis has also been on characterizing the properties of MSNE from a computational point of view. This approach arises in, for example, calibration approaches to characterizing MSNE (as in macroeconomics), or estimation/simulation methods (as in industrial organization). For such questions, one needs to unify the theory of existence of equilibrium with a theory of numerical implementation, which requires one to present not only (i) constructive arguments to verify existence, but also (ii) sharp characterizations of the set of equilibria being computed, and (iii) methods of relating error analysis to particular approximation schemes at hand. Our paper proposes such a framework for the class of stochastic games we study.

The rest of the paper is organized as follows. Section 2 defines the game and equilibrium concept. Then, in Sect. 3, we prove our main theorem on MSNE existence and computation. Section 4 presents three examples from industrial organization literature. Appendix states the auxiliary theorem we use in our proofs, while Sect. 5 concludes with a discussion of related methods.

2 The Class of Games

Consider an n-person infinite horizon stochastic game with private and public signals in discrete time. That is, in each period t∈{0,1,2,…}=ℕ, every player i initially observes both the public signal \(z^{t}\) and his own private signal \(\theta_{i}^{t}\). At this stage, players simultaneously undertake actions \(a^{t}=(a_{i}^{t},a_{-i}^{t})\) where \(a_{i}^{t}\) denotes the actions of player i, \(a_{-i}^{t}\) denotes the actions of the remaining players, and \(a^{t}\) both (i) yields to each player a current period payoff, as well as (ii) parameterizes a stochastic transition on states that governs the distribution of public and private signals tomorrow. At the end of each period, all actions are then observed by all players, payoffs are distributed, and the game moves forward to the next period.

Formally, the game is a tuple \(\varGamma =(Z,\varTheta ,A,\tilde{A},\mu ,r,q,Q)\), where the elements of these primitives are described as follows:
  • Z is a public shock space; it is an interval in a vector space containing the zero vector and is endowed with the Borel sigma-field \(\mathcal{Z}\).

  • \(\varTheta =\prod _{i=1}^{n}\varTheta_{i}\), where \(\varTheta_{i}\) is a Polish space of private shocks for player i.

  • \(A=\prod _{i=1}^{n}A_{i}\), where the set \(A_{i}\) is a closed subset of \(\mathbb{R}^{k}\) equipped with its Euclidean topology and componentwise partial order, representing the action space for player i, with the space A given the product order.

  • \(\tilde{A}(z,\theta )=\prod _{i=1}^{n}\tilde{A}_{i}(z,\theta_{i})\), where \(\tilde{A}_{i}\) is a measurable,7 \(A_{i}\)-valued correspondence, and \(\tilde{A}_{i}(z,\theta_{i})\) denotes a nonempty and compact set of actions available to player i when the public shock is z and his private shock is \(\theta_{i}\).

  • \(r_{i}:Z\times \varTheta \times A\rightarrow \mathbb{R}_{+}:=[0,\infty )\) is the reward function for player i, which is assumed to be measurable and uniformly bounded by M<∞.

  • q is a Borel measurable transition probability from Z×Θ×A to Z (i.e., when the public shock is z, the vector of private shocks is \((\theta_{1},\ldots ,\theta_{n})\), and the actions chosen are \((a_{1},\ldots ,a_{n})\), then the distribution of the next public shock in Z is given by q(⋅∣z,θ,a)).

  • Q is a Borel measurable transition probability from Z to Θ (i.e., when the public shock is z, the vector of private shocks is distributed according to Q(⋅∣z)). Further, let \(Q_{-i}(\cdot \mid z,\theta_{i})\) be a regular conditional distribution on the “other players’” private shocks \(\varTheta_{-i}\) (i.e., when the public shock is z, and the private shock of player i is \(\theta_{i}\)).8 In other words, \(Q_{-i}(\cdot \mid z,\theta_{i})\) is player i’s posterior distribution on the other players’ private signals when he observes his own private state and the public state. In a similar way, we let \(Q_{i}(\cdot \mid z,\theta_{-i})\) denote a regular conditional distribution on player i’s “own” private shocks \(\varTheta_{i}\).

The players know the history of public shocks, their own private shocks, and their past actions. Let \(H_{i}^{t}\) denote the set of all possible histories of player i up to period t. An element \(h_{i}^{t}\in H_{i}^{t}\) is of the form \(h_{i}^{t}=(z^{1},\theta_{i}^{1},a^{1},z^{2},\theta_{i}^{2},a^{2},\ldots ,a^{t-1},z^{t},\theta_{i}^{t})\) where \(z^{h}\in Z\), \(\theta_{i}^{h}\in \varTheta_{i}\), \(a^{h}\in \tilde{A}(z^{h},\theta^{h})\), \(1\leq h\leq t\). A strategy for player i is then a sequence \(\sigma_{i}:=(\sigma_{i}^{1},\sigma_{i}^{2},\ldots )\), where for each t, \(\sigma_{i}^{t}:H_{i}^{t}\rightarrow A_{i}\) is a measurable mapping such that \(\sigma_{i}^{t}(h_{i}^{t})\in \tilde{A}_{i}(z^{t},\theta_{i}^{t})\). A strategy \(\sigma_{i}\) is Markov if each \(\sigma_{i}^{t}\) depends on current signals/shocks only (i.e., \(\sigma_{i}^{t}(h_{i}^{t})=\sigma_{i}^{t}(z^{t},\theta_{i}^{t})\)). A Markov strategy is stationary if \(\sigma_{i}^{1}=\sigma_{i}^{2}=\cdots=\sigma_{i}^{0}\) for some measurable mapping \(\sigma_{i}^{0}\). We denote by \(\sigma :=(\sigma_{1},\ldots ,\sigma_{n})\) a profile of Markov stationary strategies.

Suppose player i knows the realization of the public shocks, as well as her private shocks, but does not know the realization of the private shocks of the other players. If the initial public shock is z, and her initial private signal is \(\theta_{i}\), then the player believes the initial distribution on the others’ private shocks is just \(Q_{-i}(\cdot \mid z,\theta_{i})\), and the evolution of the private shocks \(\theta_{-i}^{t}\) is a Markov chain with a distribution at any step t given by \(Q_{-i}(\cdot \mid z^{t},\theta_{i}^{t})\).

Remark 1

By our assumptions, if the current state is (z,θ), Markov stationary strategy profile is σ, then the distribution of the next state (z′,θ′) is given by measure:
$$ \tilde{Q}(Z_{0}\times T\mid z,\theta ):=\int _{Z_{0}}Q \bigl(T\mid z^{\prime }\bigr)q\bigl(dz^{\prime }\mid z,\theta ,\sigma (z,\theta )\bigr), $$
where \(Z_{0}\) is a measurable subset of Z and T is a measurable subset of Θ. Notice that player i does not know the realization of \(\theta_{-i}\), but knows the realization \((z,\theta_{i})\), and believes that the current realization of \(\theta_{-i}\) is distributed according to \(Q_{-i}(\cdot \mid z,\theta_{i})\). Because of this, he believes that the distribution on (z′,θ′) is given by
$$ \tilde{Q}_{i}(Z_{0}\times T\mid z,\theta_{i}):=\int _{\varTheta _{-i}}\tilde{Q}(Z_{0}\times T\mid z,\theta_{i},\theta_{-i})Q_{-i}(d\theta_{-i}\mid z,\theta_{i}). $$
Thus, for an arbitrary Markov stationary strategy profile σ, the evolution of the public and private state \((z^{t},\theta_{i}^{t})\) for agent i is a Markov decision process with transition probability \(\tilde{Q}_{i}\).
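As a purely illustrative aid, the belief construction in Remark 1 can be sketched on a finite model. Everything numeric below (two players, binary shocks and actions, the kernels `Q` and `q`, and the profile `sigma`) is a hypothetical assumption of ours and not part of the paper; the sketch only shows how player i averages the true transition over her posterior on the rival's private shock.

```python
Z = [0, 1]            # public shocks (hypothetical finite stand-in for Z)
THETA = [0, 1]        # private shocks of each of the two players

def Q(z):
    """Joint law of (theta_1, theta_2) given public shock z (made-up numbers)."""
    if z == 0:
        return {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    return {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def Q_minus_i(i, z, theta_i):
    """Posterior on the rival's shock given (z, theta_i), obtained by Bayes' rule."""
    weights = {}
    for theta, pr in Q(z).items():
        if theta[i] == theta_i:
            weights[theta[1 - i]] = pr
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def q(z, theta, a):
    """Law of the next public shock given (z, theta, a) (made-up numbers)."""
    if z == 0:
        return {0: 1.0, 1: 0.0}
    stay = 0.2 + 0.15 * (a[0] + a[1]) + 0.1 * a[0] * a[1]
    return {0: 1.0 - stay, 1: stay}

def sigma(z, theta):
    """A stationary Markov profile: each action depends on (z, own shock) only."""
    return (z * theta[0], z * theta[1])

def belief_transition(i, z, theta_i):
    """Player i's believed law of (z', theta') given that she observes (z, theta_i)."""
    out = {}
    for theta_other, w in Q_minus_i(i, z, theta_i).items():
        theta = (theta_i, theta_other) if i == 0 else (theta_other, theta_i)
        for z_next, pz in q(z, theta, sigma(z, theta)).items():
            for theta_next, pt in Q(z_next).items():
                key = (z_next, theta_next)
                out[key] = out.get(key, 0.0) + w * pz * pt
    return out

belief = belief_transition(0, 1, 1)
prob_z1 = sum(pr for (z_next, _), pr in belief.items() if z_next == 1)
```

In this toy specification the believed transition is again a probability measure, which is what makes the player's problem a Markov decision process on \((z,\theta_{i})\).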

The last remark requires a discussion of the structure of players’ beliefs, as well as the formation of these beliefs in equilibrium. A dynamic game with public states and private (information) types can potentially possess many sequential equilibria (as players can condition their action and beliefs on arbitrary histories). In such a case, the beliefs of any given player relative to the type and/or actions of the other players can matter a great deal in the construction of any such sequential equilibrium. This is true, in particular, for games with no private types (as, for example, analyzed in [2]; APS, henceforth). However, in APS, the authors concentrate on public strategies; therefore, each player’s belief about how his rivals move is irrelevant in their approach.

Similarly, in this paper, we focus on Markov stationary strategies, and assume players use Markovian private beliefs as well (see also [17]). That is, when constructing Markov stationary strategies, the players condition their beliefs on the current state, as well as current private types only. Such a belief structure is rational in our setup (as knowing the current state and own type is sufficient for forecasting the continuation structure of the game assuming other players are using Markovian strategies and Markovian private beliefs). Finally, what guarantees the rationality of such beliefs is our assumption that each period, the distribution on private types depends only on current states.

Let \(H^{t}=\{(z^{1},\theta^{1},a^{1},z^{2},\theta^{2},a^{2},\ldots ,a^{t-1},z^{t},\theta^{t})\}\) be the set of histories of the game up to step t and \(H^{\infty }=\{(z^{1},\theta^{1},a^{1},z^{2},\theta^{2},a^{2},\ldots )\}\), both endowed with the product σ-algebra. For every player, given initial public and private states, the transition among public and private states, the profile of strategies \(\sigma =(\sigma_{1},\ldots ,\sigma_{n})\), and a belief that the others’ private shocks evolve according to \(Q(\cdot \mid z^{t})\), we can generate a sequence of probability measures on histories \(H^{t}\) (t<∞). Then, according to the Ionescu–Tulcea theorem (see [15]), we know there exists a measure, say \(P_{i}^{z,\theta _{i},\sigma }\) on \(H^{\infty }\), and a corresponding expected value operator, say \(\mathbb{E}_{i}^{z,\theta _{i},\sigma }\), such that the objective for player i is to maximize lifetime payoffs given by
$$ \gamma_{i}(\sigma ) (z,\theta_{i})=(1-\beta) \mathbb{E}_{i}^{z,\theta _{i},\sigma } \Biggl( \sum _{t=1}^{\infty }\beta^{t-1}r_{i} \bigl(z^{t},\theta^{t},\sigma_{i}^{t} \bigl(z^{t},\theta_{i}^{t}\bigr),\sigma_{-i}^{t} \bigl(z^{t},\theta_{-i}^{t}\bigr)\bigr) \Biggr), $$
where \(\beta \in (0,1)\) is the discount factor.

Definition 1

A Nash equilibrium in our game is therefore a profile \(\sigma^{\ast }\) from which no unilateral deviation is profitable. That is, \(\sigma^{\ast }\) is a Nash equilibrium if for every player i and every strategy \(\sigma_{i}\) of hers:
$$ \forall (z,\theta_{i})\in Z\times \varTheta_{i}\quad \gamma_{i}\bigl(\sigma^{\ast }\bigr) (z,\theta_i) \geq \gamma_{i}\bigl(\sigma_{i},\sigma_{-i}^{\ast } \bigr) (z,\theta_{i}). $$
Any Nash equilibrium in Markov stationary strategies is then called a MSNE.

3 Main Results

In this section, we build our results on the existence, computation, and equilibrium comparative statics of MSNE in the parameters of the game.

To begin with the existence question, suppose player i knows \((z,\theta_{i})\) in some period, and believes that the distribution of private shocks for the other agents is \(Q_{-i}(\cdot \mid z,\theta_{i})\). If \(\sigma_{-i}\) is a Markov stationary strategy profile for the other players in the game, and her own action is \(a_{i}\), then her current expected reward is given simply by
$$ R_{i}(z,\theta_{i},a_{i},\sigma_{-i}):= \int _{\varTheta _{-i}}r_{i}\bigl(z,\theta_{i}, \theta_{-i},a_{i},\sigma_{-i}(z, \theta_{-i})\bigr)Q_{-i}(d\theta_{-i}\mid z, \theta_{i}). $$
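In line with Remark 1, the expected value from some integrable continuation value \(v_{i}:Z\times \varTheta_{i}\rightarrow \mathbb{R}_{+}\) is computed as
$$ E_{i}(z,\theta_{i},a_{i},\sigma_{-i},v_{i}):=\int _{\varTheta _{-i}}\int _{Z}\int _{\varTheta }v_{i}\bigl(z^{\prime },\theta_{i}^{\prime }\bigr)Q\bigl(d\theta^{\prime }\mid z^{\prime }\bigr)q\bigl(dz^{\prime }\mid z,\theta_{i},\theta_{-i},a_{i},\sigma_{-i}(z,\theta_{-i})\bigr)Q_{-i}(d\theta_{-i}\mid z,\theta_{i}). $$
Define the following function space for candidate equilibrium values: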
$$ \mathcal{V}_{i}:=\bigl\{v_{i}:Z\times \varTheta_{i}\rightarrow [ 0,M]:v_{i}(0, \theta_{i})\equiv 0,v_{i}\mbox{ is Borel measurable}\bigr\}. $$
Also, define a set of Markov stationary strategies to be
$$ \varSigma_{i}:=\bigl\{\sigma_{i}:Z\times \varTheta_{i}\rightarrow A_{i}:\sigma_{i}(z, \theta_{i})\in \tilde{A}_{i}(z,\theta_{i}), \sigma_{i}\mbox{ is Borel measurable}\bigr\}. $$
Observe that \(\varSigma_{i}\) is nonempty by the measurable selection theorem of Kuratowski and Ryll-Nardzewski [33]. Denote by \(\mathcal{V}:=\prod _{i=1}^{n}\mathcal{V}_{i}\) and \(\varSigma :=\prod _{i=1}^{n}\varSigma_{i}\) the product spaces of Markov stationary value functions and strategies, and endow \(\mathcal{V}\times \varSigma \) with its (product) pointwise partial order (i.e., \((v^{1},\sigma^{1})\leq (v^{2},\sigma^{2})\) whenever both \(v_{i}^{1}(z,\theta_{i})\leq v_{i}^{2}(z,\theta_{i})\) and \(\sigma_{i}^{1}(z,\theta_{i})\leq \sigma_{i}^{2}(z,\theta_{i})\) for all i=1,…,n and all \((z,\theta_{i})\in Z\times \varTheta_{i}\)).

We now formulate the assumptions we shall need for our existence theorem.

Assumption 1

Assume that:
  • \(r_{i}(z,\theta ,\cdot )\) is continuous on A for each \((z,\theta )\in Z\times \varTheta \), and \(r_{i}(\cdot ,\cdot ,a)\) is measurable on Z×Θ for each \(a\in A\).

  • \(\tilde{A}_{i}\) is a nonempty, compact, and complete sublattice-valued correspondence.

  • \(r_{i}\) is supermodular9 in \(a_{i}\), has increasing differences10 in \((a_{i},a_{-i})\), is increasing in \(a_{-i}\), and
    $$ r_i(0,\theta_i,\theta_{-i},a_i,a_{-i}) \equiv 0 \quad \forall (\theta,a)\in \varTheta\times A. $$
  • q is of the form
    $$ q(\cdot\mid z,\theta,a)=p(\cdot\mid z,\theta,a)+\bigl(1-p(Z\mid z,\theta,a)\bigr) \delta_Z(\cdot), $$
    where \(\delta_{Z}\) is a Dirac delta on Z concentrated at 0, i.e., \(\delta_{Z}(\{0\})=1\), \(p(\cdot \mid z,\theta ,a)\) is some measure such that \(p(Z\mid z,\theta ,a)<1\), and \(p(Z\mid 0,\theta ,a)\equiv 0\) for all \((z,\theta ,a)\in Z\times \varTheta \times A\).
  • For \(v_{i}\in \mathcal{V}_{i}\) denote \(p(v_{i}\mid z,\theta ,a)=\int_{Z}v_{i}(z^{\prime },\theta_{i}^{\prime })p(dz^{\prime }\mid z,\theta ,a)\), and assume that \(p(v_{i}\mid z,\theta ,a)\) is (a) continuous, supermodular, and increasing in a, and (b) measurable in (z,θ).
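The order-theoretic conditions in Assumption 1 can be checked numerically on a grid for a candidate reward. The sketch below is ours: the helper names, the grid, and the reward `r_i` are hypothetical, and with scalar actions supermodularity in the own action holds trivially, so only increasing differences and monotonicity in the rival's action are tested.

```python
def has_increasing_differences(f, xs, ys, tol=1e-12):
    """Check that f(x2, y) - f(x1, y) is nondecreasing in y for all x1 <= x2 on the grid."""
    xs, ys = sorted(xs), sorted(ys)
    for i, x1 in enumerate(xs):
        for x2 in xs[i + 1:]:
            gains = [f(x2, y) - f(x1, y) for y in ys]
            # comparing neighbours is enough to detect any decrease along the sorted grid
            if any(later < earlier - tol for earlier, later in zip(gains, gains[1:])):
                return False
    return True

def is_increasing_in_second(f, xs, ys, tol=1e-12):
    """Check that f(x, y) is nondecreasing in y for every x on the grid."""
    ys = sorted(ys)
    return all(f(x, y2) >= f(x, y1) - tol for x in xs for y1, y2 in zip(ys, ys[1:]))

GRID = [0.0, 0.25, 0.5, 0.75, 1.0]   # hypothetical action grid for both players

def r_i(a_i, a_j, z=1.0):
    # hypothetical reward: vanishes at z = 0 and is complementary in (a_i, a_j)
    return z * (1.0 + a_i * a_j + a_j - 0.5 * a_i)

def substitutes(a_i, a_j):
    # a counterexample with strategic substitutes, which should fail the check
    return -a_i * a_j

id_ok = has_increasing_differences(r_i, GRID, GRID)
mono_ok = is_increasing_in_second(r_i, GRID, GRID)
id_fails = has_increasing_differences(substitutes, GRID, GRID)
```

A grid check of this kind can only falsify the conditions; it does not prove them on the full action space.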

Given the assumptions on preferences and stochastic transitions q, we can write down an auxiliary game such that for any continuation value \(v\in \mathcal{V}\), the auxiliary game is a game of strategic complementarities with positive externalities. Further, when q has the specific form in our conditions above, we can preserve supermodularity in the game to each player’s value function recursively at each stage of the game. We should mention that although this is a powerful technical assumption, the conditions are satisfied in many applications (e.g., see the discussion in [16] for a particular example of this exact structure). Additionally, as we assume positive returns (i.e., r(0,⋅)≡0), our assumptions above assure that the expected continuation value is supermodular in its arguments (as well as monotone in a). This structure is common in the literature. For example, a stronger version of this assumption was introduced by [4], used in a series of papers by [43, 44, 45], as well as studied extensively in the context of games of strategic complementarities with public information in [12]. We refer the reader to our two related papers (see [12, 13]) for a detailed discussion of the nature of these assumptions.
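A short worked step, derived by us from the definitions above only, may help to see why the specific form of q matters. For \(v_{i}\in \mathcal{V}_{i}\) and a fixed \(\theta_{i}^{\prime }\), the Dirac term contributes nothing because \(v_{i}(0,\cdot )\equiv 0\):
$$ \int _{Z}v_{i}\bigl(z^{\prime },\theta_{i}^{\prime }\bigr)q\bigl(dz^{\prime }\mid z,\theta ,a\bigr)=\int _{Z}v_{i}\bigl(z^{\prime },\theta_{i}^{\prime }\bigr)p\bigl(dz^{\prime }\mid z,\theta ,a\bigr)+\bigl(1-p(Z\mid z,\theta ,a)\bigr)v_{i}\bigl(0,\theta_{i}^{\prime }\bigr)=p(v_{i}\mid z,\theta ,a). $$
Hence the mass placed on the absorbing state drops out of the continuation term, and the continuation value inherits continuity, supermodularity, and monotonicity in a directly from \(p(v_{i}\mid z,\theta ,a)\).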

As the next remark indicates, though, our current form of this assumption is significantly weaker than in the existing literature.

Remark 2

Observe that we do not require that p is a probability measure. A typical example of p is: \(p(\cdot\mid z,\theta,a)=\sum^{J}_{j=1}g_{j}(z,\theta,a)\eta_{j}(\cdot\mid z,\theta)\), where \(\eta_{j}(\cdot \mid z,\theta )\) are measures on \(\mathcal{Z} \) and \(g_{j}:Z\times \varTheta \times A\rightarrow [0,1]\) are functions with \(\sum^{J}_{j=1} g_{j}(\cdot)\leq 1\). However, there are many examples of p that cannot be expressed by a linear combination of stochastic kernels, and still satisfy our assumptions. For example, on Z=A=[0,1], consider p having a density \(\rho_{p}(z^{\prime }\mid z,\theta,a)=\xi(\theta) (\sqrt{z^{\prime }+L(z,a,\theta)}-\sqrt{z^{\prime }} )\) for a sufficiently small function ξ and a function L increasing in a.
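The mixture form in Remark 2 can be illustrated on a three-point stand-in for Z. The weights `g`, the measures `ETA`, and the omission of the private shocks θ are simplifying assumptions of ours; the sketch only shows how a sub-probability p is completed to a transition q by placing the leftover mass on the state 0.

```python
Z = (0.0, 0.5, 1.0)                        # finite stand-in for the interval Z
ETA = ({0.5: 1.0}, {0.5: 0.5, 1.0: 0.5})   # two probability measures on Z (hypothetical)

def g(z, a):
    """Weights g_j(z, a) in [0, 1] with g_1 + g_2 <= 0.9 and g_j(0, a) = 0."""
    a1, a2 = a
    return (0.4 * z * (a1 + a2) / 2.0, 0.5 * z * a1 * a2)

def p(z, a):
    """Sub-probability measure p(. | z, a) = sum_j g_j(z, a) * eta_j(.)."""
    mass = {zp: 0.0 for zp in Z}
    for weight, eta in zip(g(z, a), ETA):
        for zp, pr in eta.items():
            mass[zp] += weight * pr
    return mass

def q(z, a):
    """Full transition: p plus the leftover mass 1 - p(Z | z, a) placed on the state 0."""
    mass = p(z, a)
    leftover = 1.0 - sum(mass.values())
    mass[0.0] += leftover
    return mass

p_high = p(1.0, (1.0, 1.0))
q_high = q(1.0, (1.0, 1.0))
q_zero = q(0.0, (1.0, 1.0))
```

Here `p` has total mass strictly below one, `q` is a probability measure, and from z = 0 the process stays at 0, in line with the conditions listed in Assumption 1.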

Along these lines, we first introduce the following additional notation. We define for each player i:
$$ W_{i}(z,\theta_{i},a_{i},\sigma_{-i},v_{i})=(1- \beta )R_{i}(z,\theta_{i},a_{i}, \sigma_{-i})+\beta E_{i}(z,\theta_{i},a_{i}, \sigma_{-i},v_{i}), $$
which is the expected payoff to any player i who faces continuation value \(v_{i}\), with the others using strategy \(\sigma_{-i}\). Define this player’s best response operator to be:
$$ \mathcal{P}_{i}(v_{i},\sigma_{-i}) (z, \theta_{i})=\mbox{arg}\max _{a_{i}\in \tilde{A}_{i}(z,\theta _{i})}W_{i}(z, \theta_{i},a_{i},\sigma_{-i},v_{i}), $$
as well as her corresponding best response value function to be
$$ \mathcal{T}_{i}(v_{i},\sigma_{-i})(z,\theta_{i})=\max _{a_{i}\in \tilde{A}_{i}(z,\theta _{i})}W_{i}(z,\theta_{i},a_{i}, \sigma_{-i},v_{i}). $$
By \(\mathcal{P}(v,\sigma ):=\prod _{i=1}^{n}\mathcal{P}_{i}(v_{i},\sigma_{-i})\), we denote best responses for all the players. Also, put \(\mathcal{T}(v,\sigma ):=(\mathcal{T}_{1}(v_{1},\sigma_{-1}),\ldots ,\mathcal{T}_{n}(v_{n},\sigma_{-n}))\) to be the vector of value functions for all players induced by these best-reply maps.
To construct MSNE, we define a few new mappings. First, consider the correspondence Φ defined on the product space \(\mathcal{V}\times \varSigma \), and defined to be the mapping
$$ \varPhi (v,\sigma ):=\mathcal{T}(v,\sigma )\times \mathcal{P}(v,\sigma ). $$
Using this correspondence, we can define new mappings using the greatest (resp., least) selection from Φ(v,σ) given by
$$ \overline{\varPhi }(v,\sigma )=\bigl(\mathcal{T}(v,\sigma ),\overline{\mathcal{P}}(v,\sigma )\bigr)\quad\text{resp.,}\ \underline{\varPhi }(v, \sigma )=\bigl(\mathcal{T}(v,\sigma ),\underline{\mathcal{P}}(v,\sigma )\bigr), $$
where we have \(\overline{\mathcal{P}}(v,\sigma ):=\prod _{i=1}^{n}\overline{\mathcal{P}}_{i}(v_{i},\sigma_{-i})\) with \(\overline{\mathcal{P}}_{i}(v_{i},\sigma_{-i})\) the greatest selection from \(\mathcal{P}_{i}(v_{i},\sigma_{-i})\). Similarly, define the least selections \(\underline{\mathcal{P}}_{i}\) and \(\underline{\mathcal{P}}\). Notice, this can be done by Lemmas 3 and 4 stated below.
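To make the extremal selections concrete, here is a minimal sketch on a finite action lattice at one fixed state. The function name, the lattice \(\{0,1\}^{2}\), and the objective `W` are hypothetical choices of ours; the point is only that, for a supermodular objective, the coordinatewise maximum and minimum of the maximizers are themselves maximizers.

```python
from itertools import product

def extremal_maximizers(W, actions, tol=1e-12):
    """Return (least, greatest) elements of argmax W over a finite product lattice."""
    values = {a: W(a) for a in actions}
    best = max(values.values())
    argmax = [a for a, val in values.items() if val >= best - tol]
    k = len(argmax[0])
    greatest = tuple(max(a[j] for a in argmax) for j in range(k))  # coordinatewise max
    least = tuple(min(a[j] for a in argmax) for j in range(k))     # coordinatewise min
    # For a supermodular W on a lattice the argmax set is a sublattice,
    # so both extremal points must belong to it; this would fail otherwise.
    assert greatest in argmax and least in argmax
    return least, greatest

ACTIONS = list(product((0, 1), repeat=2))   # A_i = {0,1}^2 with the componentwise order

def W(a):
    # hypothetical supermodular objective with two maximizers, (0, 0) and (1, 1)
    return a[0] * a[1] - 0.5 * (a[0] + a[1])

least, greatest = extremal_maximizers(W, ACTIONS)
```

Taking the maximum coordinate by coordinate mirrors the collection of problems \(O_{ij}\) used in the proof of Lemma 3 below.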

We now make a number of important observations.

Lemma 1

For each \(\sigma_{-i}\in \varSigma_{-i}\) and \(v_{i}\in \mathcal{V}_{i}\), the function \(W_{i}(z,\theta_{i},a_{i},\sigma_{-i},v_{i})\) is a Carathéodory function in \((z,\theta_{i})\) and \(a_{i}\); that is, \(W_{i}\) is measurable in \((z,\theta_{i})\) and continuous in \(a_{i}\).


See Chap. 7 in [15]. □

Lemma 2

\(E_{i}(z,\theta_{i},a_{i},\sigma_{-i},v_{i})\) is supermodular in \(a_{i}\), increasing in \(\sigma_{-i}\), and has increasing differences in \((a_{i},\sigma_{-i})\).


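Observe that, since \(v_{i}(0,\cdot )\equiv 0\) and q has the form given in Assumption 1,
$$ E_{i}(z,\theta_{i},a_{i},\sigma_{-i},v_{i})=\int _{\varTheta _{-i}}\int _{Z}\int _{\varTheta }v_{i}\bigl(z^{\prime },\theta_{i}^{\prime }\bigr)Q\bigl(d\theta^{\prime }\mid z^{\prime }\bigr)p\bigl(dz^{\prime }\mid z,\theta_{i},\theta_{-i},a_{i},\sigma_{-i}(z,\theta_{-i})\bigr)Q_{-i}(d\theta_{-i}\mid z,\theta_{i}). $$
Hence, the assertion follows from Assumption 1, as supermodularity, increasing differences, and monotonicity are preserved by summation. □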

Lemma 3

For each i, \(\underline{\mathcal{P}}_{i}\) and \(\overline{\mathcal{P}}_{i}\) are increasing in \((v_{i},\sigma_{-i})\), and both functions \((z,\theta_{i})\to \underline{\mathcal{P}}_{i}(v_{i},\sigma_{-i})(z,\theta_{i})\) and \((z,\theta_{i})\to \overline{\mathcal{P}}_{i}(v_{i},\sigma_{-i})(z,\theta_{i})\) are measurable.


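First observe that by Lemma 2 \(W_{i}\) is supermodular in \(a_{i}\). We need to show it has increasing differences in \((a_{i};\sigma_{-i},v_{i})\). As increasing differences are preserved by summation, we just need to show that \(R_{i}\) and \(E_{i}\) have increasing differences separately. Observe that, for \(a_{i}^{2}\geq a_{i}^{1}\), the difference
$$ R_{i}\bigl(z,\theta_{i},a_{i}^{2},\sigma_{-i}\bigr)-R_{i}\bigl(z,\theta_{i},a_{i}^{1},\sigma_{-i}\bigr)=\int _{\varTheta _{-i}} \bigl[r_{i}\bigl(z,\theta_{i},\theta_{-i},a_{i}^{2},\sigma_{-i}(z,\theta_{-i})\bigr)-r_{i}\bigl(z,\theta_{i},\theta_{-i},a_{i}^{1},\sigma_{-i}(z,\theta_{-i})\bigr) \bigr]Q_{-i}(d\theta_{-i}\mid z,\theta_{i}) $$
is increasing in \(\sigma_{-i}\), as \(r_{i}\) has increasing differences in \((a_{i},a_{-i})\). Similarly, \(E_{i}\) has increasing differences. To see this, observe that by our assumption the difference \(E_{i}(z,\theta_{i},a_{i}^{2},\sigma_{-i},v_{i})-E_{i}(z,\theta_{i},a_{i}^{1},\sigma_{-i},v_{i})\) is increasing in \((\sigma_{-i},v_{i})\) by monotonicity of p. Thus, \(W_{i}\) has increasing differences in \((a_{i};\sigma_{-i},v_{i})\). Hence, by Theorem 6.2 in [54], \(\mathcal{P}_{i}\) is an ascending, compact-valued, and sublattice-valued correspondence from \(\mathcal{V}\times \varSigma\) into itself. As a result, \(\underline{\mathcal{P}}_{i}\) and \(\overline{\mathcal{P}}_{i}\) are increasing.
Now the aim is to show that both \(\underline{\mathcal{P}}_{i}(v_{i},\sigma_{-i})\) and \(\overline{\mathcal{P}}_{i}(v_{i},\sigma_{-i})\) are measurable. Measurability of \(\mathcal{P}_{i}(v_{i},\sigma_{-i})\) is a well-known fact. To see this, observe that
$$ \mathcal{P}_{i}(v_{i},\sigma_{-i}) (z,\theta_{i})=\mbox{arg}\max _{a_{i}\in \tilde{A}_{i}(z,\theta _{i})}W_{i}(z,\theta_{i},a_{i},\sigma_{-i},v_{i}), $$
which by Lemma 1.10 of [42] or Theorem 18.19 in [3] is measurable. We now show that extremal selections are measurable. Consider a collection of maximization problems \(O_{ij}\): \(\max y_{j}\) such that \(y\in \mathcal{P}_{i}(v_{i},\sigma_{-i})(z,\theta_{i})\), \(y=(y_{1},\ldots ,y_{k})\in \mathbb{R}^{k}\). Define \(\mathcal{P}_{ij}^{0}(v_{i},\sigma_{-i})(z,\theta_{i})\) as the set of all maxima in the problem \(O_{ij}\). Then the correspondence \((z,\theta_{i}) \to \mathcal{P}_{ij}^{0}(v_{i},\sigma_{-i})(z,\theta_{i})\) is measurable and compact-valued. By Theorem 4.1 in [27],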
$$(z,\theta_i) \to Y^0_i(z,\theta_i):= \bigcap_{j=1}^{k} \mathcal{P}_{ij}^0(v_i,\sigma_{-i})(z,\theta_i) $$
is measurable as well. Observe that \(Y^{0}_{i}(z,\theta_{i})\) is single-valued and its element, say \(y^{0}_{i}(z,\theta_{i})\), is a measurable function. Trivially, \(y^{0}_{i}(z,\theta_{i})=\overline{\mathcal{P}}_{i}(v_{i},\sigma_{-i})(z,\theta_{i})\). Similarly, we prove measurability of \(\underline{\mathcal{P}}_{i}(v_{i},\sigma_{-i})\). □

Lemma 4

\(\mathcal{T}\) is isotone on \(\mathcal{V}\times \varSigma \), and the function \(\mathcal{T}_{i}(v_{i},\sigma_{-i})\) is measurable on \(Z\times \varTheta_{i}\) for each \((v_{i},\sigma_{-i})\), i=1,…,n.


Recall that \(\mathcal{T}_{i}(v_{i},\sigma_{-i})(z,\theta_{i})=\max _{a_{i}\in \tilde{A}_{i}(z,\theta_{i})}W_{i}(z,\theta_{i},a_{i},\sigma_{-i},v_{i})\). Hence, the monotonicity follows directly from Assumption 1. To show that \(\mathcal{T}_{i}(v_{i},\sigma_{-i})\) is measurable, we just apply Lemma 1 and Theorem 18.19 in [3]. □

Lemma 5

\(\overline{\varPhi}\) and \(\underline{\varPhi}\) are isotone and map \(\mathcal{V}\times \varSigma \) into itself.


It follows directly from Lemmas 3 and 4. □

Lemma 6

\(\sigma^{*}\) is a MSNE with \(v^{*}\) as a corresponding payoff iff \((v^{*},\sigma^{*})\in \varPhi (v^{*},\sigma^{*})\).


It follows directly from the principle of optimality and standard dynamic programming arguments (see [15]). Also observe that \(\sigma^{*}\) remains a MSNE if players are allowed to use more general strategies (assuming beliefs are Markov). □

Lemma 7

\(\overline{\varPhi}\) is monotonically inf-preserving and \(\underline{\varPhi}\) is monotonically sup-preserving.11


By Lemmas 3 and 4, we immediately conclude that \(\overline{\varPhi}(v,\sigma)\) and \(\underline{\varPhi}(v,\sigma)\) are well defined. We show that \(\overline{\varPhi}\) is monotonically inf-preserving. Let \((v^{n},\sigma^{n})\) be a decreasing sequence and \((v,\sigma)=\bigwedge \{ (v^{n},\sigma^{n})\in (\mathcal{V},\varSigma)\}\). By Lemma 5, \(\overline{\varPhi}(v^{n},\sigma^{n})\) is a decreasing sequence, hence pointwise convergent to some \(\phi^{0}\). We need to show that \(\phi^{0}\in \varPhi (v,\sigma )\). Clearly, by Assumption 1, the function \(a_{i}\mapsto E_{i}(z,\theta_{i},a_{i},\sigma_{-i},v_{i})\) is continuous on \(A_{i}\) for any \(v_{i}\). Applying Fatou’s lemma for varying measures (see [49], p. 231), we obtain \(E_{i}(z,\theta_{i},a_{i}^{n},\sigma_{-i}^{n},v_{i}^{n})\to E_{i}(z,\theta_{i},a_{i},\sigma_{-i},v_{i})\) whenever \(a_{i}^{n}\to a_{i}\), and \((\sigma_{-i}^{n},v_{i}^{n})\to (\sigma_{-i},v_{i})\) pointwise in \((z,\theta_{i})\) (as n→∞). Hence, \(\phi^{0}\in \varPhi (v,\sigma )\), and consequently \(\phi^{0}\le \overline{\varPhi}(v,\sigma)\). On the other hand, observe that \(\overline{\varPhi}(v,\sigma)\le \overline{\varPhi}(v^{n},\sigma^{n})\). Taking a limit, we obtain \(\overline{\varPhi}(v,\sigma)\le \phi^{0}\). Hence, \(\phi^{0}=\overline{\varPhi}(v,\sigma)\) and \(\overline{\varPhi}\) is monotonically inf-preserving. Similarly, we show that \(\underline{\varPhi}\) is monotonically sup-preserving. □

We now define two important sequences.

Definition 2

Let \(\phi^{1}(z,\theta)\equiv (\bigvee\mathcal{V},\bigvee \varSigma)\) and for t≥1:
$$ \phi^{t+1}=\overline{\varPhi}\bigl(\phi^t\bigr). $$
Similarly, let \(\psi^{1}(z,\theta)\equiv (\bigwedge\mathcal{V},\bigwedge \varSigma)\) and for t≥1:
$$ \psi^{t+1}=\underline{\varPhi}\bigl(\psi^t\bigr). $$

Observe that \(\bigvee \mathcal{V}\) is the vector of constant functions equal to M, while \(\bigwedge \mathcal{V}\) is the vector of constant functions equal to zero.
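The two sequences of Definition 2 can be sketched on a small finite game. The example below is entirely hypothetical and ours: two symmetric players, public states {0, 1} with 0 absorbing, i.i.d. uniform private types, binary actions, and made-up reward and transition numbers chosen to satisfy the sign and complementarity conditions of Assumption 1. It iterates the greatest selection downward from the top element and the least selection upward from the bottom element.

```python
BETA = 0.9
Z = (0, 1)                  # public states; 0 is absorbing
THETA = (0, 1)              # private types, i.i.d. uniform (hypothetical)
ACTIONS = (0, 1)
COST = {0: 0.9, 1: 0.5}     # type-dependent cost of the high action (hypothetical)
M = 3.0                     # upper bound on rewards

def reward(z, theta_i, a_i, a_j):
    # zero at z = 0, increasing in a_j, increasing differences in (a_i, a_j)
    return z * (1.0 + a_i * a_j + a_j - COST[theta_i] * a_i)

def p_stay(z, a_i, a_j):
    # mass p({1} | z, a); the remaining mass of q sits on the absorbing state 0
    return z * (0.2 + 0.15 * (a_i + a_j) + 0.1 * a_i * a_j)

def W(z, theta_i, a_i, sigma_j, v_i):
    """(1 - beta) * R_i + beta * E_i under a uniform belief on the rival's type."""
    cont = sum(v_i[(1, t)] for t in THETA) / len(THETA)   # E[v_i(1, theta_i')]
    total = 0.0
    for theta_j in THETA:
        a_j = sigma_j[(z, theta_j)]
        total += (1 - BETA) * reward(z, theta_i, a_i, a_j)
        total += BETA * p_stay(z, a_i, a_j) * cont
    return total / len(THETA)

def step(v, sigma, pick):
    """Apply the greatest (pick=max) or least (pick=min) selection of Phi once."""
    new_v, new_sigma = [{}, {}], [{}, {}]
    for i in (0, 1):
        for z in Z:
            for th in THETA:
                vals = {a: W(z, th, a, sigma[1 - i], v[i]) for a in ACTIONS}
                best = max(vals.values())
                argmax = [a for a in ACTIONS if vals[a] >= best - 1e-12]
                new_v[i][(z, th)] = best
                new_sigma[i][(z, th)] = pick(argmax)
    return new_v, new_sigma

def iterate(v0, a0, pick, n_iter=200):
    """Start from constant value v0 and constant action a0, then iterate step."""
    states = [(z, th) for z in Z for th in THETA]
    v = [{s: v0 for s in states} for _ in range(2)]
    sigma = [{s: a0 for s in states} for _ in range(2)]
    for _ in range(n_iter):
        v, sigma = step(v, sigma, pick)
    return v, sigma

v_top, s_top = iterate(M, max(ACTIONS), max)      # descending sequence phi^t
v_bot, s_bot = iterate(0.0, min(ACTIONS), min)    # ascending sequence psi^t
```

In this toy specification the two limits differ: from above both types keep the high action in state 1, while from below both types keep the low action, so the sketch produces two ordered stationary profiles. The fixed number of iterations is a convenience of the sketch, not a stopping rule suggested by the paper.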

Having these observations and definitions in hand, we are ready to prove the main results of the paper. We first prove a result on existence and computation of equilibrium values and MSNE:

Theorem 1

Let Assumption 1 be satisfied. Then:
  (i) There exist pointwise limits \(\phi^{*}=\lim_{t\to \infty }\phi^{t}\) and \(\psi^{*}=\lim_{t\to \infty }\psi^{t}\).

  (ii) \(\phi^{*}_{2}\) is a MSNE with \(\phi^{*}_{1}\) as a corresponding payoff vector. Similarly, \(\psi^{*}_{2}\) is a MSNE with \(\psi^{*}_{1}\) as a corresponding payoff vector.

  (iii) Let \(f^{*}\) be a MSNE and \(v^{*}\) its corresponding payoff vector. Then \(\phi^{*}_{2}\ge f^{*}\ge\psi^{*}_{2}\) and \(\phi^{*}_{1}\ge v^{*}\ge\psi^{*}_{1}\).



  (i) We show that \(\phi^{t}\) is a monotone sequence. Clearly, \(\phi^{2}\le \phi^{1}\). Assume that \(\phi^{t}\le \phi^{t-1}\) for some t>1. By Lemma 5, we then have \(\phi^{t+1}=\overline{\varPhi}(\phi^{t})\le \overline{\varPhi}(\phi^{t-1})=\phi^{t}\). Hence, \(\phi^{t}\) is antitone. Similarly, we show \(\psi^{t}\) is isotone. As a result, both of these sequences have a pointwise limit.

  (ii) As \(\phi^{t+1}\in \varPhi (\phi^{t})\) and \(\phi^{t}\to \phi^{*}\) by the previous step, Lemma 7 yields \(\phi^{*}\in \varPhi (\phi^{*})\). Similarly, \(\psi^{*}\in \varPhi (\psi^{*})\). By Lemma 6, \(\phi_{2}^{*}\) and \(\psi^{*}_{2}\) are Nash equilibria with corresponding payoff vectors \(\phi^{*}_{1}\) and \(\psi^{*}_{1}\).

  (iii) Let \(f^{*}\) be an arbitrary stationary Nash equilibrium with a corresponding payoff \(v^{*}\). Then by Lemma 6 \((v^{*},f^{*})\in \varPhi(v^{*},f^{*})\subset[\underline{\varPhi}(v^{*},f^{*}),\overline{\varPhi}(v^{*},f^{*})]\). The last inclusion follows from Lemma 7. Clearly, \(\psi^{1}\le (v^{*},f^{*})\le \phi^{1}\). Assume that for some t∈ℕ:
    $$ \psi^{t}\le \bigl(v^*,f^*\bigr)\le \phi^{t}. \tag{1} $$
    By definition of \(\underline{\varPhi}\) and \(\overline{\varPhi}\) and Lemma 5, we have
    $$ \psi^{t+1}=\underline{\varPhi}\bigl(\psi^t\bigr)\le\underline{\varPhi}\bigl(v^*,f^*\bigr)\le \bigl(v^*,f^*\bigr)\le\overline{\varPhi}\bigl(v^*,f^*\bigr)\le \overline{\varPhi}\bigl(\phi^t\bigr)=\phi^{t+1}. $$
    Hence, the inequality in (1) follows for all t. Taking a limit in (1) by step (ii), we obtain \(\psi_{2}^{*}\le f^{*}\le\phi^{*}_{2}\) and \(\psi_{1}^{*}\le v^{*}\le\phi^{*}_{1}\).  □

Theorem 1 states a number of things. We start this discussion from our existence result (ii)–(iii), and then move to comment on our approximation result in (i).

First, the theorem establishes existence of MSNE for our infinite horizon game with both public and private shocks; but it does more. It also provides bounds for constructing every MSNE. Moreover, both of these extremal fixed points are actual MSNE, and therefore provide equilibrium bounds for all MSNE. We can also obtain corresponding equilibrium bounds for all MSNE equilibrium values.

Second, as is typical in the literature, to prove the existence of equilibrium, we construct an auxiliary one-shot game parameterized by continuation values. What is important, though, in our method is that instead of finding the set of Nash equilibria at every period, we parameterize the payoff function of every player by both the continuation value function and the strategy profile for the actions of the other players. Using this added structure, we are then able to evaluate the best response of the player depending on the strategy of the other players, as well as his continuation value. The advantage of this method is the simplicity of the resulting computations as compared with the computations involved in the APS type methods of Cole and Kocherlakota [17], for example. We comment more on the importance of this simplification in a moment.

Third, our method uses recent results on Bayesian supermodular games in its construction. That is, similar to the papers of Vives [57], or Van Zandt [55], MSNE are not necessarily monotone as functions of states (private or public); rather, we just impose enough structure on the game to construct operators for value/strategy pairs that are monotone in continuation values and other players' strategies. In doing this, we are then able to obtain precisely a dynamic supermodular game. Of course, we can also seek conditions sufficient to prove the existence of monotone Markovian equilibrium (in states). For this case, we simply impose stronger complementarity assumptions in the primitives of the game between actions and states.12

Fourth, the theorem provides a simple iterative algorithm to construct the greatest and least equilibria in our infinite horizon game. More specifically, as compared with other methods (e.g., APS methods), we simultaneously iterate on operators defined in terms of both player values and Markovian strategies. We are able to show in the theorem that our iterations converge in order to Markov equilibrium strategies (as well as their associated equilibrium values). One characterization that is missing here, though, is an estimate of the rate of convergence, as well as of the accuracy of our approximations to the least and greatest MSNE. To address these latter issues, we can introduce additional metric structure, study the metric convergence question, and perhaps look for stronger properties of MSNE (e.g., local Lipschitz structure).
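The joint iteration on values and strategies can be illustrated with a minimal sketch. It is a toy under strong assumptions, not the construction of the paper: two players, one non-absorbing public state, no private signals, a finite action grid, and hypothetical functional forms `reward` and `stay_prob` chosen so that the stage payoff and the probability of remaining in the non-absorbing state are supermodular and increasing in the rival's action. The lower iteration starts from the least value/strategy pair and selects least best replies; the upper iteration starts from the greatest pair and selects greatest best replies.

```python
# Toy illustration only: one non-absorbing public state, no private signals.
# All functional forms and constants below are hypothetical.
BETA = 0.9                                # discount factor
GRID = [0.0, 0.25, 0.5, 0.75, 1.0]        # finite action grid for both players


def reward(a_own, a_other):
    # Increasing in a_other and with increasing differences in (a_own, a_other).
    return a_own * (2.0 * a_other - 0.5) - 0.5 * a_own ** 2


def stay_prob(a_own, a_other):
    # Probability of staying out of the absorbing zero-payoff state.
    return 0.2 + 0.6 * a_own * a_other


def objective(a_own, a_other, v_own):
    # One-shot payoff parameterized by the rival's action and own continuation value.
    return (1 - BETA) * reward(a_own, a_other) + BETA * stay_prob(a_own, a_other) * v_own


def step(values, strategies, greatest):
    """Apply the upper (greatest=True) or lower (greatest=False) operator once."""
    new_values, new_strategies = [], []
    for i in (0, 1):
        a_other = strategies[1 - i]
        payoffs = [objective(a, a_other, values[i]) for a in GRID]
        best = max(payoffs)
        maximizers = [a for a, w in zip(GRID, payoffs) if w >= best - 1e-12]
        new_values.append(best)
        new_strategies.append(max(maximizers) if greatest else min(maximizers))
    return new_values, new_strategies


def iterate(values, strategies, greatest, tol=1e-12, max_iter=10_000):
    """Iterate the operator until strategies repeat and values move less than tol."""
    for _ in range(max_iter):
        nv, ns = step(values, strategies, greatest)
        done = ns == strategies and max(abs(x - y) for x, y in zip(nv, values)) < tol
        values, strategies = nv, ns
        if done:
            break
    return values, strategies


# Start from the least and the greatest elements of the (value, strategy) space.
lower_values, lower_strategies = iterate([0.0, 0.0], [0.0, 0.0], greatest=False)
upper_values, upper_strategies = iterate([1.0, 1.0], [1.0, 1.0], greatest=True)
```

In this toy the two limits differ, so the lower and upper iterations bracket the set of stationary equilibria of the toy game; no equilibrium of an auxiliary game has to be computed at any step.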

Fifth, our algorithm is simpler than that proposed in [12] for the case of public information, as we do not need to compute equilibria of the auxiliary game at every value function iteration. However, this simplification comes at a cost, as our iterations are not equilibria in truncated finite horizon games. In this sense, our method is similar to that discussed in [52], but very different from the one developed in Balbus and Nowak [10, 11], or Balbus et al. [12] for games with public information.

Sixth, the approach used in the proof of Theorem 1 is reminiscent of the iterated elimination of dominated strategies (as discussed, for example, in [57]), but extended to dynamic games. Indeed, as observed by Chassang [16], the simultaneous iterated elimination of dominated Markovian strategies (and corresponding values) leads to convergence in order to extremal MSNE. Recall that this procedure heavily depends on the (Markovian) equilibrium and (Markov-private) beliefs concepts applied.

Finally, we make two more specific remarks on how the results can be strengthened by strengthening conditions on the primitives, or changed by altering order structure on the space of functions where we study the equilibrium existence problem.

Remark 3

If the best replies are functions, then we can strengthen our results by saying that the MSNE set is countably chain complete. That is, the MSNE set is closed under countable sups/infs of chains. This follows from our generalization of the Tarski–Kantorovitch fixed-point theorem (see Proposition 1 in the Appendix).

Remark 4

If the order on each \(\mathcal{V}_{i}\) and \(\varSigma_{i}\) is changed to a.e. (where a.e. refers to private and public signals), then we can conclude, using the Veinott [56]/Zhou [60] generalization of the Tarski [53] fixed-point theorem, that the MSNE set not only has the greatest and least elements, but is also a complete lattice. This follows from the fact that the set of bounded, Borel-measurable functions is a sigma-complete lattice when endowed with the pointwise (everywhere) order, but is a complete lattice when endowed with the a.e. order (see [57]). In this paper, we prefer to use the pointwise (everywhere) order mainly for the comparative statics results presented in Theorem 2 below.

We complete this section on the existence and characterization of MSNE with an important corollary of the main theorem.

Corollary 1

MSNE exists in a class of stochastic games satisfying Assumption 1 with perfect monitoring and no private information, i.e., where with probability one θ i =θ j for all players.

Observe that, by this corollary, we prove the existence of MSNE in a class of games similar to Curtat [19] or Amir [5]. Similar to their work, we let the within-period game exhibit strategic complementarities, but there are also a few specific differences that are worth mentioning. First, we do not require payoffs or transition probabilities to be Lipschitz continuous, an assumption which appears to be very strong relative to many economic applications. Second, we also do not impose any conditions on payoffs and stochastic transitions that imply (i) “double increasing differences” in players' payoff structure, or (ii) strong concavity conditions such as strict diagonal dominance that are needed to obtain their existence result. Third, and equally important, we do not assume any increasing differences between actions and states (hence, our equilibrium strategies are not necessarily increasing on Z). These new results do come at the expense of requiring our assumption on the transition Q, which is more specific than required by either Amir or Curtat.

We now present our results on monotone equilibrium comparative statics, and show that the set of MSNE is ordered relative to order perturbations of the deep parameters of the game. To do this, consider a parameterized version of our game Γ(ω), where ω∈Ω is a parameter of the game and Ω is a poset. More specifically, denote by \(\tilde{A}(z,\theta ;\omega )\), \(r_{i}(z,\theta,a;\omega)\) and \(p(\cdot\mid z,\theta,a;\omega)\) the parameterized versions of the primitive data of the original game, and by \(\phi^{*}_{2,\omega }\) and \(\psi^{*}_{2,\omega }\) the two extremal MSNE computed in Theorem 1 at parameter ω. If \(\phi^{*}_{1,\omega }\) and \(\psi^{*}_{1,\omega }\) denote the corresponding equilibrium payoffs, then \((\phi^{*}_{1,\omega},\phi^{*}_{2,\omega}) = \overline{\varPhi}(\phi^{*}_{1,\omega},\phi^{*}_{2,\omega})\) and \((\psi^{*}_{1,\omega},\psi^{*}_{2,\omega})=\underline{\varPhi}(\psi^{*}_{1,\omega},\psi^{*}_{2,\omega})\).

We make the following assumptions on the parameterized class of games.

Assumption 2

Assume that:
  • For each ω∈Ω, Assumption 1 holds.

  • Each \(r_{i}\) has increasing differences in \((a_{i},\omega)\), and is increasing in ω.

  • \(p(v_{i}\mid z,\theta,a,\omega)\) has increasing differences in \((a_{i},\omega)\), and is increasing in ω for each \(v_{i}\in \mathcal{V}_{i}\).

  • \(\tilde{A}_{i}(z,\theta )\) does not depend on ω.

With this parameterization complete, we can now state our central equilibrium monotone comparative statics theorem for our class of parameterized games:

Theorem 2

Let Assumption 2 be satisfied. Then extremal MSNE \(\phi^{*}_{2,\omega},\psi^{*}_{2,\omega}\) are monotone with ω on Ω.


Consider a model parameterized by ω, and denote the least MSNE by \(\psi^{*}_{2,\omega}(z,\theta)\). We show that \(\psi^{*}_{2,\omega}\) is increasing in ω. To do so, observe that \(\psi^{*}_{\omega}:=(\psi^{*}_{1,\omega},\psi^{*}_{2,\omega})\) is a fixed point of the operator \(\underline{\varPhi}(v,\sigma;\omega)\). Clearly, by Lemma 5, \(\underline{\varPhi}\) is increasing in (v,σ), and by Lemma 7 it is monotonically sup-preserving. We need to show that this operator is increasing in ω. Let \((\tilde{v}_{i}(\omega),\tilde{\sigma}_{i}(\omega)):=\underline{\varPhi}_{i}( v_{i},\sigma_{-i};\omega)\). By definition, \(\tilde{\sigma}_{i}\) is the least selection of the argmax correspondence of the function \(a_{i}\mapsto W_{i}(z,\theta_{i},\theta_{-i},a_{i},\sigma_{-i},v_{i};\omega)\) over \(\tilde{A}_{i}(z,\theta_{i})\). By Lemma 2, \(W_{i}\) is supermodular in \(a_{i}\), and \(\tilde{A}_{i}\) does not depend on ω. Analogously, we prove that \(W_{i}\) has isotone differences in \((a_{i},\omega)\). Hence, by [54], \(\tilde{\sigma}_{i}\) is increasing in ω. Since \(W_{i}\) is increasing in ω, \(\tilde{v}_{i}\) is also increasing in ω. This implies that \(\underline{\varPhi}\) is isotone in ω. Since, by Lemma 7, \(\underline{\varPhi}\) is monotonically sup-preserving, and \(\mathcal{V}\times \varSigma\) is a countably chain complete poset, Proposition 1 implies that \(\psi^{*}_{\omega}\) is the least element of the set \(K_{\omega}:=\{\psi\in \mathcal{V}\times\varSigma: \underline{\varPhi}(\psi;\omega)\le \psi\}\). To show that \(\psi^{*}_{\omega}\) increases in ω, let ω≤ω′. Then
$$ \psi^*_{\omega'}=\underline{\varPhi}\bigl(\psi^*_{\omega'};\omega'\bigr)\ge \underline{\varPhi}\bigl(\psi^*_{\omega'};\omega\bigr). $$
Hence, \(\psi^{*}_{\omega'} \) is an element of \(K_{\omega}\), while \(\psi^{*}_{\omega}\) is the least element of this set. Therefore, \(\psi^{*}_{\omega}\le \psi^{*}_{\omega'}\) and, in particular, \(\psi^{*}_{2,\omega}\le \psi^{*}_{2,\omega'}\). Similarly, we show that \(\phi^{*}_{2,\omega}\) increases in ω. □
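The logic of the proof can be checked numerically in a small toy. The sketch below is only an illustration under assumed primitives: two players, one non-absorbing public state, no private signals, and a hypothetical stage payoff `reward` that is increasing in the parameter `omega` and has increasing differences in (own action, `omega`). For each parameter value, the lower operator is iterated from the least element, and the resulting least equilibrium strategies are compared across parameters.

```python
# Toy illustration only; every functional form and constant is hypothetical.
BETA = 0.9
GRID = [0.0, 0.25, 0.5, 0.75, 1.0]


def reward(a_own, a_other, omega):
    # Increasing in omega and in a_other; increasing differences in
    # (a_own, a_other) and in (a_own, omega).
    return a_own * (2.0 * a_other - 0.5 + omega) - 0.5 * a_own ** 2


def stay_prob(a_own, a_other):
    # Probability of avoiding the absorbing zero-payoff state.
    return 0.2 + 0.6 * a_own * a_other


def lower_step(values, strategies, omega):
    """One application of the lower operator: max value and least best reply."""
    new_values, new_strategies = [], []
    for i in (0, 1):
        a_other = strategies[1 - i]
        payoffs = [
            (1 - BETA) * reward(a, a_other, omega)
            + BETA * stay_prob(a, a_other) * values[i]
            for a in GRID
        ]
        best = max(payoffs)
        new_values.append(best)
        new_strategies.append(
            min(a for a, w in zip(GRID, payoffs) if w >= best - 1e-12)
        )
    return new_values, new_strategies


def least_equilibrium(omega, tol=1e-12, max_iter=10_000):
    """Iterate the lower operator from the least (value, strategy) pair."""
    values, strategies = [0.0, 0.0], [0.0, 0.0]
    for _ in range(max_iter):
        nv, ns = lower_step(values, strategies, omega)
        done = ns == strategies and max(abs(x - y) for x, y in zip(nv, values)) < tol
        values, strategies = nv, ns
        if done:
            break
    return values, strategies


omegas = [0.0, 0.25, 0.5, 0.75, 1.0]
results = [least_equilibrium(w) for w in omegas]
least_strategies = [s for _, s in results]
least_values = [v for v, _ in results]
```

In this toy, `least_strategies` and `least_values` are nondecreasing along `omegas`, which is the ordering that Theorem 2 asserts for the least MSNE.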

We remark that we are not aware of any similar monotone comparative statics result for dynamic games in the existing literature. Here, our monotone equilibrium comparative statics result follows from the monotonicity of our operators and an application of our extension of the Veinott [56] parameterized fixed-point theorem to countably chain complete posets (see the proof of Theorem 2).

4 Examples

In this section, we present three applications of our methods. In all three examples, the results of our paper can be used to verify existence of the greatest and the least Markov stationary Nash equilibrium, as well as to compute these extremal equilibria by a simple iterative procedure. Finally, Theorem 2 offers the corresponding monotone equilibrium comparative statics results.

4.1 Dynamic Price Competition with Private Information

Consider an economy with n firms competing for customers who buy heterogeneous, but substitutable, goods. Firms have private information concerning their demand parameters \(\theta_{i}\in[-\epsilon_{i},\epsilon_{i}]=\varTheta_{i}\), and there is also a public signal \(z\in Z=[0,\overline{z}]\subset \mathbb{R}_{+}^{n}\) giving each firm partial information on the other firms' demand parameters. More succinctly, let the demand parameter be given by \(s_{i}(z_{i},\theta_{i})\).

After observing z (that could, for example, reflect business cycle fluctuations), the individual parameters θ are drawn from the conditional distribution Q(⋅∣z). If the other firms choose prices \(a_{-i}(z,\theta_{-i})\), the interim payoff of firm i choosing price \(a_{i}\in [ 0,\bar{a}]\) is its expected profit, determined by the demand \(D_{i}\) and the cost function \(C_{i}\). Normalize the profits such that if \(z_{i}=0\), firm i's profit is zero (e.g., turnover is too small to cover the costs, and the company is driven out of the market). As the within period game is Bertrand with heterogeneous firms and substitutable products, the payoff condition in Assumption 1 is satisfied if demand \(D_{i}\) (a) is increasing in the others' prices, and (b) has increasing differences in \((a_{i},a_{-i})\). Also, as \([0,\bar{a}]\) is single dimensional, the payoff is a supermodular function of own price. Finally, assume, as is standard, that \(C_{i}\) is increasing and convex.

To interpret this model using the language of our model, let the measure p on Z capture the influence of current parameters (θ,z) and prices on tomorrow’s demand, parameterized by the vector z′. Therefore, apart from technical assumptions on measurability, to apply our methods we only require here that the measure p be monotone, supermodular and continuous in prices. This latter condition can be interpreted as demand substitution between periods (i.e., higher prices today imply a higher probability of positive (\(z\in (0,\overline{z}]\)) demand parameters the next period, as consumers can wait for cheaper prices tomorrow). This effect is stronger if others set higher prices as well, via the supermodularity assumption. Indeed, when a company increases its price today, it may lead to a positive demand in the future if the others also have high prices. But if the other firms set low prices today, then such impact is definitely lower, as some clients may want to purchase the competitors' goods today instead.
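The increasing differences requirement on the within-period payoff can be checked on a grid. The following sketch is only illustrative: the linear demand and quadratic cost in `profit`, and all parameter values, are assumptions made for the example and are not taken from the model above. The helper `has_increasing_differences` tests the defining inequality f(x′,t′)−f(x,t′)≥f(x′,t)−f(x,t) for all grid points x′≥x, t′≥t.

```python
from itertools import combinations


def has_increasing_differences(f, xs, ts, tol=1e-12):
    """Check f(x', t') - f(x, t') >= f(x', t) - f(x, t) on a grid, x' > x, t' > t."""
    xs, ts = sorted(xs), sorted(ts)
    for x_lo, x_hi in combinations(xs, 2):
        for t_lo, t_hi in combinations(ts, 2):
            gain_high = f(x_hi, t_hi) - f(x_lo, t_hi)
            gain_low = f(x_hi, t_lo) - f(x_lo, t_lo)
            if gain_high < gain_low - tol:
                return False
    return True


def profit(own_price, rival_price, shift=2.0, cross=0.5, cost_curv=1.0):
    # Hypothetical Bertrand profit: linear demand, increasing convex cost.
    demand = shift - own_price + cross * rival_price
    return own_price * demand - 0.5 * cost_curv * demand ** 2


prices = [k / 10 for k in range(11)]  # price grid on [0, 1]

# Substitutable goods: rival price raises the marginal return to own price.
bertrand_ok = has_increasing_differences(profit, prices, prices)

# A function with decreasing differences fails the check, as it should.
counterexample_ok = has_increasing_differences(lambda x, t: -x * t, prices, prices)
```

A grid check of this kind can only refute the property or make it plausible for a given specification; it does not replace an analytical argument on the primitives.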

4.2 Dynamic R&D Competition with Positive Spillovers and Private Costs

A second application of our results is inspired by d’Aspremont and Jacquemin [20], who analyze a two stage game between oligopolists who choose R&D expenditure to reduce costs in the first stage, and then compete à la Cournot in the second stage. The authors study the effects of R&D investment spillovers in a (subgame perfect) equilibrium, as well as its optimality. To study such a game, we analyze an infinite horizon R&D competition model in which we embed, each period, the two stage game of d’Aspremont and Jacquemin [20], played between n oligopolists.

Along these lines, assume that the inverse demand is given by P(Q)=A−bQ, where Q=∑ i q i , and the production cost functions are given by \(c_{i}=C_{i}(q_{i})=[\overline{z}-z_{i}-\theta_{i}-a_{i}-\delta \sum_{j\neq i}a_{j}]q_{i}\), where \(z\in Z\subset [ 0,\overline{z}]^{n}\) is a common cost parameter drawn each period, θ i ∈[−ϵ i ,ϵ i ] is noise on the actual cost parameter z i +θ i , δ∈[0,1] is a spillover parameter, and finally a i is an investment in a cost-reducing R&D process. The cost of a i units of R&D investment is then given by a i γ i (a i ), which is assumed to be continuous and bounded. Apart from the within period spillovers, higher investment a i also has intertemporal effects via p, by increasing the probability of a positive cost reduction draw tomorrow.

Every period, the profit of an oligopolist, assuming that a Cournot equilibrium is played in the next stage, is given by the function \(\pi_{i}(z,\theta_{i},a_{i},a_{-i})\).

Observe that, for large R&D spillovers (i.e., δ>0.5), the payoff is increasing in \(a_{-i}\) (e.g., the top-dog strategy effect is dominated by a spillover effect), and \(\pi_{i}(z,\theta_{i},a_{i},a_{-i})\) has increasing differences in \((a_{i},a_{-i})\) and \((a_{i},s)\). Further, the measure p(⋅∣z,θ,a) satisfies Assumption 1 if intertemporal investment effects are self-reinforcing (i.e., if high R&D investment today has positive effects on positive cost reduction the next period, and this effect is stronger if others invest more). Finally, allowing z=0 to be an absorbing state is justified, e.g., if we have \(\bar{z}+\epsilon_{i}\geq A \), i.e., an assumption ruling out production possibilities if the size of the market is too small relative to the unit production cost \(\overline{z}\).
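The role of the threshold δ>0.5 can be seen in a simplified worked computation. The derivation below is only a sketch under assumptions that are not part of the model above: unit costs \(c_{i}\) are treated as known to all firms at the Cournot stage, the Cournot equilibrium is interior, and the R&D cost is ignored because it does not depend on the rivals' investments. The constant \(K_{i}\) is introduced here for convenience. With inverse demand P(Q)=A−bQ, the standard n-firm Cournot output and gross profit are

$$ q_{i}^{C}=\frac{A-nc_{i}+\sum_{j\neq i}c_{j}}{b(n+1)},\qquad \text{gross profit}=b\bigl(q_{i}^{C}\bigr)^{2}. $$

Substituting \(c_{i}=\overline{z}-z_{i}-\theta_{i}-a_{i}-\delta \sum_{j\neq i}a_{j}\) for every firm and collecting terms gives

$$ A-nc_{i}+\sum_{j\neq i}c_{j}=K_{i}+\bigl[n-(n-1)\delta\bigr]a_{i}+(2\delta-1)\sum_{j\neq i}a_{j},\qquad K_{i}:=A-n(\overline{z}-z_{i}-\theta_{i})+\sum_{j\neq i}(\overline{z}-z_{j}-\theta_{j}). $$

Hence, for \(j\neq i\),

$$ \frac{\partial^{2}}{\partial a_{i}\,\partial a_{j}}\,b\bigl(q_{i}^{C}\bigr)^{2}=\frac{2\bigl[n-(n-1)\delta\bigr](2\delta-1)}{b(n+1)^{2}}. $$

Since \(n-(n-1)\delta\geq 1\) for δ∈[0,1], both the effect of a rival's investment on the gross profit (at positive output) and the cross-partial derivative have the sign of 2δ−1, so in this simplified setting they are positive exactly when δ>0.5.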

4.3 Dynamic Cournot Competition with Learning-by-Doing and Incomplete Information

Finally, consider an economy where, each period, n firms compete by setting the quantity q i of a differentiated product. The goods are assumed to be behavioral complements, i.e., the consumption of one good increases purchases of the complementary products. Additionally, each firm has an individual stochastic learning-by-doing effect influencing its marginal cost function via a parameter s i =z i +θ i measuring the cumulative experience of the given firm.

Then profit of a given firm is summarized by
$$ \varPi (z,\theta_{i},q_{i},q_{-i})=q_{i} \bigl[P_{i}\bigl(q_{i},q_{-i}(z, \theta_{-i})\bigr)-c(z_{i}+\theta_{i})\bigr]. $$
Observe that the assumptions on payoffs are satisfied if the cost c is decreasing in the learning-by-doing parameters, \(P_{i}\) is increasing in \(q_{-i}\) (i.e., we have complementary goods), and \(P_{i}\) has increasing differences in \((q_{i},q_{-i})\) (e.g., we have P i given by the form P i =γq i +∑ j≠i δ j q j ).

Concerning the learning process, let us assume that the joint experience vector \(z\in [ 0,\overline{z}]^{n}\) is stochastic and drawn according to a distribution p, while the individual cost parameters \(\theta \in \times_{i=1}^{n}[-\epsilon_{i},\epsilon_{i}]\), which are noise in the learning effect, are distributed according to Q. Finally, let \(\sup P_{i}(\cdot)<c(0+\epsilon_{i})\). Then the only restrictive assumption on p we require is that \(q\mapsto p(\cdot\mid z,\theta,q)\) is continuous, increasing, and supermodular. One way of interpreting this condition (from the perspective of complementarity) is that higher output today increases the chance of a positive experience draw next period, and that this effect is stronger if others set higher quantities, via spillovers. Under these conditions, all the main theorems of the paper can be applied.

5 Conclusions and Related Techniques

This paper proposes a new set of monotone methods for a class of discounted, infinite horizon stochastic games with both public and private signals, as well as strategic complementarities. The role of strategic complementarities in the development of our methods is critical, as these complementarities allow us to develop monotone methods to study the structure and computation of Markovian equilibrium in our class of games directly.

Our analysis shares some of the properties of the belief-free equilibria studied in [23], as we assume players have (rational) Markovian beliefs that depend only on public and individual signals, and we do not need to model beliefs off the equilibrium path as in their work. Also, as Markovian equilibria are adopted here, we do not allow players to impose punishment schemes inconsistent with Markovian strategies, which is also related to work using belief-free equilibria. Additionally, in our model, public information amounts to signaling the distribution of private information and past moves, rather than signaling current opponents’ actions. Finally, our analysis is very closely related to the ideas behind the methods proposed by Cole and Kocherlakota [17], who develop methods for solving for (nonstationary) Markov equilibria with Markov beliefs via APS-type methods applied in function spaces.

As for extensions of the results in future work, perhaps the class of models where our stochastic games approach seems most appropriate is the study of Markovian equilibrium in dynamic principal-agent problems with unobservable information or actions, which are well known to greatly complicate the nature of dynamic equilibrium arrangements. In this literature, there are at least three other techniques used to study similar dynamic principal-agent problems, namely: (i) APS methods, (ii) recursive saddlepoint methods and (iii) first-order approaches. The APS approach has proven very useful for verifying existence of sequential equilibrium in broad classes of both repeated and dynamic games (see [9]). This approach focuses on the computation of the equilibrium value set, without a sharp characterization of the sequential equilibrium strategies that support any equilibrium value in the equilibrium value set. Further, when these games have state variables (like capital stocks or shocks), additional issues arise in the presence of public and private information over uncountable state spaces. That is, the APS method becomes significantly more complicated, as the set of measurable Nash equilibrium values need not be closed in any useful topology (e.g., weak-star topology).

Another important set of techniques for studying limited commitment problems are the so-called “recursive saddlepoint methods” as discussed originally in the seminal work of Kydland and Prescott [34], and further developed in [39], for example. These methods have been shown to be very useful to compute equilibrium in some classes of incentive problems with private information or actions, where primal and dual optimization problems can be appropriately linked. One immediate limitation of such methods is that “punishment schemes” are typically assumed to be “exogenous,” and specified in an ad hoc manner. Further, there are subtle issues associated with the existence and computation of recursive saddlepoints themselves, which is needed to guarantee KKT multipliers are useful and placed in appropriate dual spaces.

Finally, the first-order approaches developed in Ábrahám and Pavoni [1]; Mitchell and Zhang [40] are often useful when they can be rigorously applied. In particular, when problems are concave in equilibrium, one can precisely link the first-order conditions for optimization problem with its actual solutions, e.g., by showing that these first-order conditions are not only necessary but also (locally) sufficient.13 In this sense, the requirements needed to apply such methods are similar to recursive saddlepoint methods. Unfortunately, as in recursive saddlepoint problems, when state variables are present (as, for example, in a dynamic game), conditions on primitives that imply concavity of the value function are very difficult to obtain.

The techniques developed in this paper have an important technical advantage over all this work: in the present method, one works directly with both equilibrium strategies and values simultaneously when addressing the existence and computation of equilibrium, without the need to use first-order conditions or duality, and, importantly, without restricting our results to the ones available using APS-type techniques.


  1. See Raghavan et al. [47] or Neyman and Sorin [41] for an extensive survey of results, along with references.

  2. For example, this occurs when one studies the case where continuation play is independent of the beliefs on the information set (see [23]).

  3. Cf. [14, 18, 29] and [36].

  4. See, for example, the excellent survey of Vives [58] for a discussion of the extensive applications of GSC in the economics literature.

  5. The only exceptions to this of which we are aware are (i) the example of a dynamic global game presented in [16], and (ii) the analysis of a large industry dynamics game studied in [51].

  6. In some classes of games with absolutely continuous transitions (with respect to some probability measure on the state space), approximation of stochastic games with an uncountable state space by games with discrete sets of states is possible. In this case, only ϵ-stationary equilibria can be constructed. See Jaśkiewicz and Nowak [31] and their references.

  7. That is, the lower inverse image of an open set is Borel. In the literature, it is sometimes called a weakly measurable correspondence (see, e.g., [27, 33]). By Theorem 4.1 of Himmelberg [27] or Lemma 18.4 in Aliprantis and Border [3], the product correspondence \((z,\theta ) \to \tilde{A}(z,\theta )\) is also measurable. Finally, in this paper measurability of various mappings means Borel measurability.

  8. Existence of a regular conditional probability follows from standard conditions (e.g., see [7]).

  9. A function f:X→ℝ on a lattice X is supermodular iff f(x′∧x)+f(x′∨x)≥f(x′)+f(x) for all x′,x∈X, where x′∧x=inf{x′,x} and x′∨x=sup{x′,x}.

  10. A function f:X×T→ℝ, where X,T are partially ordered sets (posets), has increasing differences in (x,t) iff f(x′,t′)−f(x,t′)≥f(x′,t)−f(x,t) for any t′≥t, x′≥x.

  11. A function F:X→X, with X a chain complete poset, is monotonically sup (resp. inf)-preserving if for any increasing (resp. decreasing) sequence x n , we have F(⋁x n )=⋁F(x n ) (resp. F(⋀x n )=⋀F(x n )). Here, ⋁ denotes the supremum of {x n } and ⋀ its infimum.

  12. See, for example, [48] and the citations therein for related papers on monotone equilibria in Bayesian games. Also, see [19] or [6] for monotone MSNE in stochastic supermodular games with public information.

  13. Observe that our tools actually allow one to obtain conditions under which players' best replies are characterized by both necessary and sufficient first-order conditions (see [59] for the details).

  14. For the definition, see footnote 11.



We thank Andrzej Nowak and two anonymous referees for the excellent comments on an earlier draft of this paper. Łukasz Woźny thanks the University of Oxford, UK, for hosting during the writing of this paper. Kevin Reffett acknowledges with gratitude the Centre d’Economie de la Sorbonne (CES) at the Université Paris I for their support of this research during his stay in the summer 2012. All the usual caveats apply.

Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.


  1. 1.
    Ábrahám A, Pavoni N (2008) Efficient allocations with moral hazard and hidden borrowing and lending: a recursive formulation. Rev Econ Dyn 11(4):781–803 CrossRefGoogle Scholar
  2. 2.
    Abreu D, Pearce D, Stacchetti E (1990) Toward a theory of discounted repeated games with imperfect monitoring. Econometrica 58(5):1041–1063 MathSciNetzbMATHCrossRefGoogle Scholar
  3. 3.
    Aliprantis CD, Border KC (2006) Infinite dimensional analysis. A hitchhiker’s guide. Springer, Heidelberg zbMATHGoogle Scholar
  4. 4.
    Amir R (1996) Strategic intergenerational bequests with stochastic convex production. Econ Theory 8:367–376 zbMATHGoogle Scholar
  5. 5.
    Amir R (2002) Complementarity and diagonal dominance in discounted stochastic games. Ann Oper Res 114:39–56 MathSciNetzbMATHCrossRefGoogle Scholar
  6. 6.
    Amir R (2005) Discounted supermodular stochastic games: theory and applications. Manuscript, University of Arizona Google Scholar
  7. 7.
    Ash R (1972) Real Analysis and Probability. Academic Press, New York Google Scholar
  8. 8.
    Athey S, Bagwell K (2008) Collusion with persistent cost shocks. Econometrica 76(3):493–540 MathSciNetzbMATHCrossRefGoogle Scholar
  9. 9.
    Atkeson A (1991) International lending with moral hazard and risk of repudiation. Econometrica 59(4):1069–1089 MathSciNetzbMATHCrossRefGoogle Scholar
  10. 10.
    Balbus Ł, Nowak AS (2004) Construction of Nash equilibria in symmetric stochastic games of capital accumulation. Math Methods Oper Res 60:267–277 MathSciNetzbMATHCrossRefGoogle Scholar
  11. 11.
    Balbus Ł, Nowak AS (2008) Existence of perfect equilibria in a class of multigenerational stochastic games of capital accumulation. Automatica 44(6) Google Scholar
  12. 12.
    Balbus Ł, Reffett K, Woźny Ł(2011) Constructive study of Markov equilibria in stochastic games with strategic complementarities. Manuscript Google Scholar
  13. 13.
    Balbus Ł, Reffett K, Woźny Ł(2012) Stationary Markovian equilibrium in altruistic stochastic OLG models with limited commitment. J Math Econ 48:115–132 zbMATHCrossRefGoogle Scholar
  14. 14.
    Barlo M, Carmona G, Sabourian H (2009) Repeated games with one-memory. J Econ Theory 144(1):312–336 MathSciNetzbMATHCrossRefGoogle Scholar
  15. 15.
    Bertsekas D, Shreve S (1978) Stochastic optimal control. The discrete time case. Academic Press, New York zbMATHGoogle Scholar
  16. 16.
    Chassang S (2010) Fear of miscoordination and the robustness of cooperation in dynamic global games with exit. Econometrica 78(3):973–1006 MathSciNetzbMATHCrossRefGoogle Scholar
  17. 17.
    Cole HL, Kocherlakota N (2001) Dynamic games with hidden actions and hidden states. J Econ Theory 98(1):114–126 MathSciNetzbMATHCrossRefGoogle Scholar
  18. 18.
    Cole HL, Kocherlakota N (2005) Finite memory and imperfect monitoring. Games Econ Behav 53(1):59–72 MathSciNetzbMATHCrossRefGoogle Scholar
  19. 19.
    Curtat L (1996) Markov equilibria of stochastic games with complementarities. Games Econ Behav 17:177–199 MathSciNetzbMATHCrossRefGoogle Scholar
  20. 20.
    d’Aspremont C, Jacquemin A (1988) Cooperative and noncooperative R&D in duopoly with spillovers. Am Econ Rev 78(5):1133–1137 Google Scholar
  21. 21.
    Duffie D, Geanakoplos J, Mas-Colell A, McLennan A (1994) Stationary Markov equilibria. Econometrica 62:745–781 MathSciNetzbMATHCrossRefGoogle Scholar
  22. 22.
    Duggan J (2012) Noisy stochastic games. Econometrica 80(5):2017–2045 MathSciNetCrossRefGoogle Scholar
  23. 23.
    Ely JC, Hörner J, Olszewski W (2005) Belief-free equilibria in repeated games. Econometrica 73(2):377–415 MathSciNetzbMATHCrossRefGoogle Scholar
  24. 24.
    Fudenberg D, Yamamoto Y (2011) Learning from private information in noisy repeated games. J Econ Theory 146(5):1733–1769 MathSciNetzbMATHCrossRefGoogle Scholar
  25. 25.
    Fudenberg D, Levine D, Maskin E (1994) The folk theorem with imperfect public information. Econometrica 62(5):997–1039 MathSciNetzbMATHCrossRefGoogle Scholar
  26. 26.
    Harris C, Reny P, Robson A (1995) The existence of subgame-perfect equilibrium in continuous games with almost perfect information: a case for public randomization. Econometrica 63(3):507–544 MathSciNetzbMATHCrossRefGoogle Scholar
  27. 27.
    Himmelberg C (1975) Measurable relations. Fundam Math 87:53–72 MathSciNetzbMATHGoogle Scholar
  28. 28.
    Hörner J, Olszewski W (2006) The folk theorem for games with private almost-perfect monitoring. Econometrica 74(6):1499–1544 MathSciNetzbMATHCrossRefGoogle Scholar
  29. 29.
    Hörner J, Olszewski W (2009) How robust is the folk theorem? Q J Econ 124(4):1773–1814 CrossRefGoogle Scholar
  30. 30.
    Jaśkiewicz A, Nowak AS (2005) Nonzero-sum semi-Markov games with the expected average payoffs. Math Methods Oper Res 62(1):23–40 MathSciNetzbMATHCrossRefGoogle Scholar
  31. 31.
    Jaśkiewicz A, Nowak AS (2006) Approximation of noncooperative semi-Markov games. J Optim Theory Appl 131(1):115–134 MathSciNetzbMATHCrossRefGoogle Scholar
  32. 32.
    Kandori M (2002) Introduction to repeated games with private monitoring. J Econ Theory 102(1):1–15 MathSciNetzbMATHCrossRefGoogle Scholar
  33. 33.
    Kuratowski K, Ryll-Nardzewski C (1965) A general theorem on selectors. Bull Acad Pol Sci, Sér Sci Math Astron Phys 13:397–403 MathSciNetzbMATHGoogle Scholar
  34. 34.
    Kydland F, Prescott E (1980) Dynamic optimal taxation, rational expectations and optimal control. J Econ Dyn Control 2(1):79–91 CrossRefGoogle Scholar
  35. 35.
    Levy J (2012) A discounted stochastic game with no stationary equilibria: the case of absolutely continuous transitions. Discussion Paper 612, The Hebrew University of Jerusalem Google Scholar
  36. 36.
    Mailath GJ, Olszewski W (2011) Folk theorems with bounded recall under (almost) perfect monitoring. Games Econ Behav 71(1):174–192 MathSciNetzbMATHCrossRefGoogle Scholar
  37. 37.
    Maitra AP, Sudderth WD (2007) Subgame-perfect equilibria for stochastic games. Math Oper Res 32(3):711–722 MathSciNetzbMATHCrossRefGoogle Scholar
  38. 38.
    Mertens JF, Parthasarathy T (2003) Equilibria for discounted stochastic games. In: Neyman A, Sorin S (eds) Stochastic games and applications. NATO Advanced Science Institutes series D: behavioural and social sciences. Kluwer Academic, Dordrecht Google Scholar
  39. 39.
    Messner M, Pavoni N, Sleet C (2012) Recursive methods for incentive problems. Rev Econ Dyn 15:501–525 CrossRefGoogle Scholar
  40. 40.
    Mitchell M, Zhang Y (2010) Unemployment insurance with hidden savings. J Econ Theory 145(6):2078–2107 MathSciNetzbMATHCrossRefGoogle Scholar
  41. 41.
    Neyman A, Sorin S (eds) (2003) Stochastic games and applications. NATO Advanced Science Institutes series D: behavioural and social sciences. Kluwer Academic, Dordrecht zbMATHGoogle Scholar
  42. 42.
    Nowak AS (1984) On zero-sum stochastic games with general state space. I. Probab Math Stat 4:13–32 zbMATHGoogle Scholar
  43. 43.
    Nowak AS (2003) On a new class of nonzero-sum discounted stochastic games having stationary Nash equilibrium points. Int J Game Theory 32:121–132 zbMATHCrossRefGoogle Scholar
  44. 44.
    Nowak AS (2006) On perfect equilibria in stochastic models of growth with intergenerational altruism. Econ Theory 28:73–83 zbMATHCrossRefGoogle Scholar
  45. 45.
    Nowak AS (2007) On stochastic games in economics. Math Methods Oper Res 66(3):513–530 MathSciNetzbMATHCrossRefGoogle Scholar
  46. 46.
    Nowak AS, Raghavan T (1992) Existence of stationary correlated equilibria with symmetric information for discounted stochastic games. Math Oper Res 17:519–526 MathSciNetzbMATHCrossRefGoogle Scholar
  47. 47.
    Raghavan T, Ferguson T, Parthasarathy T, Vrieez O (eds) (1991) Stochastic games and related topics. Kluwer, Dordrecht zbMATHGoogle Scholar
  48.
    Reny PJ (2011) On the existence of monotone pure-strategy equilibria in Bayesian games. Econometrica 79(2):499–553
  49.
    Royden HL (1965) Real analysis. MacMillan, London
  50.
    Shapley L (1953) Stochastic games. Proc Natl Acad Sci USA 39:1095–1100
  51.
    Sleet C (2001) Markov perfect equilibria in industries with complementarities. Econ Theory 17:371–397
  52.
    Szajowski P (2006) Existence of Nash equilibria in two-person stochastic games of resource extraction. Banach Cent Publ 71:291–302
  53.
    Tarski A (1955) A lattice-theoretical fixpoint theorem and its applications. Pac J Math 5:285–309
  54.
    Topkis DM (1978) Minimizing a submodular function on a lattice. Oper Res 26(2):305–321
  55.
    Van Zandt T (2010) Interim Bayesian Nash equilibrium on universal type spaces for supermodular games. J Econ Theory 145(1):249–263
  56.
    Veinott (1992) Lattice programming: qualitative optimization and equilibria. Manuscript, Stanford
  57.
    Vives X (1990) Nash equilibrium with strategic complementarities. J Math Econ 19(3):305–321
  58.
    Vives X (2005) Complementarities and games: new developments. J Econ Lit 43(2):437–479
  59.
    Woźny Ł, Growiec J (2012) Strategic interactions in human capital accumulation. BE J Theor Econ 12
  60.
    Zhou L (1994) The set of Nash equilibria of a supermodular game is a complete lattice. Games Econ Behav 7:295–300

Copyright information

© The Author(s) 2012

Authors and Affiliations

  1. Faculty of Mathematics, Computer Sciences and Econometrics, University of Zielona Góra, Zielona Góra, Poland
  2. Department of Economics, Arizona State University, Tempe, USA
  3. Department of Theoretical and Applied Economics, Warsaw School of Economics, Warszawa, Poland
