Markov Stationary Equilibria in Stochastic Supermodular Games with Imperfect Private and Public Information

We study a class of discounted, infinite horizon stochastic games with public and private signals and strategic complementarities. Using monotone operators defined on the function space of values and strategies (equipped with a product order), we prove existence of a stationary Markov–Nash equilibrium via constructive methods. In addition, we provide monotone comparative statics results for ordered perturbations of our space of games. We present examples from the industrial organization literature and discuss possible extensions of our techniques for studying principal-agent models.


Introduction and Related Literature
Since the class of discounted infinite horizon stochastic games was first introduced by Shapley [50], the question of existence and characterization of equilibrium has been the object of extensive study in game theory. More recently, stochastic games have also become a fundamental tool for studying dynamic equilibrium in economic models where there is repeated strategic interaction among agents with limited commitment. In many such economic applications, the stochastic games studied assume limited commitment between agents and feature both public and private information. When private information is introduced into stochastic games, the structure of equilibrium becomes more difficult to analyze, as one must keep track of how each player's beliefs over the private histories of all the other players evolve over time. Of course, private information can be introduced into the structure of a game in various forms, including private types and/or private monitoring (see, e.g., [32]). In the former case of private types, progress has been made recently by focusing on public strategies and equilibria (see [24] or applications in [8]). In the latter case of private monitoring, authors have often assumed that private monitoring is almost perfect (see [28]), or that sequential equilibrium strategies are belief-free. An additional related issue in dynamic games that has received a great deal of attention concerns the assumption of players' infinite memory. In recent work, economists have begun to analyze situations where players do not have arbitrarily long memory of their own and/or others' past moves or states. Given this assumption, the players cannot condition their future actions on arbitrarily long histories. Even in this case, the characterization of short-memory or bounded-recall equilibria is somewhat problematic, as the punishment schemes needed to sustain equilibrium are imposed in a somewhat ad hoc manner, and can depend on the particulars of the game at hand. Further, because of the structure imposed on the game in the name of analytic tractability, restrictive assumptions are often placed on players' action spaces, as well as on the space of private signals/distributions, not to mention the reliance on public randomization devices or mixed strategies.
In this paper, we propose a new approach to analyzing games with both public and private information (types). Our motivation is to resolve the aforementioned predicaments in the context of an important class of games, namely games of strategic complementarities. We do this by introducing a simple strategy space, as well as imposing rational expectations concerning the opponents' private information. Importantly, relative to the existing literature, we also allow for uncountable multidimensional state and action spaces, and we assume that players follow Markovian stationary pure strategies. In particular, focusing on Markov stationary Nash equilibrium (MSNE, henceforth) implies a few important characteristics: (i) the imposition of sequential rationality, (ii) the use of minimal state spaces, where the introduction of sunspots or public randomization is not necessary for the existence of equilibrium, and (iii) a relatively direct method to compute both equilibrium values and strategies. It bears mentioning that the resulting MSNE remains an equilibrium in any wider class of strategies that includes the stationary Markov ones.
To obtain our results, we focus on stochastic games with strategic complementarities (GSC). It is well known that GSC have proven very useful in applications in economics and operations research in a static context, but it turns out to be difficult to adapt the existing toolkit to the study of dynamic equilibrium. One recent attempt to analyze dynamic supermodular (extensive form) games was undertaken by Balbus et al. [12] in the context of stochastic games with public signals. Here, we focus on stochastic supermodular games with both public and private shocks, and with our new results, we are able to link the literature on dynamic supermodular games with that on Bayesian supermodular games [55,57].
Our paper also contributes to the literature on existence of equilibrium in stochastic games with uncountable state and action spaces without private types. Recall that Mertens and Parthasarathy [38] and Maitra and Sudderth [37] prove existence of subgame perfect Nash equilibrium in a class of such games. It is worth mentioning, however, that existence of MSNE cannot be proved in the general case even if randomization is allowed (for an extensive discussion of this fact, see [35]). In the class of correlated strategies involving i.i.d. "public randomization," MSNE have been shown to exist under different assumptions in various papers including [21,26,46] and [30]. Recently, Duggan [22] extended the paper by Nowak and Raghavan [46] by expanding the state space where MSNE exist by appealing to additional "noisy variables." In the literature pertaining to economic applications of dynamic/stochastic games, however, the central concern has not been exclusively the question of weakening the conditions under which the existence of equilibrium can be established, or various forms of folk theorems (see, e.g., [25]). Rather, the emphasis has also been on characterizing the properties of MSNE from a computational point of view. This approach arises in, for example, calibration approaches to characterizing MSNE (as in macroeconomics), or estimation/simulation methods (as in industrial organization). For such questions, one needs to unify the theory of existence of equilibrium with a theory of numerical implementation, which requires one to present not only (i) constructive arguments to verify existence, but also (ii) sharp characterizations of the set of equilibria being computed, and (iii) methods of relating error analysis to the particular approximation schemes at hand. Our paper proposes such a framework for the class of stochastic games we study.
The rest of the paper is organized as follows. Section 2 defines the game and equilibrium concept. Then, in Sect. 3, we prove our main theorem on MSNE existence and computation. Section 4 presents three examples from the industrial organization literature. The Appendix states the auxiliary theorem we use in our proofs, while Sect. 5 concludes with a discussion of related methods.

The Class of Games
Consider an $n$-person infinite horizon stochastic game with private and public signals in discrete time. That is, in each period $t \in \{0, 1, 2, \ldots\} = \mathbb{N}$, every player $i$ initially observes both the public signal $z^t$ and his own private signal $\theta_i^t$. At this stage, players simultaneously undertake actions $a^t = (a_i^t, a_{-i}^t)$, where $a_i^t$ denotes the actions of player $i$ and $a_{-i}^t$ denotes the actions of the remaining players. The profile $a^t$ both (i) yields each player a current-period payoff and (ii) parameterizes a stochastic transition on states that governs the distribution of public and private signals tomorrow. At the end of each period, all actions are then observed by all players, payoffs are distributed, and the game moves forward to the next period.
Formally, the game is a tuple $\Gamma = (Z, \Theta, A, \tilde{A}, \mu, r, q, Q)$, where the primitives are described as follows:
- $Z$ is the public shock space, an interval in a vector space containing the zero vector, endowed with its Borel sigma-field $\mathcal{Z}$.
- $A = \prod_{i=1}^{n} A_i$, where the set $A_i$ is a closed subset of $\mathbb{R}^k$ equipped with its Euclidean topology and componentwise partial order, representing the action space for player $i$, with the space $A$ given the product order.
- $\tilde{A}(z, \theta) = \prod_{i=1}^{n} \tilde{A}_i(z, \theta_i)$, where $\tilde{A}_i$ is a measurable $A_i$-valued correspondence (see footnote 7), and $\tilde{A}_i(z, \theta_i)$ denotes the nonempty and compact set of actions available to player $i$ when the public shock is $z$ and his private shock is $\theta_i$.
- $r_i$ is the reward function for player $i$, which is assumed to be measurable and uniformly bounded by $M < \infty$.
- $q$ is a Borel measurable transition probability from $Z \times \Theta \times A$ to $Z$ (i.e., when the public shock is $z$, the vector of private shocks is $(\theta_1, \ldots, \theta_n)$, and the actions chosen are $(a_1, \ldots, a_n)$, then the distribution of the continuation realizations of shocks in $Z$ is given by $q(\cdot \mid z, \theta, a)$).
- $Q$ is a Borel measurable transition probability from $Z$ to $\Theta$ (i.e., when the public shock is $z$, the vector of private shocks is distributed according to $Q(\cdot \mid z)$). Further, let $Q_{-i}(\cdot \mid z, \theta_i)$ be a regular conditional distribution on the "other players'" private shocks $\Theta_{-i}$ (i.e., when the public shock is $z$ and the private shock of player $i$ is $\theta_i$; see footnote 8). In other words, $Q_{-i}(\cdot \mid z, \theta_i)$ is player $i$'s posterior distribution on the other players' private signals when player $i$ observes his own private state and the public state. Similarly, we let $Q_i(\cdot \mid z, \theta_{-i})$ denote a regular conditional distribution on player $i$'s "own" private shocks $\Theta_i$.
The players know the history of public shocks, their own private shocks, and their past actions. Let $H_i^t$ denote the set of all possible histories of player $i$ up to period $t$. A strategy $\sigma_i$ is Markov if each $\sigma_i^t$ depends on current signals/shocks only (i.e., $\sigma_i^t(h_i^t) = \sigma_i^t(z^t, \theta_i^t)$). A Markov strategy is stationary if $\sigma_i^1 = \sigma_i^2 = \cdots = \sigma_i^0$ for some measurable mapping $\sigma_i^0$. We denote by $\sigma := (\sigma_1, \ldots, \sigma_n)$ a profile of Markov stationary strategies.
Suppose player $i$ knows the realization of the public shocks, as well as her private shocks, but does not know the realization of the private shocks of the other players. If the initial public shock is $z$ and her initial private signal is $\theta_i$, then the player believes that the initial distribution of the others' private shocks is just $Q_{-i}(\cdot \mid z, \theta_i)$, and that the evolution of the private shocks $\theta_{-i}^t$ is a Markov chain with a distribution at any step $t$ given by $Q_{-i}(\cdot \mid z^t, \theta_i^t)$.
7 That is, the lower inverse image of an open set is Borel. In the literature, it is sometimes called a weakly measurable correspondence (see, e.g., [27,33]). By Theorem 4.1 of Himmelberg [27] or Lemma 18.4 in Aliprantis and Border [3], the product correspondence $(z, \theta) \to \tilde{A}(z, \theta)$ is also measurable. Finally, in this paper measurability of various mappings means Borel measurability.
8 Existence of a regular conditional probability follows from standard conditions (e.g., see [7]).
Remark 1 By our assumptions, if the current state is $(z, \theta)$ and the Markov stationary strategy profile is $\sigma$, then the distribution of the next state $(z', \theta')$ is given by the measure:
where $Z_0$ is a measurable subset of $Z$ and $T$ is a measurable subset of $\Theta$. Notice that player $i$ does not know the realization of $\theta_{-i}$, but knows the realization $(z, \theta_i)$ and believes that the current realization of $\theta_{-i}$ is distributed according to $Q_{-i}(\cdot \mid z, \theta_i)$. Because of this, he believes that the distribution of $(z', \theta')$ is given by a measure $\tilde{Q}_i$. Thus, for an arbitrary Markov stationary strategy profile $\sigma$, the evolution of the public and private state $(z^t, \theta_i^t)$ for agent $i$ is a Markov decision process with transition probability $\tilde{Q}_i$.
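Written out from the definitions of $q$ and $Q$ given with the primitives, the two measures in Remark 1 would take roughly the following form. This is a sketch consistent with those definitions rather than a quotation of the original displays: the name $\tilde{Q}$ for the full-information transition is an assumption made here, and $\sigma(z,\theta)$ stands for $(\sigma_1(z,\theta_1), \ldots, \sigma_n(z,\theta_n))$.

```latex
% Next-state distribution under profile \sigma when the full state (z,\theta) is known:
\tilde{Q}(Z_0 \times T \mid z, \theta, \sigma)
  = \int_{Z_0} Q(T \mid z')\, q\bigl(dz' \mid z, \theta, \sigma(z,\theta)\bigr).

% Player i's belief, integrating out the rivals' current private shocks:
\tilde{Q}_i(Z_0 \times T \mid z, \theta_i, \sigma)
  = \int_{\Theta_{-i}} \tilde{Q}(Z_0 \times T \mid z, \theta_i, \theta_{-i}, \sigma)\,
    Q_{-i}(d\theta_{-i} \mid z, \theta_i).
```

The first line draws tomorrow's public shock from $q$ and then tomorrow's private shocks from $Q(\cdot \mid z')$; the second averages that transition over what player $i$ does not observe today.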
The last remark requires a discussion of the structure of players' beliefs, as well as the formation of these beliefs in equilibrium. A dynamic game with public states and private (information) types can potentially possess many sequential equilibria (as players can condition their actions and beliefs on arbitrary histories). In such a case, the beliefs of any given player relative to the type and/or actions of the other players can matter a great deal in the construction of any such sequential equilibrium. This is true, in particular, for games with no private types (as, for example, analyzed in [2]; APS, henceforth). However, in APS, the authors concentrate on public strategies; therefore, each player's belief about how his rivals move is irrelevant in their approach.
Similarly, in this paper, we focus on Markov stationary strategies, and assume that players use Markovian private beliefs as well (see also [17]). That is, when constructing Markov stationary strategies, the players condition their beliefs on the current state, as well as current private types only. Such a belief structure is rational in our setup (as knowing the current state and own type is sufficient for forecasting the continuation structure of the game, assuming other players are using Markovian strategies and Markovian-private beliefs). Finally, what guarantees the rationality of such beliefs is our assumption that, each period, the distribution of private types depends only on current states.
Let $H^t = \{(z^1, \theta^1, a^1, z^2, \theta^2, a^2, \ldots, a^{t-1}, z^t, \theta^t)\}$ be the set of histories of the game up to step $t$ and $H^\infty = \{(z^1, \theta^1, a^1, z^2, \theta^2, a^2, \ldots)\}$, both endowed with the product $\sigma$-algebra. For every player, given the initial public and private states, the transition among public and private states, the profile of strategies $\sigma = (\sigma_1, \ldots, \sigma_n)$, and the belief that the others' private shocks evolve according to $Q(\cdot \mid z^t)$, we can generate a sequence of probability measures on histories $H^t$ ($t < \infty$). Then, by the Ionescu-Tulcea theorem (see [15]), there exists a measure, say $P^i_{z,\theta_i,\sigma}$, on $H^\infty$, and a corresponding expected value operator, say $E^i_{z,\theta_i,\sigma}$, such that the objective for player $i$ is to maximize lifetime payoffs given by
Definition 1 A Nash equilibrium in our game is a profile $\sigma^*$ from which no unilateral deviation is profitable. That is, $\sigma^*$ is a Nash equilibrium if for every player $i$ and her arbitrary strategy $\sigma_i$:
Any Nash equilibrium in stationary Markov strategies is then called an MSNE.
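One common way to write the objective and the equilibrium condition of Definition 1 is sketched below. The discount factor $\beta \in (0,1)$, the normalization by $(1-\beta)$, and the name $U_i$ are assumptions introduced here for illustration only; the surrounding text does not fix them. The normalization is chosen so that payoffs stay within the bound $M$ on rewards.

```latex
% Hypothetical notation: U_i, \beta and the (1-\beta) normalization are assumptions.
U_i(\sigma)(z, \theta_i)
  = (1-\beta)\, E^i_{z,\theta_i,\sigma} \sum_{t=1}^{\infty} \beta^{\,t-1}\, r_i(z^t, \theta^t, a^t),

% Nash condition: no profitable unilateral deviation at any (z, \theta_i).
U_i(\sigma^*)(z, \theta_i) \;\ge\; U_i(\sigma_i, \sigma^*_{-i})(z, \theta_i)
  \quad \text{for every player } i \text{ and every strategy } \sigma_i .
```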
In this section, we build our results on the existence, computation, and equilibrium comparative statics of MSNE in the parameters of the game.
To begin with the existence question, suppose player $i$ knows $(z, \theta_i)$ in some period, and believes that the distribution of private shocks for the other agents is $Q_{-i}(\cdot \mid z, \theta_i)$. If $\sigma_{-i}$ is a Markov stationary strategy for the other players in the game, and her own action is $a_i$, then her current expected reward is given simply by
In line with Remark 1, the expected value from some integrable continuation value $v_i$:
Define the following function space for candidate equilibrium values:
Also, define a set of Markov stationary strategies to be
Observe that $\Sigma_i$ is nonempty by the measurable selection theorem of Kuratowski and Ryll-Nardzewski [33]. Denote by $V := \prod_{i=1}^{n} V_i$ and $\Sigma := \prod_{i=1}^{n} \Sigma_i$ the product spaces of Markov stationary value functions and strategies, and endow $V \times \Sigma$ with its (product) pointwise partial order.
We now formulate the assumptions we shall need for our existence theorem.
Assumption 1 Assume that:
- $\tilde{A}_i$ is a nonempty, compact, and complete sublattice-valued correspondence.
- $r_i$ is supermodular in $a_i$, has increasing differences (see footnote 10) in $(a_i, a_{-i})$, and is increasing in $a_{-i}$.
- $q$ is of the form
where $\delta_Z$ is a Dirac delta on $Z$ concentrated at 0, and $p(v_i \mid z, \theta, a)$ is (a) continuous, supermodular, and increasing in $a$, and (b) measurable in $(z, \theta)$.
10 A function $f : X \times T \to \mathbb{R}$, where $X, T$ are partially ordered sets (posets), has increasing differences in $(x, t)$ if $f(x', t') - f(x, t') \ge f(x', t) - f(x, t)$ for all $x' \ge x$ and $t' \ge t$.
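Given that $p$ is not required to be a probability measure and that $\delta_Z$ is concentrated at 0, the transition structure in Assumption 1 is most naturally read as the decomposition below. This is a reconstruction under that reading, not a quotation of the original display.

```latex
% Assumed form: the mass not assigned by p is placed on the zero state.
q(\cdot \mid z, \theta, a)
  = p(\cdot \mid z, \theta, a)
  + \bigl(1 - p(Z \mid z, \theta, a)\bigr)\, \delta_Z(\cdot),
\qquad 0 \le p(Z \mid z, \theta, a) \le 1 .
```

Under this form, with probability $1 - p(Z \mid z, \theta, a)$ the public state moves to 0, so monotonicity and supermodularity of $p$ in $a$ translate directly into properties of expected continuation values.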
Given the assumptions on preferences and stochastic transitions $q$, we can write down an auxiliary game such that, for any continuation value $v \in V$, the auxiliary game is a game of strategic complementarities with positive externalities. Further, when $q$ has the specific form in our conditions above, we can carry supermodularity in the game over to each player's value function recursively at each stage of the game. We should mention that although this is a powerful technical assumption, the conditions are satisfied in many applications (e.g., see the discussion in [16] for a particular example of this exact structure). Additionally, as we assume positive returns (i.e., $r(0, \cdot) \equiv 0$), our assumptions above ensure that the expected continuation value is supermodular in its arguments (as well as monotone in $a$). This structure is common in the literature. For example, a stronger version of this assumption was introduced by [4], used in a series of papers by [43][44][45], and studied extensively in the context of games of strategic complementarities with public information in [12]. We refer the reader to our two related papers (see [12,13]) for a detailed discussion of the nature of these assumptions.
As the next remark indicates, though, our current form of this assumption is significantly weaker than in the existing literature.
Remark 2 Observe that we do not require that $p$ is a probability measure. A typical example of $p$ is:
However, there are many examples of $p$ that cannot be expressed as a linear combination of stochastic kernels and still satisfy our assumptions. For example, on $Z = A = [0, 1]$, consider $p$ having a density
for a sufficiently small function $\xi$ and a function $L$ increasing in $a$.
Along these lines, we first introduce the following additional notation. We define for each player $i$:
which is the expected payoff to player $i$ who faces continuation value $v_i$, with the others using strategy $\sigma_{-i}$. Define this player's best response operator to be:
as well as her corresponding best response value function to be
we denote best responses for all the players. Also, put
to be the vector of value functions for all players induced by these best reply maps.
To construct MSNE, we define a few new mappings. First, consider the correspondence $\Phi$ defined on the product space $V \times \Sigma$ to be the mapping
Using this correspondence, we can define new mappings using the greatest (resp., least) selection from $\Phi(v, \sigma)$, given by
Similarly, define the least selections $\underline{P}_i$ and $\underline{P}$. Notice that this can be done by Lemmas 3 and 4 stated below.
We now make a number of important observations.
Proof Observe that Hence, the assertion follows from Assumption 1, as supermodularity, increasing differences and monotonicity are preserved by summation.
Proof First observe that, by Lemma 2, $W_i$ is supermodular in $a_i$. We need to show that it has increasing differences in $(a_i; \sigma_{-i}, v_i)$. As increasing differences are preserved by summation, we just need to show that $R_i$ and $E_i$ have increasing differences separately. Observe that
is increasing in $\sigma_{-i}$, as $r_i$ has increasing differences in $(a_i, \sigma_{-i})$. Similarly, $E_i$ has increasing differences. To see this, observe that by our assumption
has the desired increasing differences by monotonicity of $p$. Thus, $W_i$ has increasing differences in $(a_i; \sigma_{-i}, v_i)$. Hence, by Theorem 6.2 in [54], $P_i$ is an ascending, compact, and sublattice-valued correspondence from $V \times \Sigma$ into itself. As a result, $\overline{P}_i$ and $\underline{P}_i$ are increasing. Now the aim is to show that both
is a well-known fact. To see this, observe:
which by Lemma 1.10 of [42] or Theorem 18.19 in [3] is measurable. We now show that the extremal selections are measurable. Consider a collection of maximization problems $O_{ij}$: $\max y_j$ such that $y$
is measurable as well. Observe that $Y_i^0(z, \theta_i)$ is single-valued and its element, say $y_i^0(z, \theta_i)$, is a measurable function. Trivially, $y^0$
Similarly, we prove the measurability of $P_i(v, \sigma)$.
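The monotonicity step in this proof rests on a standard fact: when an objective has increasing differences in (action, parameter) on ordered grids, its least and greatest maximizers are nondecreasing in the parameter. The sketch below checks this numerically on a toy objective. The function names and the objective $F(x,t) = xt - x^2/2$ are hypothetical illustrations, not objects from the paper.

```python
import numpy as np

def has_increasing_differences(F, tol=1e-12):
    """F[x, t] on ordered 1-D grids: check that F(x', t) - F(x, t) is
    nondecreasing in t for adjacent x < x' (sufficient on chains)."""
    dx = np.diff(F, axis=0)                    # increments in x, column by column
    return bool(np.all(np.diff(dx, axis=1) >= -tol))

def extremal_argmax(F, tol=1e-12):
    """Return (least, greatest) maximizer index in x for every parameter t."""
    is_max = F >= F.max(axis=0, keepdims=True) - tol
    least = is_max.argmax(axis=0)              # first True in each column
    greatest = F.shape[0] - 1 - is_max[::-1].argmax(axis=0)  # last True
    return least, greatest

# Toy objective (an assumption for illustration): F(x, t) = x*t - x^2/2.
x = np.linspace(0.0, 1.0, 11)                  # action grid
t = np.linspace(0.0, 1.0, 5)                   # parameter grid
F = x[:, None] * t[None, :] - 0.5 * x[:, None] ** 2

least, greatest = extremal_argmax(F)
```

On this grid the maximizer is not always unique (the continuous optimum can fall midway between two grid points), which is why the two extremal selections are tracked separately, mirroring the greatest and least selections used in the text.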

Lemma 4 T is isotone on
Hence, the monotonicity follows directly from Assumption 1. To show that $T_i(v_i, \sigma_{-i})$ is measurable, we just apply Lemma 1 and Theorem 18.19 in [3].

Lemma 5 $\overline{\Phi}$ and $\underline{\Phi}$ are isotone and map
Proof It follows directly from Lemmas 3 and 4.
Proof It follows directly from the principle of optimality and standard dynamic programming arguments (see [15]). Also observe that $\sigma^*$ remains an MSNE if players are allowed to use more general strategies (assuming beliefs are Markov).
We now define two important sequences.
Similarly, let $\psi^0(z, \theta) \equiv (\underline{V}, \underline{\Sigma})$ and for $t \ge 1$:
Observe that $\overline{V}$ is the vector of constant functions equal to $M$, while $\underline{V}$ is the vector of constant functions equal to zero.
Having these observations and definitions in hand, we are ready to prove the main results of the paper. We first prove a result on existence and computation of equilibrium values and MSNE:
(iii) Let $f^*$ be an arbitrary stationary Nash equilibrium with a corresponding payoff $v^*$. Then by Lemma
The last inclusion follows from Lemma 7. Clearly, $\psi^1 \le (v^*, f^*) \le \phi^1$. Assume for some $t \in \mathbb{N}$:
By definition of $\overline{\Phi}$ and $\underline{\Phi}$ and Lemma 5, we have
Hence, the inequality in (1) follows for all $t$. Taking a limit in (1) by step (ii), we obtain
Theorem 1 states a number of things. We start this discussion with our existence result (ii)-(iii), and then move on to comment on our approximation result in (i).
First, the theorem establishes existence of MSNE for our infinite horizon game with both public and private shocks; but it does more. It also provides bounds for constructing every MSNE. Moreover, both of these extremal fixed points are actual MSNE, and therefore provide equilibrium bounds for all MSNE. We can also obtain corresponding equilibrium bounds for all MSNE equilibrium values.
Second, as is typical in the literature, to prove the existence of equilibrium, we construct auxiliary one shot game parameterized by continuation values. What is important, though, in our method is that instead of finding the set of Nash equilibria at every period, we parameterize the payoff function of every player by both continuation value function and strategy profile for the actions of the other players. Using this added structure, we are then able to evaluate the best response of the player depending on the strategy of the other players, as well as his continuation value. The advantage of this method is the simplicity of resulting computations that ensue as compared with the computations involved in the APS type methods of Cole and Kocherlakota [17], for example. We comment more on the importance of this simplification in a moment.
Third, our method uses recent results on Bayesian supermodular games in its construction. That is, similar to the papers of Vives [57] or Van Zandt [55], MSNE are not necessarily monotone as functions of states (private or public); rather, we just impose enough structure on the game to construct operators for value/strategy pairs that are monotone in continuation values and other players' strategies. In doing this, we are then able to obtain precisely a dynamic supermodular game. Of course, we can also seek conditions sufficient to prove the existence of monotone Markovian equilibrium (in states). For this case, we simply impose stronger complementarity assumptions in the primitives of the game between actions and states.
Fourth, the theorem provides a simple iterative algorithm to construct the greatest and least equilibria in our infinite horizon game. More specifically, as compared with other methods (e.g., APS methods), we simultaneously iterate on operators defined in terms of both player values and Markovian strategies. We are able to show in the theorem that our iterations converge in order to Markov equilibrium strategies (as well as their associated equilibrium values). What is missing here, though, are estimates of the rate of convergence, as well as of the accuracy of our approximations to the least and greatest MSNE. To address these latter issues, we can introduce additional metric structure, study the metric convergence question, and perhaps look for stronger properties of MSNE (e.g., local Lipschitz structure).
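To make the simultaneous iteration on values and strategies concrete, here is a minimal numerical sketch for a toy two-player game on finite grids. Everything specific in it is an assumption made for illustration: the discount factor `BETA` and the `(1 - BETA)` payoff normalization, the functional forms `reward` and `mass`, i.i.d. private types independent of the public state, and the treatment of state 0 as absorbing with zero value. It is not the authors' implementation; it only mirrors the idea of iterating the greatest (least) best-reply selection downward from the top (upward from the bottom) of the value/strategy lattice.

```python
import numpy as np

BETA = 0.9                        # discount factor (assumed; not named in the text)
Z = np.array([1.0, 2.0])          # positive public states; state 0 is absorbing with value 0
TH = np.array([0.0, 0.1])         # private types of each player
PRIOR = np.array([0.5, 0.5])      # Q(.|z): i.i.d. types, independent of z (assumed)
SPLIT = np.array([0.5, 0.5])      # how the mass p is split over the positive states
A = np.linspace(0.0, 1.0, 5)      # common action grid

def reward(z, th, ai, aj):
    # Increasing in aj, increasing differences in (ai, aj), zero at ai = 0.
    return z * ai * (1.5 * aj - 0.2 + th - 0.6 * ai)

def mass(ai, aj):
    # Total mass p(Z | a) <= 0.3, increasing and supermodular in (ai, aj).
    return 0.1 * (ai + aj + ai * aj)

M = max(reward(z, th, ai, aj) for z in Z for th in TH for ai in A for aj in A)

def interim_payoff(v_i, s_j, zi, ti):
    """W_i(a_i) on the whole action grid, averaging over the rival's type."""
    cont = SPLIT @ (v_i @ PRIOR)  # expected continuation value given a positive state
    w = np.zeros(len(A))
    for tj, prob in enumerate(PRIOR):
        aj = A[s_j[zi, tj]]
        w += prob * ((1 - BETA) * reward(Z[zi], TH[ti], A, aj)
                     + BETA * mass(A, aj) * cont)
    return w

def phi(v, s, greatest):
    """One step of the greatest (or least) best-reply selection on (values, strategies)."""
    new_v, new_s = np.empty_like(v), np.empty_like(s)
    for i in range(2):
        for zi in range(len(Z)):
            for ti in range(len(TH)):
                w = interim_payoff(v[i], s[1 - i], zi, ti)
                best = np.flatnonzero(np.isclose(w, w.max()))
                new_s[i, zi, ti] = best[-1] if greatest else best[0]
                new_v[i, zi, ti] = w.max()
    return new_v, new_s

def iterate(greatest, tol=1e-12, max_iter=1000):
    """Iterate from the top (values M, largest action) or bottom (values 0, smallest action)."""
    shape = (2, len(Z), len(TH))
    v = np.full(shape, M if greatest else 0.0)
    s = np.full(shape, len(A) - 1 if greatest else 0, dtype=int)
    path = [v]
    for _ in range(max_iter):
        new_v, new_s = phi(v, s, greatest)
        done = np.array_equal(new_s, s) and np.max(np.abs(new_v - v)) < tol
        v, s = new_v, new_s
        path.append(v)
        if done:
            break
    return v, s, path

v_hi, s_hi, path_hi = iterate(greatest=True)    # candidate greatest equilibrium
v_lo, s_lo, path_lo = iterate(greatest=False)   # candidate least equilibrium
```

In this toy specification the two limits differ: the downward iteration settles on the highest action in every state, while the upward iteration stays at the lowest action with zero value, so the sketch also illustrates how the extremal pair brackets any other stationary equilibrium of the discretized game.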
Fifth, our algorithm is simpler than that proposed in [12] for the case of public information, as we do not need to compute equilibria of the auxiliary game at every value function iteration. However, this simplification comes at a cost, as our iterations are not equilibria in truncated finite horizon games. In this sense, our method is similar to that discussed in [52], but very different than the one developed in Balbus and Nowak [10,11], or Balbus et al. [12] for games with public information.
Sixth, the approach used in the proof of Theorem 1 is reminiscent of the iterated elimination of dominated strategies (as discussed, for example, in [57]), but extended to dynamic games. Indeed, as observed by Chassang [16], the simultaneous iterated elimination of dominated Markovian strategies (and corresponding values) leads to convergence in order to extremal MSNE. Recall that this procedure heavily depends on the (Markovian) equilibrium and (Markov-private) belief concepts applied.
Finally, we make two more specific remarks on how the results can be strengthened by strengthening conditions on the primitives, or changed by altering order structure on the space of functions where we study the equilibrium existence problem.

Remark 3
If the best replies are functions, then we can strengthen our results by saying that the MSNE set is countably chain complete. That is, the MSNE set is closed under countable sups/infs of chains. This follows from our generalization of the Tarski-Kantorovitch fixed-point theorem (see Proposition 1 in the Appendix).

Remark 4
If the order on each $V_i$ and $\Sigma_i$ is changed to a.e. (where a.e. refers to private and public signals), then we can conclude, using the Veinott [56]/Zhou [60] generalization of the Tarski [53] fixed-point theorem, that the MSNE set not only has greatest and least elements, but is also a complete lattice. This follows from the fact that the set of bounded, Borel-measurable functions is a sigma-complete lattice when endowed with the pointwise (everywhere) order, but is a complete lattice when endowed with the a.e. order (see [57]). In this paper, we prefer to use the pointwise (everywhere) order mainly for the comparative statics results presented in Theorem 2 below.
We complete this section on the existence and characterization of MSNE with an important corollary of the main theorem.

Corollary 1 MSNE exists in a class of stochastic games satisfying Assumption 1 with perfect monitoring and no private information, i.e., where with probability one θ i = θ j for all players.
Observe that, by this corollary, we prove the existence of MSNE in a class of games similar to Curtat [19] or Amir [5]. Similar to their work, we let the within-period game exhibit strategic complementarities, but there are also a few specific differences that are worth mentioning. First, we do not require payoffs or transition probabilities to be Lipschitz continuous, an assumption which appears to be very strong relative to many economic applications. Second, we also do not impose any conditions on payoffs and stochastic transitions that imply (i) "double increasing differences" in players' payoff structure, or (ii) strong concavity conditions such as strict diagonal dominance that are needed to obtain their existence result. Third, and equally important, we do not assume any increasing differences between actions and states (hence, our equilibrium strategies are not necessarily increasing on Z). These new results do come at the expense of requiring our assumption on transition Q, which is more specific than required by either Amir or Curtat.
We now present our results on monotone equilibrium comparative statics, and show that the set of MSNE is ordered relative to ordered perturbations of the deep parameters of the game. To do this, consider a parameterized version of our game $\Gamma(\omega)$, where $\omega \in \Omega$ is a parameter of the game and $\Omega$ is a poset. More specifically, denote by $\tilde{A}(z, \theta; \omega)$, $r_i(z, \theta, a; \omega)$, and $p(\cdot \mid z, \theta, a; \omega)$ the parameterized versions of the primitive data of the original game, and by $\phi^*_{2,\omega}$ and $\psi^*_{2,\omega}$ the two extremal MSNE computed in Theorem 1 at parameter $\omega$. If $\phi^*_{1,\omega}$ and $\psi^*_{1,\omega}$ denote the corresponding equilibrium payoffs, then:
We make the following assumptions on the parameterized class of games.
• Each $r_i$ has increasing differences in $(a_i, \omega)$, and is increasing in $\omega$.
• $p(v_i \mid z, \theta, a, \omega)$ has increasing differences in $(a_i, \omega)$, and is increasing in $\omega$ for each $v_i \in V_i$.
With this parameterization complete, we can now state our central equilibrium monotone comparative statics theorem for our class of parameterized games:
Proof Consider a model parameterized by $\omega$. Consider the least MSNE $\psi^*_{2,\omega}(z, \theta)$. We show that $\psi^*_{2,\omega}$ is increasing in $\omega$. To do so, observe that $\psi^*_\omega := (\psi^*_{1,\omega}, \psi^*_{2,\omega})$ is a fixed point of the operator $\underline{\Phi}(v, \sigma; \omega)$. Clearly, by Lemma 5, $\underline{\Phi}$ is increasing in $(v, \sigma)$ and by Lemma 7 is monotonically sup-preserving. We need to show that this operator is increasing in $\omega$. Let $(\tilde{v}_i(\omega), \tilde{\sigma}_i(\omega)) := \underline{\Phi}_i(v_i, \sigma_{-i}; \omega)$. By definition, $\tilde{\sigma}_i$ is the least selection of the argmax correspondence of the function $a_i \mapsto W_i(z, \theta_i, \theta_{-i}, a_i, \sigma_{-i}, v_i; \omega)$ over $\tilde{A}_i(z, \theta_i)$. By Lemma 2, $W_i$ is supermodular in $a_i$, and $\tilde{A}_i$ does not depend on $\omega$. Analogously, we prove that $W_i$ has isotone differences in $(a_i, \omega)$. Hence, by [54], $\tilde{\sigma}_i$ is increasing in $\omega$. Since $W_i$ is increasing in $\omega$, $\tilde{v}_i$ is increasing in $\omega$ as well. This implies that $\underline{\Phi}$ is isotone in $\omega$. As, by Lemma 7, $\underline{\Phi}$ is monotonically sup-preserving and $V \times \Sigma$ is a countably chain complete poset, by Proposition 1 we obtain that $\psi^*_\omega$ is the least element of the set
Hence, for $\omega \le \omega'$, $\psi^*_{\omega'}$ is some selection of $K_\omega$, while $\psi^*_\omega$ is the least selection of this set. Therefore, $\psi^*_\omega \le \psi^*_{\omega'}$ and $\psi^*_{2,\omega} \le \psi^*_{2,\omega'}$. Similarly, we show that $\phi^*_{2,\omega}$ increases in $\omega$.
We should remark that we are not aware of any similar monotone comparative statics result for dynamic games in the existing literature. Here, our monotone equilibrium comparative statics result follows from the monotonicity of our operators and an application of our extension of Veinott's [56] parameterized fixed point theorem to countably chain complete posets (see the proof of Theorem 2).
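The logic behind this comparative statics argument can be illustrated on a finite lattice: if a self-map is isotone in both its argument and a parameter, then iterating from the bottom (top) reaches the least (greatest) fixed point, and both extremal fixed points move up with the parameter. The sketch below uses a hypothetical two-coordinate map on $\{0, \ldots, 10\}^2$; the map `best_reply_map` is an illustrative stand-in, not the operator from the paper.

```python
from typing import Callable, Tuple

Point = Tuple[int, int]
TOP = 10  # largest element of each coordinate chain {0, ..., TOP}

def best_reply_map(x: Point, omega: int) -> Point:
    """Toy self-map, isotone in x and in omega (each coordinate responds to the other)."""
    x1, x2 = x
    return (min(TOP, (x2 + 2 * omega) // 2), min(TOP, (x1 + 2 * omega) // 2))

def extremal_fixed_point(f: Callable[[Point], Point], start: Point,
                         max_iter: int = 1000) -> Point:
    """Iterate f from `start`; from the bottom this reaches the least fixed point,
    from the top the greatest (the lattice is finite and f is isotone)."""
    x = start
    for _ in range(max_iter):
        nx = f(x)
        if nx == x:
            return x
        x = nx
    raise RuntimeError("no fixed point reached within max_iter steps")

omegas = range(4)
least = [extremal_fixed_point(lambda x, w=w: best_reply_map(x, w), (0, 0))
         for w in omegas]
greatest = [extremal_fixed_point(lambda x, w=w: best_reply_map(x, w), (TOP, TOP))
            for w in omegas]
```

For every positive parameter value in this toy example the least and greatest fixed points are distinct, yet both sequences are nondecreasing in the parameter, which is the ordering property the theorem asserts for extremal MSNE.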

Examples
In this section, we present three applications of our methods. In all three examples, the results of our paper can be used to verify existence of the greatest and the least Markov stationary Nash equilibrium, as well as to provide methods to compute these extremal equilibria by a simple iterative procedure. Finally, Theorem 2 offers the corresponding result on monotone equilibrium comparative statics.

Dynamic Price Competition with Private Information
Consider an economy with n firms that compete for customers buying heterogeneous, but substitutable, goods. Firms have private information concerning their demand parameters θ i ∈ [− i , i ] = Θ i , and there is also a public signal z ∈ Z = [0, z̄] ⊂ R n + giving each firm partial information on the other firms' demand parameters. More succinctly, let the demand parameter be given by s i (z i , θ i ).
After observing z (which could, for example, reflect business cycle fluctuations), the individual parameters θ are drawn from the conditional distribution Q(· | z). If the other firms choose prices a −i (z, θ −i ), the interim payoff of firm i, choosing price a i ∈ [0,ā], is given by
where D i is the demand and C i is the cost function. Normalize profits such that if z i = 0, firm i's profit is zero (e.g., turnover is too small to cover the costs, and the company is driven out of the market). As the within-period game is Bertrand with heterogeneous firms and substitutable products, the payoff condition in Assumption 1 is satisfied if demand D i (a) is increasing in the others' prices, and (b) has increasing differences in (a i , a −i ). Also, as [0,ā] is one-dimensional, the payoff is a supermodular function of own price. Finally, assume, as is standard, that C i is increasing and convex.
To cast this example in the language of our model, let the measure p on Z capture the influence of current parameters (θ, z) and prices on tomorrow's demand, parameterized by the vector z . Therefore, apart from technical assumptions on measurability, to apply our methods we only require that the measure p be monotone, supermodular, and continuous in prices. This latter condition can be interpreted as demand substitution between periods (i.e., higher prices today imply a higher probability of positive demand parameters (z ∈ (0, z̄]) next period, as consumers can wait for cheaper prices tomorrow). By the supermodularity assumption, this effect is stronger if others set higher prices as well. Indeed, when a company increases its price today, this may lead to positive demand in the future if the others also set high prices. But if the other firms set low prices today, then this impact is smaller, as some clients may want to purchase the competitors' goods today instead.
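The payoff conditions used in this example (a payoff that is increasing in the rivals' prices and has increasing differences in (a i , a −i )) can be checked numerically on a price grid. The following is a minimal sketch only: the linear demand functions, the constant unit cost, the grid, and all helper names are illustrative assumptions and are not taken from the model above.

```python
from itertools import combinations

def has_increasing_differences(f, xs, ys, tol=1e-12):
    """True if f(x_hi, y) - f(x_lo, y) is nondecreasing in y on the grid."""
    xs, ys = sorted(xs), sorted(ys)
    for x_lo, x_hi in combinations(xs, 2):
        for y_lo, y_hi in combinations(ys, 2):
            gain_at_low_y = f(x_hi, y_lo) - f(x_lo, y_lo)
            gain_at_high_y = f(x_hi, y_hi) - f(x_lo, y_hi)
            if gain_at_high_y < gain_at_low_y - tol:
                return False
    return True

def is_increasing_in_rival(f, xs, ys, tol=1e-12):
    """True if f(x, y) is nondecreasing in y for every x on the grid."""
    ys = sorted(ys)
    return all(f(x, y_hi) >= f(x, y_lo) - tol
               for x in xs for y_lo, y_hi in combinations(ys, 2))

UNIT_COST = 1.0  # hypothetical constant marginal cost (linear, hence convex)

def demand_substitutes(own, rival):
    # Hypothetical linear demand: rival's price raises own demand.
    return 10.0 - 2.0 * own + 1.0 * rival

def demand_complements(own, rival):
    # Hypothetical counterexample: rival's price lowers own demand.
    return 10.0 - 2.0 * own - 1.0 * rival

def profit(demand):
    # Per-period Bertrand profit (price minus unit cost) times demand.
    return lambda own, rival: (own - UNIT_COST) * demand(own, rival)

grid = [0.5 * k for k in range(9)]  # prices 0.0, 0.5, ..., 4.0

substitutes_ok = (is_increasing_in_rival(demand_substitutes, grid, grid)
                  and has_increasing_differences(profit(demand_substitutes), grid, grid))
complements_ok = has_increasing_differences(profit(demand_complements), grid, grid)
```

In this toy specification the cross effect of the two prices on profit equals the coefficient on the rival's price in demand, so the substitutes case passes the check and the counterexample fails it. A grid check of this kind can only falsify the condition; it does not prove it off the grid.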

Dynamic R&D Competition with Positive Spillovers and Private Costs
A second application of our results is inspired by d'Aspremont and Jacquemin [20], who analyze a two-stage game in which oligopolists choose R&D expenditures to reduce costs in the first stage and then compete à la Cournot in the second stage. The authors study the effects of R&D investment spillovers in a (subgame perfect) equilibrium, as well as its optimality. To study such a game, we analyze an infinite horizon R&D competition model in which, each period, we embed the two-stage game of d'Aspremont and Jacquemin [20], played among n oligopolists.
Along these lines, assume that the inverse demand is given by P (Q) = A − bQ, where Q = Σ i q i , and the production cost functions are given by where z is a common cost parameter drawn each period, θ i ∈ [− i , i ] is noise on the actual cost parameter z i + θ i , δ ∈ [0, 1] is a spillover parameter, and finally a i is an investment in a cost-reducing R&D process. The cost of a i units of R&D investment is then given by a i → γ i (a i ), which is assumed to be continuous and bounded. Apart from the within-period spillovers, higher investment a i also has intertemporal effects through p, increasing the probability of a positive cost reduction draw tomorrow.
Every period, the profit of an oligopolist, assuming that a Cournot equilibrium is played in the next stage, is given by the function π i (z, θ i , a i , a −i ), where Observe that, for large R&D spillovers (i.e., δ > 0.5), the payoff is increasing in a −i (e.g., the top-dog strategy effect is dominated by the spillover effect), and π i (z, θ i , a i , a −i ) has increasing differences in (a i , a −i ) and (a i , s). Further, the measure p(· | z, θ, a) satisfies Assumption 1 if intertemporal investment effects are self-reinforcing (i.e., if high R&D investment today raises the probability of a positive cost reduction next period, and this effect is stronger if others invest more). Finally, allowing z = 0 to be an absorbing state is justified, e.g., if we have z̄ + i ≥ A, i.e., an assumption ruling out production when the market is too small relative to the unit production cost z.

Dynamic Cournot Competition with Learning-by-Doing and Incomplete Information
Finally, consider an economy where, each period, n firms compete by setting quantities q i of differentiated products. The goods are assumed to be behavioral complements, i.e., consumption of one good increases purchases of the complementary products. Additionally, each firm has an individual stochastic learning-by-doing effect influencing its marginal cost function via a parameter s i = z i + θ i measuring the cumulative experience of the given firm.
Then the profit of a given firm is summarized by Observe that the assumptions on payoffs are satisfied if the cost c is decreasing in the learning-by-doing parameters, P i is increasing in q −i (i.e., we have complementary goods), and P i has increasing differences in (q i , q −i ) (e.g., P i takes the form P i = γ − q i + Σ j≠i δ j q j ).
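As a quick check of the linear example, suppose (as an assumption, since only the form of P i is stated here) that per-period revenue is q i P i (q), that δ j ≥ 0 for all j, and that the cost term depends only on own quantity and own experience. Then, for j ≠ i:

```latex
\[
\frac{\partial P_i}{\partial q_j} = \delta_j \ge 0,
\qquad
\frac{\partial^2}{\partial q_i\,\partial q_j}\bigl[q_i P_i(q)\bigr]
= \frac{\partial}{\partial q_j}\Bigl[\gamma - 2 q_i + \sum_{k \neq i} \delta_k q_k\Bigr]
= \delta_j \ge 0 .
\]
```

Under these assumptions, P i is increasing in q −i , and revenue has increasing differences in (q i , q −i ). The cost term does not enter the cross partial, because by assumption it does not depend on q −i .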
Concerning the learning process, let us assume that the joint experience vector z ∈ [0, z̄] n is stochastic and drawn according to a distribution p, while the individual cost parameters θ ∈ × n i=1 [− i , i ], which are noise in the learning effect, are distributed according to Q. Finally, let sup P i () < c(0 + i ). Then the only restrictive assumption on p we require is that q → p(· | z, θ, q) is continuous, increasing, and supermodular. One way of interpreting this condition (from the perspective of complementarity) is that higher output today increases the chance of a positive experience draw next period, and that this effect is stronger if others set higher quantities, via spillovers. Under these conditions, all the main theorems of the paper can be applied.

Conclusions and Related Techniques
This paper proposes a new set of monotone methods for a class of discounted, infinite horizon stochastic games with both public and private signals, as well as strategic complementarities. The role of strategic complementarities in the development of our methods is critical, as these complementarities allow us to develop monotone methods to study the structure and computation of Markovian equilibrium in our class of games directly.
Our analysis shares some of the properties of the belief-free equilibria studied in [23], as we assume players have (rational) Markovian beliefs that depend only on public and individual signals, and we do not need to model beliefs off the equilibrium path as in their work. Also, as Markovian equilibria are adopted here, we do not allow players to impose punishment schemes inconsistent with Markovian strategies, which is also related to work using belief-free equilibria. Additionally, in our model, public information amounts to signaling the distribution of private information and past moves, rather than signaling current opponents' actions. Finally, our analysis is very closely related to the ideas behind the methods proposed by Cole and Kocherlakota [17], who develop methods for solving for (nonstationary) Markov equilibria with Markov beliefs via APS-type methods applied in function spaces.
As for extensions in future work, perhaps the most important class of models for which our stochastic games approach seems appropriate is the study of Markovian equilibrium in dynamic principal-agent problems with unobservable information or actions, features that are well known to greatly complicate the nature of dynamic equilibrium arrangements. In this literature, there are at least three other techniques used to study similar dynamic principal-agent problems, namely: (i) APS methods, (ii) recursive saddlepoint methods, and (iii) first-order approaches. The APS approach has proven very useful for verifying existence of sequential equilibrium in broad classes of both repeated and dynamic games (see [9]). This approach focuses on the computation of the equilibrium value set, without a sharp characterization of the sequential equilibrium strategies that support any given value in that set. Further, when these games have state variables (like capital stocks or shocks), additional issues arise in the presence of public and private information over uncountable state spaces. That is, the APS method becomes significantly more complicated, as the set of measurable Nash equilibrium values need not be closed in any useful topology (e.g., the weak-star topology).
Another important set of techniques for studying limited commitment problems is the so-called "recursive saddlepoint methods," as discussed originally in the seminal work of Kydland and Prescott [34] and further developed in [39], for example. These methods have been shown to be very useful for computing equilibrium in some classes of incentive problems with private information or actions, where primal and dual optimization problems can be appropriately linked. One immediate limitation of such methods is that "punishment schemes" are typically assumed to be "exogenous" and specified in an ad hoc manner. Further, there are subtle issues associated with the existence and computation of the recursive saddlepoints themselves, which are needed to guarantee that KKT multipliers are useful and lie in appropriate dual spaces.
Finally, the first-order approaches developed in Ábrahám and Pavoni [1] and Mitchell and Zhang [40] are often useful when they can be rigorously applied. In particular, when problems are concave in equilibrium, one can precisely link the first-order conditions for the optimization problem with its actual solutions, e.g., by showing that these first-order conditions are not only necessary but also (locally) sufficient. 13 In this sense, the requirements needed to apply such methods are similar to those of recursive saddlepoint methods. Unfortunately, as in recursive saddlepoint problems, when state variables are present (as, for example, in a dynamic game), conditions on primitives that imply concavity of the value function are very difficult to obtain.
The techniques developed in this paper have an important technical advantage over all of this work: in the present method, one works directly with equilibrium strategies and values simultaneously when addressing the existence and computation of equilibrium, without needing to use first-order conditions or duality and, importantly, without restricting our results to those available using APS-type techniques.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix: An Auxiliary Result
Here, we state and prove the following proposition of independent interest.

Proposition 1 Let X be a countably chain complete poset (i.e., if x n ∈ X is a monotone sequence, then its supremum and infimum belong to X) with the greatest element θ and the least element θ . Let F : X → X be an isotone function. Then:
(i) If F is monotonically inf preserving 14 , then Φ := F n (θ) is the greatest fixed point, and if F is monotonically sup preserving, then Φ := F n (θ ) is the least fixed point.
(ii) If F is a monotonically inf preserving function, then
(iii) If F is a monotonically sup preserving function, then
(iv) If F is a monotonically sup and inf preserving function, then its fixed-point set is a countably chain complete poset.

13 Observe that our tools actually allow one to obtain conditions under which players' best replies are characterized by first-order conditions that are both necessary and sufficient (see [59] for the details).
14 For the definition, see footnote 11.
Similarly, we prove the thesis for decreasing sequences.
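The iteration behind part (i) of Proposition 1 can be illustrated on a toy example. The sketch below is only an illustration under stated assumptions: the lattice {0, ..., 10}² with the componentwise order, the step-function map F, and the helper name iterate_to_fixed_point are all hypothetical and are not the operators of the paper. On a finite lattice every monotone sequence is eventually constant, so any isotone map is trivially monotonically sup and inf preserving there.

```python
def iterate_to_fixed_point(F, start, max_iter=10_000):
    """Iterate x_{n+1} = F(x_n) from `start` until the sequence is stationary."""
    x = start
    for _ in range(max_iter):
        nxt = F(x)
        if nxt == x:
            return x
        x = nxt
    raise RuntimeError("no fixed point reached within max_iter iterations")

def step_reply(y):
    # Hypothetical nondecreasing "best reply": low action against a low
    # opponent action, high action against a high opponent action.
    return 2 if y < 5 else 8

def F(x):
    # Isotone map on {0,...,10}^2: each coordinate replies to the other one.
    a, b = x
    return (step_reply(b), step_reply(a))

TOP = (10, 10)    # greatest element of the toy lattice
BOTTOM = (0, 0)   # least element of the toy lattice

greatest = iterate_to_fixed_point(F, TOP)     # decreasing iterates from the top
least = iterate_to_fixed_point(F, BOTTOM)     # increasing iterates from the bottom
```

Starting from the greatest element, the iterates decrease and stop at the greatest fixed point; starting from the least element, they increase and stop at the least fixed point. In the function-space setting of the paper the corresponding limits are infima and suprema of monotone sequences rather than values reached in finitely many steps.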