
Conditional Markov equilibria in discounted dynamic games

  • Original Article
  • Published in Mathematical Methods of Operations Research

Abstract

This paper introduces conditional Markov strategies in discrete-time discounted dynamic games with perfect monitoring. These are strategies in which players follow Markov policies after all histories. Policies induced by conditional Markov equilibria can be supported by the threat of reverting to the policy that yields the smallest expected equilibrium payoff for the deviator. This leads to a set-valued fixed-point characterization of equilibrium payoff functions. The result can be used for computing equilibria and for showing the existence of equilibria in behavior strategies.

Fig. 1


Notes

  1. Sleet and Yeltekin (2003) have one example involving stochastic fluctuations, although they present the numerical method only for deterministic games.

  2. There is a slight abuse of notation throughout the paper when we need to write \(y\) as \((y_i,y_{-i})\) or \(\sigma =(\sigma _i,\sigma _{-i})\). For example, instead of writing \(u_i((y_i,y_{-i}),x,w)\) we write \(u_i(y_i,y_{-i},x,w)\). Similarly, \(u_i(\sigma _i(h),\sigma _{-i}(h),x(h),w)\) stands for \(u_i((\sigma _i(h),\sigma _{-i}(h)),x(h),w)\).

References

  • Abreu D (1986) Extremal equilibria of oligopolistic supergames. J Econ Theory 39(2):191–225

  • Abreu D (1988) On the theory of infinitely repeated games with discounting. Econometrica 56(2):383–396

  • Abreu D, Pearce D, Stacchetti E (1986) Optimal cartel equilibria with imperfect monitoring. J Econ Theory 39(1):251–269

  • Abreu D, Pearce D, Stacchetti E (1990) Toward a theory of discounted repeated games with imperfect monitoring. Econometrica 58(5):1041–1063

  • Aguirregabiria V, Mira P (2007) Sequential estimation of dynamic discrete games. Econometrica 75(1):1–53

  • Bajari P, Benkard L, Levin J (2007) Estimating dynamic models of imperfect competition. Econometrica 75(5):1331–1370

  • Berg K, Kitti M (2012) Equilibrium paths in discounted supergames. Working paper

  • Berg K, Kitti M (2013) Computing equilibria in discounted \(2\times 2\) supergames. Comput Econ 41(1):71–88

  • Berry S, Ostrovsky M, Pakes A (2007) Simple estimators for the parameters of discrete dynamic games. RAND J Econ 38(2):373–399

  • Bertsekas DP, Shreve SE (1996) Stochastic optimal control: the discrete time case. Athena Scientific, Belmont, Massachusetts

  • Cole HL, Kocherlakota N (2001) Dynamic games with hidden actions and hidden states. J Econ Theory 98(1):114–126

  • Cronshaw MB (1997) Algorithms for finding repeated game equilibria. Comput Econ 10(2):139–168

  • Cronshaw MB, Luenberger DG (1994) Strongly symmetric subgame perfect equilibria in infinitely repeated games with perfect monitoring. Games Econ Behav 6(2):220–237

  • Doraszelski U, Escobar J (2012) Restricted feedback in long term relationships. J Econ Theory 147(1):142–161

  • Doraszelski U, Pakes A (2007) A framework for applied dynamic analysis in IO. In: Armstrong M, Porter R (eds) Handbook of industrial organization, vol 3. North-Holland, Amsterdam, pp 1887–1966

  • Doraszelski U, Satterthwaite M (2010) Computable Markov-perfect industry dynamics. RAND J Econ 41(2):215–243

  • Duffie D, Geanakoplos J, Mas-Colell A, McLennan A (1994) Stationary Markov equilibria. Econometrica 62(4):745–781

  • Dutta PK, Radner R (2006) A game-theoretic approach to global warming. In: Kusuoka S, Yamazaki A (eds) Advances in mathematical economics, vol 8. Springer-Verlag, Tokyo, pp 135–153

  • Ely JC, Hörner J, Olszewski W (2005) Belief-free equilibria in repeated games. Econometrica 73(2):377–415

  • Ericson R, Pakes A (1995) Markov-perfect industry dynamics: a framework for empirical work. Rev Econ Stud 62:53–82

  • Fink A (1964) Equilibrium points of stochastic noncooperative games. Hiroshima Univ Ser A 28:89–93

  • Fudenberg D, Levine D (1983) Subgame-perfect equilibria of finite- and infinite-horizon games. J Econ Theory 31(2):251–267

  • Judd K, Yeltekin Ş, Conklin J (2003) Computing supergame equilibria. Econometrica 71(4):1239–1254

  • Käenmäki A, Vilppolainen M (2010) Dimension and measures of sub-self-affine sets. Monatshefte für Mathematik 161(3):271–293

  • Kandori M (2011) Weakly belief-free equilibria in repeated games with private monitoring. Econometrica 79(3):877–892

  • Kitti M (2011) Conditionally stationary equilibria in discounted dynamic games. Dyn Games Appl 1(4):514–533

  • Maitra AP, Sudderth WD (2007) Subgame-perfect equilibria for stochastic games. Math Oper Res 32(3):711–722

  • Mertens JF, Parthasarathy T (1991) Nonzerosum stochastic games. In: Raghavan TES, Ferguson TS, Parthasarathy T, Vrieze OJ (eds) Stochastic games and related topics. Kluwer, Boston

  • Nash JF (1951) Non-cooperative games. Ann Math 54(2):286–295

  • Phelan C, Stacchetti E (2001) Sequential equilibria in a Ramsey tax model. Econometrica 69(6):1491–1518

  • Rockafellar RT, Wets RJ-B (1998) Variational analysis. Springer, Berlin

  • Shapley LS (1953) Stochastic games. Proc Natl Acad Sci USA 39(10):1095–1100

  • Sleet C, Yeltekin Ş (2003) On the computation of value correspondences. Working paper

  • Sleet C, Yeltekin Ş (2006) Optimal taxation with endogenously incomplete asset markets. J Econ Theory 127(1):36–73

  • Sleet C, Yeltekin Ş (2007) Recursive monetary policy games with incomplete information. J Econ Dyn Control 31(5):1557–1583

  • Solan E (1998) Discounted stochastic games. Math Oper Res 23(4):1010–1021

  • Whitt W (1980) Representation and approximation of noncooperative sequential games. SIAM J Control Optim 18(1):33–48


Author information


Correspondence to Mitri Kitti.

Additional information

I thank two anonymous referees for their comments. Funding from the Academy of Finland is gratefully acknowledged.

Appendix: Auxiliary Proofs

Proof of Lemmas 2 and 3

We first show that if we pick a sequence of equilibrium payoff functions, then there is a policy that yields the payoff function obtained as the limit of a convergent subsequence of the original sequence. First, note that \(\mathbb{E}[u_i(y,x,w)]\) can be replaced with a function \(\bar{u}_i(y,x)\) that is the expected value of \(u_i\) over \(w\), i.e., there is no loss of generality in assuming that \(u\) is a function of \(y\) and \(x\) only; see, e.g., Section 8 in Bertsekas and Shreve (1996) for more on this argument. Let \(U^k(x)\) denote the expected stage payoffs at stage \(k\) in state \(x\) for a given policy profile \(\mu^0,\mu^1,\ldots \), i.e., \(U^k(x)=\bar{u}(\mu^k(x),x)\). Moreover, for a given policy profile we can determine the corresponding state transition probabilities \(\text{Prob}(x^{k+1}|x^k,k)\). It follows that we can find the probabilities of the states at each stage of the game conditional on the initial state, i.e., \(\text{Prob}(x|x^0,k)\). Then player \(i\)'s expected payoff from stage \(k\) onward is

$$\begin{aligned} v_i(x^0,k)=\sum _{j=k}^{\infty } \delta _i^{j-k}\sum _{x\in X}\text{Prob}(x|x^0,j)\, U_i^{j}(x) \quad \text{for all}\ k\ge 0. \end{aligned}$$
(6)

If we pick a sequence of equilibrium payoff functions \(\{v^j\}_j\), then it has a convergent subsequence because \(X\) is finite and players’ payoffs are bounded. Let \(v\) be the limit. Recall that functions \(v^j, j\ge 0\), can be associated with finite dimensional vectors because there are finitely many states. Hence, convergence can be considered in the usual Euclidean metric. Then by the usual diagonalization argument it is possible to pick a convergent subsequence in which the terms \(U^{k}(x)\) and \(\text{ Prob}(x|x^0,k)\) corresponding to the elements of the sequence \(\{v^j\}_j\) converge for all \(x,x^0\in X\) and \(k\ge 0\). Let \(\bar{U}^k(x)\) and \(q^k(x|x^0)\) denote the resulting limits. Note that the assumption on the finiteness of \(X\) is crucial for this step. We also obtain the expected payoffs \(v_i(x,k), x\in X, k\ge 0, i\in I\), in the limit, with \(v_i(x)=v_i(x,0), x\in X, i\in I\). Moreover, \(v_i(x^0,k)\) satisfies (6) for \(\bar{U}_i^k(x)\) and \(q^k(x|x^0), x\in X, k\ge 0\). By the compactness of payoffs, there are decision functions \(\mu ^k\in M, k\ge 0\), which lead to these payoffs and probabilities of states. Hence, we can construct a policy which yields the limit payoff \(v\).
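To make equation (6) concrete, here is a minimal numerical sketch for the special case of a stationary policy on two states; all numbers (discount factor, transition matrix, stage payoffs) are hypothetical, not taken from the paper. The state distribution conditional on the initial state is propagated stage by stage and the discounted stage payoffs are accumulated:

```python
# Hypothetical two-state example with a stationary policy, so the stage
# payoff vector U and the transition matrix P are the same at every stage
# (an assumption made only to keep the illustration short).
delta = 0.9                        # discount factor delta_i
P = [[0.7, 0.3],                   # P[x][y] = Prob(y | x) under the policy
     [0.4, 0.6]]
U = [1.0, 0.0]                     # U[x] = player i's stage payoff in state x

def expected_payoff(x0, horizon=500):
    """Truncated evaluation of equation (6) from stage k = 0:
    sum over j >= 0 of delta**j * sum_x Prob(x | x0, j) * U[x]."""
    n = len(U)
    dist = [0.0] * n
    dist[x0] = 1.0                 # state distribution at the initial stage
    v, disc = 0.0, 1.0
    for _ in range(horizon):
        v += disc * sum(dist[x] * U[x] for x in range(n))
        disc *= delta              # advance the discount factor
        dist = [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]
    return v
```

For a stationary policy the truncated sum approximates the usual policy-evaluation fixed point \(v = U + \delta P v\), which provides an easy consistency check.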

Let us now show the result of Lemma 2. The above deduction holds particularly for the sequence of payoff functions \(\{v^j\}_j\) in which the component \(v_i^j(x), j\ge 0\), converges to \(v_i^-(V)(x)=\inf \{v_i: v\in V(x)\}\) for a given \(x\in X\). Consequently, there is a subsequence \(\{v^{j_k}\}_k\) that converges to \(\bar{v}^i\) with \(\bar{v}_i^i(x)=v_i^-(V)(x)\). Corresponding to \(\{v^{j_k}\}_k\) we can construct a sequence of policies \(\{\pi ^{x,i,k}\}_k\) giving these payoffs. As observed previously, we can find a policy corresponding to the limit \(\bar{v}^i\). Consequently, we obtain \(\pi ^{x,i}\in \Pi \) for all \(x\in X\) and \(i\in I\) such that

$$\begin{aligned} \bar{v}_i^ i(x)=v_i(\pi ^{x,i})(x)=\inf \{v_i: v\in S(x)\}. \end{aligned}$$

Let \(p^*\) be the penal code composed of these policies. This penal code is an equilibrium because it gives punishment payoffs that are not larger than any other equilibrium payoffs. To be more specific, we first observe from Proposition 1 that \(\sigma (\pi ^{x,i,k},p^*)\) is a conditional Markov equilibrium, i.e., \(\mu ^j(\pi ^{x,i,k}), j\ge 0\), are incentive compatible for \(v^{j+1}(\pi ^{x,i,k})\) and \(v(p^*), j\ge 0\). By the compactness of payoffs and the finiteness of \(X\) it is possible to find a subsequence of policies \(\{\pi ^{x,i,k_l}\}_l\) such that \(v^j(\pi ^{x,i,k_l})(x^0)\) converges to \(v^j(\pi ^{x,i})(x^0)\) for all \(j\ge 0\) and \(x,x^0\in X\). It follows that the limit is incentive compatible for \(v^{j+1}(\pi ^{x,i})\) and \(v(p^*), j\ge 0\). Note that the inequality in the incentive compatibility condition is preserved in the limit. Consequently, Proposition 1 implies that \(p^*\) is an equilibrium penal code. This proves Lemma 2.

Let us finally show Lemma 3. As argued previously, any sequence of payoff functions in \(V\) has a convergent subsequence, and corresponding to the limit payoff \(v\) of the subsequence we can find a policy \(\pi \) that yields \(v\) as its outcome. By the same argument used to show that \(p^*\) is an equilibrium, it can be shown that \(\sigma (\pi ,p^*)\) is an equilibrium. This implies \(v\in V\), i.e., Lemma 3.

Proof of Lemma 4

Let us begin with the first result: \(B\) maps compact sets of payoff functions into compact sets. Since \(X\) is finite, compactness is in the sense of the usual topology defined by the Euclidean metric. By the finiteness of \(X\), payoff functions can be associated with finite dimensional vectors. If we pick a sequence of vectors (payoff functions) in \(B(S)\), there is a convergent subsequence \(\{v^j\}_j\) because payoffs are bounded. Moreover, there is \(\mu \) corresponding to the limit \(v\) of this subsequence. The limit satisfies \(v(x)=T(\mu (x),x,v^{\prime })\) for all \(x\in X\) and some \(v^{\prime }\in S\). The payoff function \(v^{\prime }\) can be constructed by a diagonalization argument, i.e., by choosing a subsequence \(\{v^{j_k}\}_k\) with \(v^{j_k}(x)=T(\mu ^{k}(x),x,\bar{v}^k), \bar{v}^k\in S, x\in X, k\ge 0\), such that \(v^{\prime }\) is obtained as the limit of \(\{\bar{v}^{j_k}\}_k\). Moreover, \(\mu \) is incentive compatible, i.e., it satisfies \(\mu \in IC(v,S)\). Hence, \(B(S)\) is compact.

The proof of the second result is straightforward: we construct a strategy profile \(\sigma \) corresponding to \(v^0\in B(S)\) for which \(U(\sigma ,x)=v^0(x)\) for all \(x\in X\), and then prove that it is an equilibrium.

Let us take \(v^0\in S\). Then \(v^0\in B(S)\), i.e., there is \(\mu ^0\) and \(v^1\in S\) such that \(v^0(x)=T(\mu ^0(x),x,v^1)\) for all \(x\in X\). We can repeat the same deduction for \(v^1\) and so on. This construction gives us \(\pi =(\mu ^0,\mu ^1,\ldots )\) and the corresponding continuation payoff functions \(v^0,v^1,\ldots \). Furthermore, we can construct \(\pi ^{x,i}\) corresponding to \(v_i^-(S)(x)\). Observe that for each \(v_i^-(S)(x)\) there is a continuation payoff function \(v_i^x\in F_i\) such that \(v_i^x(x)= v_i^-(S)(x)\). The construction of \(\pi ^{x,i}\) is similar to that of \(\pi \). As a result, we get a penal code \(p\). Consequently, we obtain a simple strategy \(\sigma (\pi ,p)\) and by construction \(v^k(x)\) is the expected payoff that the players will get when they follow this strategy starting from period \(k\) and state \(x\in X\). By the definition of \(B\) we have

$$\begin{aligned} \mu ^k(\pi )\in IC\left(v^{k+1},S\right) \quad \text{and}\quad \mu ^k(\pi ^{x,i})\in IC\left(v^{k+1}(\pi ^{x,i}),S\right)\quad \text{for all}\ k\ge 0. \end{aligned}$$

Proposition 1 implies that \(\sigma (\pi ,p)\) is a conditional Markov equilibrium. Hence, it holds that \(v^0\in V\).
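A rough sketch of iterating the operator \(B\) can be given by specializing, purely for illustration, to a one-state (repeated) prisoner's dilemma with hypothetical payoffs: the candidate set is a finite set of payoff vectors, continuations are drawn from that set, and punishments are taken at the set's componentwise minimum. All names and numbers here are assumptions of the sketch, not the paper's construction.

```python
import itertools

ACTIONS = (0, 1)                   # 0 = cooperate, 1 = defect
PAYOFF = {(0, 0): (3.0, 3.0), (0, 1): (0.0, 4.0),
          (1, 0): (4.0, 0.0), (1, 1): (1.0, 1.0)}
DELTA = 0.8                        # common discount factor

def deviation_profiles(a, i):
    """All profiles in which player i unilaterally changes its action."""
    for ai in ACTIONS:
        yield (ai, a[1]) if i == 0 else (a[0], ai)

def apply_B(candidates):
    """One application of a discretized analogue of the operator B: keep the
    payoff vectors supported by some action profile, some continuation payoff
    already in the set, and punishment at the set's componentwise minimum."""
    punish = tuple(min(v[i] for v in candidates) for i in (0, 1))
    out = set()
    for a, w in itertools.product(PAYOFF, candidates):
        v = tuple((1 - DELTA) * PAYOFF[a][i] + DELTA * w[i] for i in (0, 1))
        # Incentive compatibility: each player's payoff must weakly exceed
        # the best one-shot deviation followed by the punishment payoff.
        if all(v[i] >= max((1 - DELTA) * PAYOFF[d][i] + DELTA * punish[i]
                           for d in deviation_profiles(a, i))
               for i in (0, 1)):
            out.add(tuple(round(c, 9) for c in v))
    return out

S = {(1.0, 1.0), (3.0, 3.0)}       # seed: repeated Nash play and cooperation
for _ in range(3):
    S = apply_B(S)
```

Because the seed set regenerates itself under `apply_B`, it is self-generating in the sense of Lemma 4, so every payoff vector produced by the iteration is an equilibrium payoff vector of this toy repeated game.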

Proof of Lemma 5

Lemma 1 follows directly from the results for dynamic programming models; see, e.g., Section 9.4 in Bertsekas and Shreve (1996).

The results of Lemmas 2 and 3 follow similarly as for pure strategies. Now \(U_i^k(x)\) in the proof is replaced with \(\sum _j \text{Prob}(y^j|k)\bar{u}_i(y^j,x)\). In the diagonalization argument we pick a convergent subsequence of payoff functions such that \(\{\text{Prob}(y^j|k)\}_k\) also converges for all \(j\). The result then follows.

The fact that \(B(S)\in C\) when \(S\in C\) follows by taking a convergent sequence of payoffs in \(B(S)\) and observing that the limit payoff function satisfies the incentive compatibility constraint and hence belongs to \(B(S)\). The self-generation result, Lemma 4, follows by the same deduction as for pure strategies. \(\square \)
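The substitution used above for behavior strategies, replacing \(U_i^k(x)\) with \(\sum _j \text{Prob}(y^j|k)\bar{u}_i(y^j,x)\), is simply an expectation of the stage payoff over pure action profiles. A minimal sketch with hypothetical data:

```python
def mixed_stage_payoff(prob, u_i):
    """sum_j Prob(y^j | k) * u_bar_i(y^j, x) for a fixed state x and stage k."""
    return sum(p * u_i[y] for y, p in prob.items())

# Hypothetical data: both players mix uniformly over two actions, and u_1
# lists player 1's stage payoffs by pure action profile.
prob = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
u_1 = {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 4.0, (1, 1): 1.0}
payoff = mixed_stage_payoff(prob, u_1)
```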

About this article

Kitti, M. Conditional Markov equilibria in discounted dynamic games. Math Meth Oper Res 78, 77–100 (2013). https://doi.org/10.1007/s00186-013-0433-x