Discounted stochastic games with voluntary transfers


Abstract

This paper studies discounted stochastic games with perfect or imperfect public monitoring and the opportunity to conduct voluntary monetary transfers and possibly burn money. This generalization of repeated games with transfers is ideally suited to studying relational contracting in applications with long-term investments and also allows the study of collusive industry dynamics. We show that for all discount factors every perfect public equilibrium payoff can be implemented with a class of simple equilibria that have a stationary structure on the equilibrium path and optimal penal codes with a stick-and-carrot structure. For the case of perfect monitoring, we develop an algorithm to compute the set of equilibrium payoffs and to find simple equilibria that implement these payoffs.

Notes

  1. Baliga and Evans (2000), Fong and Surti (2009), Gjertsen et al. (2010), Miller and Watson (2013), and Goldlücke and Kranz (2013) study renegotiation-proof equilibria in repeated games with transfers.

  2. The algorithms by Abreu and Sannikov (2014) and Abreu et al. (2016) run faster, but are developed for two-player games only.

  3. Naturally, the presence of transfers also simplifies computation in the case of imperfect monitoring. The operator from Abreu et al. (1990), which works on payoff sets and not on policies, needs as input only \(n + 1\) values for every state (see the working paper Goldlücke and Kranz 2016 for details).

  4. Examples include studies of learning by doing by Benkard (2000) and Besanko et al. (2010), advertisement dynamics by Doraszelski and Markovich (2007), consumer learning by Ching (2010), capacity expansion by Besanko and Doraszelski (2004), or network externalities by Markovich and Moenius (2009).

  5. Characterizing the SPE or PPE payoff set can be challenging even in the limit case of the discount factor converging toward 1. While Dutta (1995) established a folk theorem for perfect monitoring, folk theorems for imperfect public monitoring have been derived more recently by Fudenberg and Yamamoto (2011) and Hörner et al. (2011) for irreducible stochastic games.

  6. Besanko et al. (2010) illustrate the multiplicity problem and show how the homotopy method can be used to find multiple MPE. There is, however, still no guarantee that all (pure) MPE are found.

  7. Most of our results (Propositions 1 and 2, Theorems 1 and 2) also hold for the case that A(x) is a compact set in \(\mathbb {R}^{m}\), for some m, always with the restriction to pure strategies. If the action space in state x is not finite, we assume in addition that stage game payoffs and the probability distribution of signals and new states are continuous functions of the action profile.

  8. We assume that the game is described in a parsimonious way, so that every state can be reached from the initial state by some sequence of actions.

  9. Theorem 1 also holds for mixed strategies, but our results in Sect. 4 require this restriction to pure strategies.

  10. These sets depend on the discount factor, but since the discount factor is fixed, we do not make this dependence explicit. The initial state \(x_{0}\) is also fixed, but the dependence on the state is made explicit because the set of possible continuation payoffs of a PPE following a history that ends in state x is equal to \(\mathcal {U}(x).\)

  11. In a simple strategy profile, no player simultaneously makes and receives positive transfers. Any vector of net payments p can be mapped into an \(n\times (n+1)\) matrix of gross transfers \(\tilde{p}_{ij}\) (= payment from i to j) as follows. Denote by \(I_{P}=\{i|p_{i}>0\}\) the set of net payers and by \(I_{R}=\{i|p_{i}\le 0\}\cup \{0\}\) the set of net receivers, including the sink for burned money indexed by 0. For any receiver \(j\in I_{R}\), we denote by

    $$\begin{aligned} s_{j}=\frac{|p_{j}|}{\sum _{l\in I_{R}}|p_{l}|} \end{aligned}$$

    the share she receives from the total amount that is transferred or burned and assume that each net payer distributes her gross transfers according to these proportions

    $$\begin{aligned} \tilde{p}_{ij}=\left\{ \begin{array}{ll} {s_{j}p_{i}} &{} \hbox {if } {i\in I_{P}}\hbox { and } {j\in I_{R}}\\ 0 &{} \hbox {otherwise}. \end{array}\right. \end{aligned}$$
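
    A minimal base R sketch of this mapping may help fix ideas; the function name net_to_gross and the example payment vector are ours, not part of the paper, and we read the amount received by the sink 0 as the excess of net payments over net receipts.

```r
# Sketch of footnote 11's mapping from net payments to gross transfers.
# Column "burn" is the sink indexed by 0; row i gives player i's gross payments.
net_to_gross <- function(p) {
  n <- length(p)
  burned <- sum(p)                        # excess of payments over receipts is burned
  stopifnot(burned >= -1e-12)             # must be nonnegative in a feasible payment plan
  recv <- c(burned, pmax(-p, 0))          # amounts received: sink first, then players
  s <- if (sum(recv) > 0) recv / sum(recv) else rep(0, n + 1)   # shares s_j
  tp <- matrix(0, n, n + 1,
               dimnames = list(paste0("from_", 1:n), c("burn", paste0("to_", 1:n))))
  for (i in which(p > 0)) tp[i, ] <- s * p[i]   # each net payer splits p_i in proportions s_j
  tp
}

# Example: players 1 and 2 pay 3 and 1, player 3 receives 2, and 2 units are burned.
net_to_gross(c(3, 1, -2))
```
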
  12. This is the key intuition for why optimal penal codes can be characterized with stick-and-carrot punishments with a single punishment action profile per player and state. See Abreu (1986) for an early example of stick-and-carrot punishments, as well as Acemoglu and Wolitzky (2015) for a recent paper on community enforcement, which shows that a specialized enforcer punishment is used for exactly one period in combination with the less efficient community punishment. In their setting, the punishment power of an inefficient path of play is linked via an incentive constraint to the more efficient enforcer punishment, while in our setting, it is linked to the perfectly efficient punishment of paying a fine.

  13. For a full proof using the formalism of Abreu et al. (1990), see the working paper Goldlücke and Kranz (2016).

  14. This condition has a unique solution since \(\delta <1\) and the transition matrix has eigenvalues with absolute value no larger than 1. The solution is given by \(U=(1-\delta )(\mathrm {I}-\delta Q(\alpha ^{e}))^{-1}\Pi (\alpha ^{e}),\) where \(Q(\alpha ^{e})\) is the transition matrix given that players follow the policy \(\alpha ^{e}\).
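
    As a quick illustration, the closed form can be evaluated with a few lines of base R; the two-state transition matrix and joint payoff vector below are invented for illustration only.

```r
delta <- 0.9
Q  <- rbind(c(0.8, 0.2),                  # transition matrix Q(alpha^e) (made up)
            c(0.3, 0.7))
Pi <- c(4, 1)                             # expected joint stage payoffs Pi(alpha^e) (made up)

# U = (1 - delta) * (I - delta * Q)^{-1} * Pi, the unique solution of
# U = (1 - delta) * Pi + delta * Q %*% U.
U <- (1 - delta) * solve(diag(nrow(Q)) - delta * Q, Pi)
U
```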

  15. For details on policy iteration, convergence speed and alternative computation methods to solve Markov Decision Processes, see, e.g., Puterman (1994).

  16. For an example, consider the Cournot game described in Sect. 4.3 below. It has 21 * 21 = 441 states and, depending on the state, a player has between 0 and 20 different stage game actions. If we punish player 1, the number of potentially relevant pure strategy punishment policies a brute-force algorithm has to search is given by the number of pure Markov strategies of player 2. Here, each player has \(\prod _{m_{1}=1}^{20}\prod _{m_{2}=0}^{20}m_{1}=(20!)^{21}\) different pure Markov strategies. This is an incredibly large number and renders a brute-force approach infeasible. Yet, in no iteration of the outer loop does Algorithm 1 need more than 4 rounds to find an optimal punishment policy.
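
    To get a sense of the magnitude, the number of decimal digits of \((20!)^{21}\) can be computed in base R:

```r
21 * lfactorial(20) / log(10)   # log10 of (20!)^21, roughly 386, so the number has 387 digits
```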

  17. For a theoretical upper bound, we note that in each iteration in which the algorithm does not stop, at least one action profile in at least one state is eliminated. In practice, however, far fewer iterations are needed, e.g., only 8 iterations in the example in Sect. 4.3.

  18. The replication code is in the file https://github.com/skranz/dyngame/blob/master/examples/article/4_3_Cournot.Rmd.

  19. ABS have provided a well-documented open source C++ implementation of their algorithm (see http://babrooks.github.io/SGSolve/). We have written an R interface to their library (see https://github.com/skranz/RSGSolve) and included some functionality in our dyngame package that makes it possible to quickly compare the solutions and algorithmic performance for stochastic games with and without transfers. The replication code of this analysis is in the file https://github.com/skranz/dyngame/blob/master/examples/article/4_4_ABS_vs_GK.Rmd.

  20. That transfers do not change the critical discount factor is due to two facts: (i) the game and optimal equilibrium strategies are symmetric, i.e., no transfers are needed on the equilibrium path to smooth incentive constraints, and (ii) the harshest punishment is the MPE of always playing (D,D), which can also be implemented without transfers.

  21. Note that every equilibrium payoff without transfers can also be implemented in the corresponding game with transfers. Yet, looking closely at Fig. 5, one sees that one corner point of ABS’s payoff set even lies slightly outside our payoff set for the game with transfers. This may be due to the fact that ABS, like Yeltekin et al. (2015), compute only an outer approximation of the equilibrium payoff set.

  22. The ABS algorithm can be customized with several parameters, e.g., 13 parameters that specify different types of tolerances. We have run it with the default parameters that ABS specify in their code. In some cases, the ABS algorithm did not find a solution, with the error code: “Caught the following exception: bestAction==NULL. Could not find an admissible direction.” With different specifications of the tolerances a solution could be found in some of those cases. We decided to stick with the default configuration for creating this table, however.

  23. Although those results were derived only for a finite action space, they also go through for compact subsets of \(\mathbb {R}^{m}\).

References

  • Abreu, D.: Extremal equilibria of oligopolistic supergames. J. Econ. Theory 39(1), 191–225 (1986)

  • Abreu, D.: On the theory of infinitely repeated games with discounting. Econometrica 56(2), 383–396 (1988)

  • Abreu, D., Brooks, B., Sannikov, Y.: A ’Pencil Sharpening’ algorithm for two player stochastic games with perfect monitoring (February 11, 2016). Princeton University William S. Dietrich II Economic Theory Center Research Paper No. 078_2016 (2016)

  • Abreu, D., Pearce, D., Stacchetti, E.: Toward a theory of discounted repeated games with imperfect monitoring. Econometrica 58(5), 1041–1063 (1990)

  • Abreu, D., Sannikov, Y.: An algorithm for two player repeated games with perfect monitoring. Theor. Econ. 9, 313–338 (2014)

  • Acemoglu, D., Wolitzky, A.: Sustaining cooperation: community enforcement vs. specialized enforcement. NBER Working Paper No. 21457 (2015)

  • Baliga, S., Evans, R.: Renegotiation in repeated games with side-payments. Games Econ. Behav. 33(2), 159–176 (2000)

  • Benkard, C.L.: Learning and forgetting: the dynamics of aircraft production. Am. Econ. Rev. 90(4), 1034–1054 (2000)

  • Besanko, D., et al.: Learning-by-doing, organizational forgetting, and industry dynamics. Econometrica 78(2), 453–508 (2010)

  • Besanko, D., Doraszelski, U.: Capacity dynamics and endogenous asymmetries in firm size. Rand J. Econ. 35(1), 23–49 (2004)

  • Ching, A.T.: A dynamic oligopoly structural model for the prescription drug market after patent expiration. Int. Econ. Rev. 51(4), 1175–1207 (2010)

  • Doornik, K.: Relational contracting in partnerships. J. Econ. Manag. Strategy 15(2), 517–548 (2006)

  • Doraszelski, U., Markovich, S.: Advertising dynamics and competitive advantage. Rand J. Econ. 38(3), 557–592 (2007)

  • Dutta, P.K.: A folk theorem for stochastic games. J. Econ. Theory 66(1), 1–32 (1995)

  • Fong, Y., Surti, J.: On the optimal degree of cooperation in the repeated Prisoner’s Dilemma with side payments. Games Econ. Behav. 67(1), 277–291 (2009)

  • Fudenberg, D., Yamamoto, Y.: The folk theorem for irreducible stochastic games with imperfect public monitoring. J. Econ. Theory 146, 1664–1683 (2011)

  • Gjertsen, H., Groves, T., Miller, D., Niesten, E., Squires, D., Watson, J.: A contract-theoretic model of conservation agreements. Mimeo (2010)

  • Goldlücke, S., Kranz, S.: Infinitely repeated games with public monitoring and monetary transfers. J. Econ. Theory 147(3), 1191–1221 (2012)

  • Goldlücke, S., Kranz, S.: Renegotiation-proof relational contracts. Games Econ. Behav. 80, 157–178 (2013)

  • Goldlücke, S., Kranz, S.: Discounted Stochastic Games with Voluntary Transfers, Working Paper (2016)

  • Harrington, J.E., Skrzypacz, A.: Collusion under monitoring of sales. Rand J. Econ. 38(2), 314–331 (2007)

  • Harrington, J.E., Skrzypacz, A.: Private monitoring and communication in cartels: explaining recent collusive practices. Am. Econ. Rev. 101(6), 2425–2449 (2011)

  • Hörner, J., Sugaya, T., Takahashi, S., Vieille, N.: Recursive methods in discounted stochastic games: an algorithm for \(\delta \rightarrow 1\) and a folk theorem. Econometrica 79(4), 1277–1318 (2011)

  • Judd, K.L., Yeltekin, S., Conklin, J.: Computing supergame equilibria. Econometrica 71(4), 1239–1254 (2003)

  • Klimenko, M., Ramey, G., Watson, J.: Recurrent trade agreements and the value of external enforcement. J. Int. Econ. 74(2), 475–499 (2008)

  • Levin, J.: Multilateral contracting and the employment relationship. Q. J. Econ. 117(3), 1075–1103 (2002)

  • Levin, J.: Relational incentive contracts. Am. Econ. Rev. 93(3), 835–857 (2003)

  • MacLeod, W.B., Malcomson, J.A.: Implicit contracts, incentive compatibility, and involuntary unemployment. Econometrica 57(2), 447–480 (1989)

  • Markovich, S., Moenius, J.: Winning while losing: competition dynamics in the presence of indirect network effects. Int. J. Ind. Organ. 27(3), 346–357 (2009)

  • Miller, D.A., Watson, J.: A theory of disagreement in repeated games with bargaining. Econometrica 81(6), 2303–2350 (2013)

  • Pakes, A., McGuire, P.: Computing Markov-perfect Nash equilibria: numerical implications of a dynamic differentiated product model. Rand J. Econ. 25(4), 555–589 (1994)

  • Pakes, A., McGuire, P.: Stochastic algorithms, symmetric Markov perfect equilibrium, and the “curse” of dimensionality. Econometrica 69(5), 1261–1281 (2001)

  • Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)

  • Rayo, L.: Relational incentives and moral hazard in teams. Rev. Econ. Stud. 74(3), 937–963 (2007)

  • Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

  • Yeltekin, S., Cai, Y., Judd, K. L.: Computing Equilibria of Dynamic Games. Working Paper (2015)

Author information

Correspondence to Susanne Goldlücke.

Additional information

Support by the German Research Foundation (DFG) through SFB-TR 15 for both authors and an individual research grant for the second author is gratefully acknowledged. Sebastian Kranz would like to thank the Cowles Foundation at Yale, where part of this work was conducted, for the stimulating research environment. Further thanks go to Dirk Bergemann, An Chen, Mehmet Ekmekci, Paul Heidhues, Johannes Hörner, Jon Levin, David Miller, Larry Samuelson, Philipp Strack, Juuso Välimäki, Joel Watson and seminar participants at Arizona State University, UC San Diego and Yale for very helpful discussions.

Electronic supplementary material

Below are the links to the electronic supplementary material.

Supplementary material 1 (html 790 KB)

Supplementary material 2 (html 848 KB)

Appendix: Remaining proofs

Proof of Theorem 2:

For each state \(x\in X\) and regime \(k\in \mathcal {K},\) condition (16) allows us to choose a distribution \(u_{i}^{k}(x),\) \(i=1,\ldots ,n\), of the surplus such that

$$\begin{aligned} \sum _{i=1}^{n}u_{i}^{k}(x)=(1-\delta )\Pi (x,\alpha ^{k})+\delta E[U|x,\alpha ^{k}] \end{aligned}$$
(30)

and

$$\begin{aligned} u_{i}^{k}(x)\ge \max _{\hat{a}_{i}}(1-\delta )\pi _{i}(x,\hat{a}_{i},\alpha _{-i}^{k})+\delta E[v_{i}|x,\hat{a}_{i},\alpha _{-i}^{k}], \end{aligned}$$
(31)

holding with equality for \(i=k.\) A simple strategy profile with transfers \(p_{i}^{k}(x,\alpha ^{k}(x),x')\) achieves this distribution of payoffs if the expected transfers

$$\begin{aligned} \bar{t}_{i}^{k}(x)=(1-\delta )E[p_{i}^{k}(x,\alpha ^{k}(x),x')|x,\alpha ^{k}(x)] \end{aligned}$$

satisfy

$$\begin{aligned} \delta \bar{t}_{i}^{k}(x)=(1-\delta )\pi _{i}(x,\alpha ^{k})+\delta E[u_{i}^{e}|x,\alpha ^{k}]-u_{i}^{k}(x). \end{aligned}$$

If we define \(\bar{t}_{i}^{k}(x)\) by this condition, it holds that \(\sum _{i=1}^{n}\bar{t}_{i}^{k}(x)=0\). Moreover, it follows from condition (31) that

$$\begin{aligned} E\left[ u_{i}^{e}-v_{i}|x,\alpha ^{k}\right] \ge \bar{t}_{i}^{k}(x). \end{aligned}$$

The intuition behind this is that it is more difficult to induce an action together with a subsequent expected payment than to induce the expected payment alone. We still need to show that for each \(k\in \mathcal {K}\) and state x there exist payments \(t_{i}(x')=(1-\delta )p_{i}^{k}(x,\alpha ^{k}(x),x')\) for each state \(x'\) such that the following three conditions hold (a numerical feasibility check of these conditions is sketched after the proof):

$$\begin{aligned}&\displaystyle t_{i}(x') \le u_{i}^{e}(x')-v_{i}(x'), \end{aligned}$$
(32)
$$\begin{aligned}&\displaystyle \sum _{i=1}^{n}t_{i}(x') = 0,\end{aligned}$$
(33)
$$\begin{aligned}&\displaystyle \sum _{_{x'}}q(x')t_{i}(x') = \bar{t}_{i}^{k}(x), \end{aligned}$$
(34)

where \(q(x')=q(x'|x,\alpha ^{k}(x))\) is the transition probability from state x to state \(x'\) if \(\alpha ^{k}(x)\) is played. We use Theorem 22.1 in Rockafellar (1970) to show that such payments exist. This theorem says that the existence of a vector with entries \(t_{i}(x')\), \(i=1,\ldots ,n\), \(x'\in X\), that satisfies the above three conditions is equivalent to the nonexistence of real numbers \(\lambda _{i}(x')\ge 0\), \(\mu (x'),\) and \(\eta _{i},\) \(i=1,\ldots ,n\), \(x'\in X\), that satisfy the following two conditions:

$$\begin{aligned}&\displaystyle \lambda _{i}(x')+\mu (x')+\eta _{i}q(x') = 0\text { for all }i,x'\end{aligned}$$
(35)
$$\begin{aligned}&\displaystyle \sum _{i,x'}\lambda _{i}(x')(u_{i}^{e}(x')-v_{i}(x'))+\sum _{i=1}^{n}\eta _{i}\bar{t_{i}}^{k}(x) < 0. \end{aligned}$$
(36)

Assume, to the contrary, that such a solution to (35) and (36) exists. These two conditions imply that

$$\begin{aligned} -\sum _{x'}\mu (x')\left( U(x')-\sum _{i=1}^{n}v_{i}(x')\right) +\sum _{i=1}^{n}\eta _{i}\left( \bar{t_{i}}^{k}(x)-E[u_{i}^{e}-v_{i}|x]\right) <0. \end{aligned}$$

Let \(\tilde{x}\) be a state with \(\frac{\mu (\tilde{x})}{q(\tilde{x})}\le \frac{\mu (x')}{q(x')}\) for all \(x'\in X.\) Since condition (35) holds for all \(x'\in X,\) it also holds for \(x'=\tilde{x},\) i.e., \(\eta _{i}=-\frac{\lambda _{i}(\tilde{x})+\mu (\tilde{x})}{q(\tilde{x})}.\) Hence, it follows that

$$\begin{aligned}&\sum _{x'}\left( \frac{\mu (\tilde{x})q(x')}{q(\tilde{x})}-\mu (x')\right) \left( U(x')-\sum _{i=1}^{n}v_{i}(x')\right) \nonumber \\&\quad +\sum _{i=1}^{n}\frac{\lambda _{i}(\tilde{x})}{q(\tilde{x})}\left( E[u_{i}^{e}-v_{i}|x]-\bar{t_{i}}^{k}(x)\right) <0. \end{aligned}$$

This implies

$$\begin{aligned} \sum _{x'}\left( \frac{\mu (\tilde{x})q(x')}{q(\tilde{x})}-\mu (x')\right) \left( U(x')-\sum _{i=1}^{n}v_{i}(x')\right) <0. \end{aligned}$$

By definition of \(\tilde{x}\) and because of condition (15), the expression on the left-hand side is nonnegative. Hence, we have arrived at a contradiction, which means that the system given by (32), (33), and (34) must have a solution, and we can define payments \((1-\delta )p_{i}^{k}(x,\alpha ^{k}(x),x')=t_{i}(x')\).

It remains to define the payments following a unilateral deviation. For any combination of states x, \(x^{\prime }\) and signal y with \(y_{i}\ne \alpha _{i}^{k}(x)\) and \(y_{-i}=\alpha _{-i}^{k}(x)\), we choose payments

$$\begin{aligned} (1-\delta )p_{i}^{k}(x,y,x^{\prime })=u_{i}^{e}(x^{\prime })-v_{i}(x^{\prime }), \end{aligned}$$
(37)

such that continuation payoffs after a deviation in the action stage are indeed given by \(v_{i}\). Payments for players other than i can be defined such that

$$\begin{aligned} (1-\delta )p_{j}^{k}(x,y,x^{\prime })\le u_{j}^{e}(x^{\prime })-v_{j}(x^{\prime }) \end{aligned}$$

and

$$\begin{aligned} \sum _{j=1}^{n}p_{j}^{k}(x,y,x^{\prime })=0, \end{aligned}$$

using condition (15).

Now we have to show that the simple strategy profile defined in this way is indeed a PPE. The budget and payment constraints are satisfied by definition. The relevant action constraints are satisfied because of inequality (31). Moreover, it holds by definition that \(u_{i}^{i}(x)=v_{i}(x),\) and since there is no money burning, \(U^{e}(x)=U(x).\) The payment plan that we have defined is an optimal payment plan, because U(x) is an upper bound of the equilibrium joint payoff \(U^{e}(x)\), and similarly

$$\begin{aligned} u_{i}^{i}(x)\ge \max _{a_{i}}(1-\delta )\pi _{i}(x,a_{i},\alpha _{-i}^{i})+\delta E[u_{i}^{i}|x,a_{i},\alpha _{-i}^{i}] \end{aligned}$$

implies that \(v_{i}(x)\) is a lower bound of the punishment payoff \(u_{i}^{i}(x).\) Thus, we have shown the “if” part of the Theorem. The “only if” part is straightforward. \(\square \)
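
Since conditions (32)–(34) form a linear feasibility problem, their solvability can also be checked numerically for concrete primitives. The following R sketch does so, assuming the lpSolve package is available; the primitives (the gains \(u_{i}^{e}-v_{i}\), the transition probabilities, and the expected transfers \(\bar{t}_{i}^{k}\)) are invented for illustration, and the substitution by nonnegative slack variables is ours, not part of the proof.

```r
library(lpSolve)

n  <- 2                                    # players
g  <- rbind(c(1.0, 0.8, 0.5),              # g[i, x'] = u_i^e(x') - v_i(x')  (made up)
            c(0.6, 0.9, 0.4))
q  <- c(0.5, 0.3, 0.2)                     # q(x' | x, alpha^k(x))            (made up)
tbar <- c(0.2, -0.2)                       # \bar t_i^k(x), sums to zero      (made up)
nx <- length(q)

# Substitute s[i, x'] = g[i, x'] - t[i, x'] >= 0, so lpSolve's nonnegative
# variables encode (32). Variables are stacked column by column: s[, 1], s[, 2], ...
A1 <- t(sapply(1:nx, function(xp) {        # (33): sum_i s[i, xp] = sum_i g[i, xp]
  z <- matrix(0, n, nx); z[, xp] <- 1; as.vector(z) }))
A2 <- t(sapply(1:n, function(i) {          # (34): sum_x' q(x') s[i, x'] = E[g_i] - tbar_i
  z <- matrix(0, n, nx); z[i, ] <- q; as.vector(z) }))

res <- lp("min", objective.in = rep(0, n * nx),
          const.mat = rbind(A1, A2), const.dir = rep("=", nx + n),
          const.rhs = c(colSums(g), as.vector(g %*% q) - tbar))

if (res$status == 0) {
  t_sol <- g - matrix(res$solution, n, nx)  # recover the payments t[i, x']
  print(round(t_sol, 4))
} else message("no feasible payment system for these primitives")
```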

Proof of Proposition 3:

For a given policy \(\alpha ,\) let \(C_{i}^{\alpha }\) be an operator mapping the set of punishment payoffs into itself defined by

$$\begin{aligned} C_{i}^{\alpha }(v_{i})[x]=c_{i}(x,\alpha (x),v_{i}) \end{aligned}$$

It can be easily verified that \(C_{i}^{\alpha }\) is a contraction-mapping operator. It follows from the contraction-mapping theorem that player i's best-reply payoffs are given by the unique fixed point of \(C_{i}^{\alpha }\), which we denote by \(v_{i}(\alpha )\) (we simply write \(v(\alpha )\) when the player index is clear from the context). This means

$$\begin{aligned} v_{i}(\alpha )=C_{i}^{\alpha }(v_{i}(\alpha )) \end{aligned}$$
(38)

It is a well-known result that the operator \(C_{i}^{\alpha }\) is monotone:

$$\begin{aligned} v_{i}\le \tilde{v}_{i}\Rightarrow C_{i}^{\alpha }(v_{i})\le C_{i}^{\alpha }(\tilde{v}_{i}) \end{aligned}$$
(39)

where \(v_{i}\le \tilde{v}_{i}\) is defined as \(v_{i}(x)\le \tilde{v}_{i}(x)\forall x\in X\). We denote by \([C_{i}^{\alpha }]^{k}\) the operator that applies \(C_{i}^{\alpha }\) k times and define its limit by

$$\begin{aligned}{}[C_{i}^{\alpha }]^{\infty }=\lim _{k\rightarrow \infty }[C_{i}^{\alpha }]^{k}. \end{aligned}$$

The contraction-mapping theorem implies that \([C_{i}^{\alpha }]^{\infty }\) is well defined and transforms every payoff function v into the fixed point of \(C_{i}^{\alpha }\), i.e.,

$$\begin{aligned}{}[C_{i}^{\alpha }]^{\infty }(v)=v(\alpha ) \end{aligned}$$
(40)

Furthermore, it follows from monotonicity of \(C_{i}^{\alpha }\) that

$$\begin{aligned} C_{i}^{\alpha }(v_{i})\le v_{i}\Rightarrow [C_{i}^{\alpha }]^{\infty }(v_{i})\le v_{i} \end{aligned}$$
(41)

and

$$\begin{aligned} C_{i}^{\alpha }(v_{i})<v_{i}\Rightarrow [C_{i}^{\alpha }]^{\infty }(v_{i})<v_{i} \end{aligned}$$
(42)

where two payoff functions \(u_{i}\) and \(\tilde{u}_{i}\) satisfy \(u_{i}<\tilde{u}_{i}\) if \(u_{i}\le \tilde{u}_{i}\) and \(u_{i}\ne \tilde{u}_{i}\).

We now show that for any two policies \(\alpha \) and \(\tilde{\alpha }\) the following monotonicity results hold

$$\begin{aligned}&\displaystyle C_{i}^{\alpha }(v(\alpha ))=C_{i}^{\tilde{\alpha }}(v(\alpha )) \Rightarrow v(\alpha )=v(\tilde{\alpha }) \end{aligned}$$
(43)
$$\begin{aligned}&\displaystyle C_{i}^{\alpha }(v(\alpha ))>C_{i}^{\tilde{\alpha }}(v(\alpha )) \Rightarrow v(\alpha )>v(\tilde{\alpha }) \end{aligned}$$
(44)
$$\begin{aligned}&\displaystyle v(\alpha )\nleq v(\tilde{\alpha }) \Rightarrow C_{i}^{\alpha }(v(\alpha ))\nleq C_{i}^{\tilde{\alpha }}(v(\alpha )) \end{aligned}$$
(45)

We exemplify the proof for (44). It follows from (38), the left part of (44), (41) and (40) that

$$\begin{aligned} v(\alpha )=C_{i}^{\alpha }(v(\alpha ))>C_{i}^{\tilde{\alpha }}(v(\alpha ))\ge \left[ C_{i}^{\tilde{\alpha }}\right] ^{\infty }(v(\alpha ))=v(\tilde{\alpha }). \end{aligned}$$

Equation (43) can be proven similarly. To prove (45), assume that there is some \(\tilde{\alpha }\) with \(C_{i}^{\alpha }(v)\le C_{i}^{\tilde{\alpha }}(v)\) but \(\tilde{v}\ngeq v\), where \(v=v(\alpha )\) and \(\tilde{v}=v(\tilde{\alpha })\). We find

$$\begin{aligned} v=C_{i}^{\alpha }(v)\le C_{i}^{\tilde{\alpha }}(v)\le \left( C_{i}^{\tilde{\alpha }}\right) ^{\infty }(v)=\tilde{v} \end{aligned}$$

which contradicts the assumption \(\tilde{v}\ngeq v\).

Intuitively, these monotonicity properties of the cheating payoff operator are the key reason why the algorithm works. If one wants to find out whether a policy \(\tilde{\alpha }\) can yield lower punishment payoffs for player i than a policy \(\alpha \), one does not have to solve player i's Markov decision process under policy \(\tilde{\alpha }\). It suffices to check whether for some state x the cheating payoffs given policy \(\tilde{\alpha }\) and punishment payoffs \(v(\alpha )\) are lower than \(v(\alpha )(x)\). If this is not the case for any admissible policy \(\tilde{\alpha }\), then the policy \(\alpha \) is an optimal punishment policy, in the sense that it minimizes player i's punishment payoffs in every state. (A code sketch of the resulting value determination and policy improvement steps follows the proof.)

The fixed point condition (38) of the value determination step and the policy improvement step (20) imply that \(v^{r}=C_{i}^{\alpha ^{r}}(v^{r})\ge C_{i}^{\alpha ^{r+1}}(v^{r})\). We first establish that if

$$\begin{aligned} v^{r}=C_{i}^{\alpha ^{r}}(v^{r})=C_{i}^{\alpha ^{r+1}}(v^{r}). \end{aligned}$$
(46)

then we have \(v^{r}=\hat{v}_{i}\). For a proof by contradiction, assume that this condition holds for some r but that there exists a policy \(\hat{\alpha }\) such that \(v(\alpha ^{r})\nleq v(\hat{\alpha }),\) i.e., \(\hat{\alpha }\) leads in at least some state x to a strictly lower best-reply payoff for player i than \(\alpha ^{r}\). By (45) this would imply \(C_{i}^{\alpha ^{r}}(v^{r})\nless C_{i}^{\hat{\alpha }}(v^{r})\): either the two cheating payoff vectors coincide, or \(\hat{\alpha }\) yields a cheating payoff strictly below \(v^{r}\) in some state. The latter is impossible, because the policy improvement step minimizes cheating payoffs and by (46) its minimum equals \(v^{r}\) in every state. Hence \(\hat{\alpha }\) must also be a solution to the policy improvement step, and since (46) holds, we then must have

$$\begin{aligned} C_{i}^{\alpha ^{r}}(v^{r})=C_{i}^{\hat{\alpha }}(v^{r}) \end{aligned}$$

However, (43) then implies that \(v(\alpha ^{r})=v(\hat{\alpha })\), which contradicts the assumption \(v(\alpha ^{r})\nleq v(\hat{\alpha })\). Thus, if the algorithm stops in a round R, we indeed have \(v^{R}=\hat{v}_{i}\).

If the algorithm does not stop in round r, it must be the case that \(v^{r}=C_{i}^{\alpha ^{r}}(v^{r})>C_{i}^{\alpha ^{r+1}}(v^{r})\). Condition (44) then directly implies the monotonicity result \(v^{r}>v^{r+1}\). The algorithm always stops in a finite number of rounds, since the number of policies is finite and the monotonicity result rules out cycles. \(\square \)
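
For concreteness, the following base R sketch implements the value determination and policy improvement steps analyzed above for a two-player game; the primitives are randomly generated, the admissibility restrictions of the full algorithm are omitted, and the code is only an illustration, not the implementation used in the paper (that is the dyngame package).

```r
delta <- 0.9
nx <- 2; nai <- 2; naj <- 2                 # states, actions of player i and of the opponent
set.seed(1)
Pi_i <- array(runif(nx * nai * naj), dim = c(nx, nai, naj))        # pi_i(x, a_i, a_j) (made up)
q <- array(runif(nx * nai * naj * nx), dim = c(nx, nai, naj, nx))  # q(x'| x, a_i, a_j) (made up)
for (x in 1:nx) for (ai in 1:nai) for (aj in 1:naj)
  q[x, ai, aj, ] <- q[x, ai, aj, ] / sum(q[x, ai, aj, ])           # normalize transition probs

# Cheating payoff c_i(x, a_j, v): player i's best reply against the opponent action a_j.
cheat <- function(x, aj, v)
  max(sapply(1:nai, function(ai)
    (1 - delta) * Pi_i[x, ai, aj] + delta * sum(q[x, ai, aj, ] * v)))

# Value determination: compute v(alpha) as the fixed point of the contraction C_i^alpha.
value_determination <- function(alpha, v = rep(0, nx), tol = 1e-10) {
  repeat {
    v_new <- sapply(1:nx, function(x) cheat(x, alpha[x], v))
    if (max(abs(v_new - v)) < tol) return(v_new)
    v <- v_new
  }
}

alpha <- rep(1L, nx)                        # initial punishment policy of the opponent
repeat {
  v <- value_determination(alpha)           # v^r = v(alpha^r)
  alpha_new <- sapply(1:nx, function(x)     # policy improvement: minimize the cheating payoff
    which.min(sapply(1:naj, function(aj) cheat(x, aj, v))))
  if (all(sapply(1:nx, function(x) cheat(x, alpha_new[x], v)) >= v - 1e-10))
    break                                   # no state improves: v^r is the punishment payoff
  alpha <- alpha_new                        # otherwise v^{r+1} < v^r, so no cycles occur
}
print(v)       # punishment (best-reply) payoffs of player i
print(alpha)   # optimal punishment policy of the opponent
```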

Proof of Corollary 1:

Assume there exists a simple equilibrium with action plan \((\alpha _{2}^{k})_{k}\) and with optimal payment plan \((p^{k})_{k}\) such that joint payoffs are U (which implies that \(p_{1}^{e}+p_{2}^{e}=0)\) and punishment payoffs are \(v_{1}\) and \(\bar{v}_{2}.\) Define

$$\begin{aligned} t^{k}(x,y,x')=\frac{\delta }{1-\delta }(u_{2}^{e}(x')-(1-\delta )p_{2}^{k}(x,y,x')-\bar{v}_{2}). \end{aligned}$$

The conditions (PC-k) imply that \(t^{k}(x,y,x')\ge 0\) and

$$\begin{aligned} t^{k}(x,y,x')\le \frac{\delta }{1-\delta }(U(x')-v_{1}(x')-\bar{v}_{2}). \end{aligned}$$

Moreover, the conditions (AC-k),

$$\begin{aligned} \alpha _{2}^{k}(x)\in \arg \max _{\tilde{a}}(1-\delta )\pi _{2}(x,\tilde{a})+\delta E[u_{2}^{e}(x')-(1-\delta )p_{2}^{k}(x,y,x')|x,\tilde{a}], \end{aligned}$$

imply that \(\alpha _{2}^{k}(x)\in \arg \max _{\tilde{a}}\pi _{2}(x,\tilde{a})+E[t^{k}|x,\tilde{a}]\).

For the other direction, assume that there exist payments \(t^{k}(x,y,x')\) as in the Proposition and define

$$\begin{aligned} \tilde{u}_{2}^{e}(x)=(1-\delta )\pi _{2}(x,\alpha ^{e})+E[(1-\delta )t^{e}(x,y,x')+\delta \bar{v}_{2}|x,\alpha ^{e}] \end{aligned}$$

and

$$\begin{aligned} (1-\delta )p_{2}^{e}(x,y,x')=\tilde{u}_{2}^{e}(x')-\frac{1-\delta }{\delta }t^{e}(x,y,x')-\bar{v}_{2} \end{aligned}$$

as well as \(p_{1}^{e}=-p_{2}^{e}\), such that \(U^{e}(x)=U(x)\) and (BC-e) hold by definition, and \(\tilde{u}_{2}^{e}(x)=u_{2}^{e}(x).\) Moreover, (PC-e) for the agent holds because

$$\begin{aligned} u_{2}^{e}(x')-(1-\delta )p_{2}^{e}(x,y,x')=\frac{1-\delta }{\delta }t^{e}(x,y,x')+\bar{v}_{2}\ge \bar{v}_{2}, \end{aligned}$$

and for the principal because

$$\begin{aligned} u_{1}^{e}(x')-(1-\delta )p_{1}^{e}(x,y,x')=U(x')-\frac{1-\delta }{\delta }t^{e}(x,y,x')-\bar{v}_{2}\ge v_{1}(x'). \end{aligned}$$

Further define

$$\begin{aligned} (1-\delta )p_{2}^{2}(x,y,x')=u_{2}^{e}(x')-\bar{v}_{2} \end{aligned}$$

and \(p_{1}^{2}=-p_{2}^{2},\) such that \(u_{2}^{2}(x)=\bar{v}_{2}\) and (BC-2) and (PC-2) for the agent hold by definition, and for the principal because \(U(x')\ge v_{1}(x')+\bar{v}_{2}\). Finally, define

$$\begin{aligned} (1-\delta )p_{1}^{1}(x,y,x')=u_{1}^{e}(x')-v_{1}(x') \end{aligned}$$

and

$$\begin{aligned} (1-\delta )p_{2}^{1}(x,y,x')=u_{2}^{e}(x')-\frac{1-\delta }{\delta }t^{1}(x,y,x')-\bar{v}_{2}, \end{aligned}$$

such that \(u_{1}^{1}(x)=v_{1}(x)\) and (PC-1) for the agent hold by definition, and for the principal for the same reason as (PC-e). The condition (BC-1) then holds as well, since it is equivalent to \(U(x')-v_{1}(x')-\bar{v}_{2}\ge \frac{1-\delta }{\delta }t^{1}(x,y,x')\). The action constraints follow from condition (24) since the payments were defined such that the agent’s continuation payoff is equal to \(\frac{(1-\delta )}{\delta }t\) plus a constant.

In the case of perfect monitoring, any deviation is punished by withholding transfers, while following prescribed play is rewarded by the maximum payment. If there is perfect monitoring in state x, the condition therefore is

$$\begin{aligned} \pi _{2}(x,a_{2}^{k})+\frac{\delta }{1-\delta }E[U-v_{1}-\bar{v}_{2}|x,a^{k}]\ge \max _{\tilde{a}_{2}}\pi _{2}(x,\tilde{a}_{2}) \end{aligned}$$

which can be rearranged to equal condition (25). \(\square \)

Cite this article

Goldlücke, S., Kranz, S. Discounted stochastic games with voluntary transfers. Econ Theory 66, 235–263 (2018). https://doi.org/10.1007/s00199-017-1060-1
