Abstract
This paper studies discounted stochastic games with perfect or imperfect public monitoring in which players can make voluntary monetary transfers and possibly burn money. This generalization of repeated games with transfers is ideally suited to the study of relational contracting in applications with long-term investments and also makes it possible to study collusive industry dynamics. We show that for all discount factors every perfect public equilibrium payoff can be implemented with a class of simple equilibria that have a stationary structure on the equilibrium path and optimal penal codes with a stick-and-carrot structure. For the case of perfect monitoring, we develop an algorithm to compute the set of equilibrium payoffs and to find simple equilibria that implement these payoffs.
Notes
Naturally, the presence of transfers also simplifies computation in the case of imperfect monitoring. The operator from Abreu et al. (1990), which works on payoff sets and not on policies, needs as input only \(n + 1\) values for every state (see the working paper version, Goldlücke and Kranz 2016, for details).
Characterizing the SPE or PPE payoff set can be challenging even in the limit case of the discount factor converging toward 1. While Dutta (1995) established a folk theorem for perfect monitoring, folk theorems for imperfect public monitoring have been derived more recently by Fudenberg and Yamamoto (2011) and Hörner et al. (2011) for irreducible stochastic games.
Besanko et al. (2010) illustrate the multiplicity problem and show how the homotopy method can be used to find multiple MPE. There is, however, still no guarantee that all (pure) MPE are found.
Most of our results (Propositions 1 and 2, Theorems 1 and 2) also hold for the case in which A(x) is a compact subset of \(\mathbb {R}^{m}\), for some m, always with the restriction to pure strategies. If the action space in state x is not finite, we assume in addition that stage game payoffs and the probability distribution of signals and new states are continuous functions of the action profile.
We assume that the game is described in a parsimonious way, so that every state can be reached from the initial state by some sequence of actions.
These sets depend on the discount factor, but since the discount factor is fixed, we do not make this dependence explicit. Although the initial state \(x_{0}\) is also fixed, we do make the dependence on the state explicit, since the set of possible continuation payoffs of a PPE following a history that ends in state x is equal to \(\mathcal {U}(x).\)
In a simple strategy profile, no player makes and receives positive transfers at the same time. Any vector of net payments p can be mapped into an \(n\times (n+1)\) matrix of gross transfers \(\tilde{p}_{ij}\) (= payment from i to j) as follows. Denote by \(I_{P}=\{i|p_{i}>0\}\) the set of net payers and by \(I_{R}=\{i|p_{i}\le 0\}\cup \{0\}\) the set of net receivers, including the sink for burned money indexed by 0. For any receiver \(j\in I_{R}\), we denote by
$$\begin{aligned} s_{j}=\frac{|p_{j}|}{\sum _{k\in I_{R}}|p_{k}|} \end{aligned}$$the share she receives from the total amount that is transferred or burned, and assume that each net payer distributes her gross transfers according to these proportions
$$\begin{aligned} \tilde{p}_{ij}=\left\{ \begin{array}{ll} {s_{j}p_{i}} &{} \hbox {if } {i\in I_{P}}\hbox { and } {j\in I_{R}}\\ 0 &{} \hbox {otherwise}. \end{array}\right. \end{aligned}$$This is the key intuition for why optimal penal codes can be characterized with stick-and-carrot punishments with a single punishment action profile per player and state. See Abreu (1986) for an early example of stick-and-carrot punishments, as well as Acemoglu and Wolitzky (2015) for a recent paper on community enforcement, which shows that a specialized enforcer punishment is used for exactly one period in combination with the less efficient community punishment. In their setting, the punishment power of an inefficient path of play is linked via an incentive constraint to the more efficient enforcer punishment, while in our setting, it is linked to the perfectly efficient punishment of paying a fine.
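This construction of gross transfers from net payments is easy to implement. The sketch below is an illustrative helper in Python (not taken from the paper's dyngame code); the function name and the convention that column 0 is the money-burning sink are our own assumptions:

```python
import numpy as np

def gross_transfers(p):
    """Map net payments p (p[i] > 0: net payer; p[i] <= 0: net receiver)
    into an n x (n+1) matrix of gross transfers tp[i, j].
    Column 0 is the sink for burned money; column j >= 1 is player j."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    burned = p.sum()  # feasibility requires sum(p) >= 0 (money can only be burned)
    # amounts received: the sink gets the burned amount, net receivers get |p_j|
    recv = np.concatenate(([burned], np.where(p <= 0, -p, 0.0)))
    total = recv.sum()  # equals the total amount paid by the net payers
    tp = np.zeros((n, n + 1))
    if total > 0:
        shares = recv / total  # the shares s_j from the text
        payers = p > 0
        tp[payers, :] = np.outer(p[payers], shares)  # tilde p_ij = s_j * p_i
    return tp
```

For example, with net payments p = (3, -1, -1), one unit is burned and each of the two net receivers obtains one unit; as in the text, no player both makes and receives a positive transfer.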
This condition has a unique solution since the transition matrix has eigenvalues with absolute value no larger than 1. The solution is given by \(U=(1-\delta )(\mathrm {I}-\delta Q(\alpha ^{e}))^{-1}\Pi (\alpha ^{e}),\) where \(Q(\alpha ^{e})\) is the transition matrix given that players follow the policy \(\alpha ^{e}\).
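In matrix form this computation is a single linear solve. A minimal numpy sketch, with made-up numbers for \(\delta\), \(Q(\alpha^{e})\) and \(\Pi(\alpha^{e})\) used purely for illustration:

```python
import numpy as np

# Joint payoff of a fixed policy: U = (1 - delta) * (I - delta * Q)^{-1} Pi,
# where Q is the state-transition matrix under the policy and Pi the vector
# of expected joint stage payoffs. The numbers below are illustrative only.
delta = 0.9
Q = np.array([[0.8, 0.2],
              [0.3, 0.7]])   # each row is a probability distribution
Pi = np.array([2.0, 1.0])

# solve (I - delta * Q) x = Pi instead of inverting the matrix explicitly
U = (1 - delta) * np.linalg.solve(np.eye(2) - delta * Q, Pi)
```

With the \((1-\delta)\) normalization, U is a weighted average of stage payoffs, so each entry lies between the smallest and largest entry of \(\Pi\).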
For details on policy iteration, convergence speed and alternative computation methods to solve Markov Decision Processes, see, e.g., Puterman (1994).
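For concreteness, Howard's policy iteration for a single-agent Markov decision process can be sketched as follows; this is toy code with our own function signature, not the paper's implementation:

```python
import numpy as np

def policy_iteration(P, R, delta):
    """Howard policy iteration for a finite MDP (illustrative sketch).
    P[a] is the |X| x |X| transition matrix of action a, R[a] the vector of
    per-state rewards; payoffs are normalized discounted averages."""
    nA, nX = len(P), P[0].shape[0]
    policy = np.zeros(nX, dtype=int)
    while True:
        # value determination: solve V = (1-delta) r + delta Q V for the fixed policy
        Q = np.array([P[policy[x]][x] for x in range(nX)])
        r = np.array([R[policy[x]][x] for x in range(nX)])
        V = np.linalg.solve(np.eye(nX) - delta * Q, (1 - delta) * r)
        # policy improvement: greedy one-step lookahead against V
        vals = np.array([(1 - delta) * R[a] + delta * P[a] @ V for a in range(nA)])
        new_policy = vals.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return V, policy  # no improvement possible: the policy is optimal
        policy = new_policy
```

Because each improvement step strictly increases the value and there are finitely many policies, the loop terminates in finitely many rounds.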
For an example, consider the Cournot game described in Sect. 4.3 below. It has 21 * 21 = 441 states and, depending on the state, a player has between 0 and 20 different stage game actions. If we punish player 1, the number of potentially relevant pure strategy punishment policies a brute-force algorithm has to search is given by the number of pure Markov strategies of player 2. Here, each player has \(\prod _{m_{1}=1}^{20}\prod _{m_{2}=0}^{20}m_{1}=(20!)^{21}\) different pure Markov strategies. This is an incredibly large number and renders a brute-force approach infeasible. Yet in no iteration of the outer loop does Algorithm 1 need more than 4 rounds to find an optimal punishment policy.
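The order of magnitude of this count is easy to verify directly (illustrative arithmetic only):

```python
from math import factorial, log10

# As in the footnote's product, each of the 21 repetitions contributes a
# factor 20!, so the number of pure Markov strategies is (20!)^21.
n_strategies = factorial(20) ** 21
digits = len(str(n_strategies))  # a number with close to 400 digits
```

Even enumerating one policy per nanosecond would therefore be hopeless, which is why the iterative punishment algorithm matters.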
For a theoretical upper bound, we note that in each iteration in which the algorithm does not stop, at least one action profile in at least one state is eliminated. In practice, however, far fewer iterations are needed, e.g., only 8 iterations in the example in Sect. 4.3.
The replication code is in the file https://github.com/skranz/dyngame/blob/master/examples/article/4_3_Cournot.Rmd.
ABS have provided a well-documented open source C++ implementation of their algorithm (see http://babrooks.github.io/SGSolve/). We have written an R interface to their library (see https://github.com/skranz/RSGSolve) and included some functionality in our dyngame package that makes it possible to quickly compare the solutions and algorithmic performance for stochastic games with and without transfers. The replication code of this analysis is in the file https://github.com/skranz/dyngame/blob/master/examples/article/4_4_ABS_vs_GK.Rmd.
That transfers do not change the critical discount factor is due to two facts: (i) the game and optimal equilibrium strategies are symmetric, i.e., no transfers are needed on the equilibrium path to smooth incentive constraints, and (ii) the harshest punishment is the MPE of always playing (D,D), which can also be implemented without transfers.
Note that every equilibrium payoff without transfers can also be implemented in the corresponding game with transfers. Yet, looking closely at Fig. 5, one sees that one corner point of ABS’s payoff set even lies slightly outside our payoff set for the game with transfers. This may be due to the fact that ABS, like Yeltekin et al. (2015), compute only an outer approximation of the equilibrium payoff set.
The ABS algorithm can be customized with several parameters, e.g., 13 parameters that specify different types of tolerances. We have run it with the default parameters that ABS specify in their code. In some cases, the ABS algorithm did not find a solution and terminated with the error message: “Caught the following exception: bestAction==NULL. Could not find an admissible direction.” With different specifications of the tolerances, a solution could be found in some of those cases. We decided to stick with the default configuration for creating this table, however.
Although those results were derived only for a finite action space, they also go through for compact subsets of \(\mathbb {R}^{m}\).
References
Abreu, D.: Extremal equilibria of oligopolistic supergames. J. Econ. Theory 39(1), 191–225 (1986)
Abreu, D.: On the theory of infinitely repeated games with discounting. Econometrica 56(2), 383–396 (1988)
Abreu, D., Brooks, B., Sannikov, Y.: A ’Pencil Sharpening’ algorithm for two player stochastic games with perfect monitoring (February 11, 2016). Princeton University William S. Dietrich II Economic Theory Center Research Paper No. 078_2016 (2016)
Abreu, D., Pearce, D., Stacchetti, E.: Toward a theory of discounted repeated games with imperfect monitoring. Econometrica 58(5), 1041–1063 (1990)
Abreu, D., Sannikov, Y.: An algorithm for two player repeated games with perfect monitoring. Theor. Econ. 9, 313–338 (2014)
Acemoglu, D., Wolitzky, A.: Sustaining Cooperation: Community Enforcement vs. Specialized Enforcement. NBER Working Paper No. 21457 (2015)
Baliga, S., Evans, R.: Renegotiation in repeated games with side-payments. Games Econ. Behav. 33(2), 159–176 (2000)
Benkard, C.L.: Learning and forgetting: the dynamics of aircraft production. Am. Econ. Rev. 90(4), 1034–1054 (2000)
Besanko, D., et al.: Learning-by-doing, organizational forgetting, and industry dynamics. Econometrica 78(2), 453–508 (2010)
Besanko, D., Doraszelski, U.: Capacity dynamics and endogenous asymmetries in firm size. Rand J. Econ. 35(1), 23–49 (2004)
Ching, A.T.: A dynamic oligopoly structural model for the prescription drug market after patent expiration. Int. Econ. Rev. 51(4), 1175–1207 (2010)
Doornik, K.: Relational contracting in partnerships. J. Econ. Manag. Strategy 15(2), 517–548 (2006)
Doraszelski, U., Markovich, S.: Advertising dynamics and competitive advantage. Rand J. Econ. 38(3), 557–592 (2007)
Dutta, P.K.: A folk theorem for stochastic games. J. Econ. Theory 66(1), 1–32 (1995)
Fong, Y., Surti, J.: On the optimal degree of cooperation in the repeated Prisoner’s Dilemma with side payments. Games Econ. Behav. 67(1), 277–291 (2009)
Fudenberg, D., Yamamoto, Y.: The folk theorem for irreducible stochastic games with imperfect public monitoring. J. Econ. Theory 146, 1664–1683 (2011)
Gjertsen, H., Groves, T., Miller, D., Niesten, E., Squires, D., Watson, J.: A Contract-Theoretic Model of Conservation Agreements. Mimeo (2010)
Goldlücke, S., Kranz, S.: Infinitely repeated games with public monitoring and monetary transfers. J. Econ. Theory 147(3), 1191–1221 (2012)
Goldlücke, S., Kranz, S.: Renegotiation-proof relational contracts. Games Econ. Behav. 80, 157–178 (2013)
Goldlücke, S., Kranz, S.: Discounted Stochastic Games with Voluntary Transfers, Working Paper (2016)
Harrington, J.E., Skrzypacz, A.: Collusion under monitoring of sales. Rand J. Econ. 38(2), 314–331 (2007)
Harrington, J.E., Skrzypacz, A.: Private monitoring and communication in cartels: explaining recent collusive practices. Am. Econ. Rev. 101(6), 2425–2449 (2011)
Hörner, J., Sugaya, T., Takahashi, S., Vieille, N.: Recursive methods in discounted stochastic games: an algorithm for \(\delta \rightarrow 1\) and a folk theorem. Econometrica 79(4), 1277–1318 (2011)
Judd, K.L., Yeltekin, S., Conklin, J.: Computing supergame equilibria. Econometrica 71(4), 1239–1254 (2003)
Klimenko, M., Ramey, G., Watson, J.: Recurrent trade agreements and the value of external enforcement. J. Int. Econ. 74(2), 475–499 (2008)
Levin, J.: Multilateral contracting and the employment relationship. Q. J. Econ. 117(3), 1075–1103 (2002)
Levin, J.: Relational incentive contracts. Am. Econ. Rev. 93(3), 835–857 (2003)
MacLeod, W.B., Malcomson, J.A.: Implicit contracts, incentive compatibility, and involuntary unemployment. Econometrica 57(2), 447–480 (1989)
Markovich, S., Moenius, J.: Winning while losing: competition dynamics in the presence of indirect network effects. Int. J. Ind. Organ. 27(3), 346–357 (2009)
Miller, D.A., Watson, J.: A theory of disagreement in repeated games with bargaining. Econometrica 81(6), 2303–2350 (2013)
Pakes, A., McGuire, P.: Computing Markov-perfect nash equilibria: numerical implications of a dynamic differentiated product model. Rand J. Econ. 25(4), 555–589 (1994)
Pakes, A., McGuire, P.: Stochastic algorithms, symmetric Markov perfect equilibrium, and the “curse” of dimensionality. Econometrica 69(5), 1261–1281 (2001)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)
Rayo, L.: Relational incentives and moral hazard in teams. Rev. Econ. Stud. 74(3), 937–963 (2007)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Yeltekin, S., Cai, Y., Judd, K. L.: Computing Equilibria of Dynamic Games. Working Paper (2015)
Additional information
Support by the German Research Foundation (DFG) through SFB-TR 15 for both authors and an individual research grant for the second author is gratefully acknowledged. Sebastian Kranz would like to thank the Cowles Foundation in Yale, where part of this work was conducted, for the stimulating research environment. Further thanks go to Dirk Bergemann, An Chen, Mehmet Ekmekci, Paul Heidhues, Johannes Hörner, Jon Levin, David Miller, Larry Samuelson, Philipp Strack, Juuso Välimäki, Joel Watson and seminar participants at Arizona State University, UC San Diego and Yale for very helpful discussions.
Appendix: Remaining proofs
Proof of Theorem 2:
For each state \(x\in X\) and regime \(k\in \mathcal {K},\) condition (16) allows us to choose a distribution \(u_{i}^{k}(x),\) \(i=1,\ldots ,n\), of the surplus such that
and
holding with equality for \(i=k.\) A simple strategy profile with transfers \(p_{i}^{k}(x,\alpha ^{k}(x),x')\) achieves this distribution of payoffs if the expected transfers
satisfy
If we define \(\bar{t}_{i}^{k}(x)\) by this condition, it holds that \(\sum _{i=1}^{n}\bar{t}_{i}^{k}(x)=0\). Moreover, it follows from condition (31) that
The intuition behind this is that it is more difficult to induce an action together with a subsequent expected payment than to induce an expected payment alone. We still need to show that for each \(k\in \mathcal {K}\) and state x there exist payments \(t_{i}(x')=(1-\delta )p_{i}^{k}(x,\alpha ^{k}(x),x')\) for each state \(x'\) such that the following three conditions hold:
where \(q(x')=q(x'|x,\alpha ^{k}(x))\) is the transition probability from state x to state \(x'\) if \(\alpha ^{k}(x)\) is played. We use Theorem 22.1 in Rockafellar (1970) to show that such payments exist. This theorem says that the existence of a vector with entries \(t_{i}(x')\), \(i=1,\ldots ,n\), \(x'\in X\), that satisfies the above three conditions is equivalent to the nonexistence of real numbers \(\lambda _{i}(x')\ge 0\), \(\mu (x'),\) and \(\eta _{i},\) \(i=1,\ldots ,n\), \(x'\in X\), that satisfy the following two conditions:
Assume, to the contrary, that such a solution to (35) and (36) exists. These two conditions imply that
Let \(\tilde{x}\) be a state with \(\frac{\mu (\tilde{x})}{q(\tilde{x})}\le \frac{\mu (x')}{q(x')}\) for all \(x'\in X.\) Since condition (35) holds for all \(x'\in X,\) it also holds for \(x'=\tilde{x},\) i.e., \(\eta _{i}=-\frac{\lambda _{i}(\tilde{x})+\mu (\tilde{x})}{q(\tilde{x})}.\) Hence, it follows that
This implies
By definition of \(\tilde{x}\) and because of condition (15), the expression on the left-hand side is nonnegative. Hence, we have arrived at a contradiction, which means that the system given by (32), (33), and (34) must have a solution, and we can define payments \((1-\delta )p_{i}^{k}(x,\alpha ^{k}(x),x')=t_{i}(x')\).
It remains to define the payments following a unilateral deviation. For any combination of states x,\(x^{\prime }\) and signal y with \(y_{i}\ne \alpha _{i}^{k}(x)\) and \(y_{-i}=\alpha _{-i}^{k}(x)\) we choose payments
such that continuation payoffs after a deviation in the action stage are indeed given by \(v_{i}\). Payments for players other than i can be defined such that
and
using condition (15).
Now we have to show that the simple strategy profile defined in this way is indeed a PPE. The budget and payment constraints are satisfied by definition. The relevant action constraints are satisfied because of inequality (31). Moreover, it holds by definition that \(u_{i}^{i}(x)=v_{i}(x),\) and since there is no money burning, \(U^{e}(x)=U(x).\) The payment plan that we have defined is an optimal payment plan, because U(x) is an upper bound of the equilibrium joint payoff \(U^{e}(x)\), and similarly
implies that \(v_{i}(x)\) is a lower bound of the punishment payoff \(u_{i}^{i}(x).\) Thus, we have shown the “if” part of the Theorem. The “only if” part is straightforward. \(\square \)
Proof of Proposition 3:
For a given policy \(\alpha ,\) let \(C_{i}^{\alpha }\) be an operator mapping the set of punishment payoffs into itself defined by
It can easily be verified that \(C_{i}^{\alpha }\) is a contraction mapping. It follows from the contraction-mapping theorem that player i’s best-reply payoffs are given by the unique fixed point of \(C_{i}^{\alpha }\), which we denote by \(v_{i}(\alpha )\). This means
It is a well-known result that the operator \(C_{i}^{\alpha }\) is monotone:
where \(v_{i}\le \tilde{v}_{i}\) is defined as \(v_{i}(x)\le \tilde{v}_{i}(x)\forall x\in X\). We denote by \([C_{i}^{\alpha }]^{k}\) the operator that applies \(C_{i}^{\alpha }\) \(k\) times and define its limit by
The contraction-mapping theorem implies that \([C_{i}^{\alpha }]^{\infty }\) is well defined and transforms every payoff function v into the fixed point of \(C_{i}^{\alpha }\), i.e.,
Furthermore, it follows from monotonicity of \(C_{i}^{\alpha }\) that
and
where two payoff functions \(u_{i}\) and \(\tilde{u}_{i}\) satisfy \(u_{i}<\tilde{u}_{i}\) if \(u_{i}\le \tilde{u}_{i}\) and \(u_{i}\ne \tilde{u}_{i}\).
We now show that for any two policies \(\alpha \) and \(\tilde{\alpha }\) the following monotonicity results hold
We illustrate the proof for (44). It follows from (38), the left part of (44), (41) and (40) that
Equation (43) can be proven similarly. To prove (45), assume that there is some \(\tilde{\alpha }\) with \(C_{i}^{\alpha }(v)\le C_{i}^{\tilde{\alpha }}(v)\) but \(\tilde{v}\ngeq v\). We find
which contradicts the assumption \(\tilde{v}\ngeq v\).
Intuitively, these monotonicity properties of the cheating payoff operator are crucial for why the algorithm works. If one wants to find out whether a policy \(\tilde{\alpha }\) can yield lower punishment payoffs for player i than a policy \(\alpha \), one does not have to solve player i’s Markov decision process under policy \(\tilde{\alpha }\). It suffices to check whether for some state x the cheating payoffs given policy \(\tilde{\alpha }\) and punishment payoffs \(v(\alpha )\) are lower than \(v(\alpha )(x)\). If this is not the case for any admissible policy \(\tilde{\alpha }\), then a policy \(\alpha \) is an optimal punishment policy, in the sense that it minimizes player i’s punishment payoffs in every state.
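A stylized version of this check is cheap to compute. The sketch below uses the standard discounted best-reply operator as the concrete form of the cheating-payoff operator (an assumption on our part; the paper's exact operator is given in its Eq. (37)) and random toy numbers, iterates it to its fixed point, and verifies the fixed-point and monotonicity properties numerically:

```python
import numpy as np

def best_reply_operator(v, Pi_dev, Q_dev, delta):
    """One application of a stylized cheating-payoff operator: in each state,
    player i best-responds against the others' fixed policy.
    Pi_dev[x, a]: i's stage payoff from deviation a in state x (toy numbers);
    Q_dev[x, a, :]: the induced transition distribution over next states."""
    vals = (1 - delta) * Pi_dev + delta * Q_dev @ v  # shape (nX, nA)
    return vals.max(axis=1)

rng = np.random.default_rng(0)
nX, nA, delta = 3, 2, 0.8
Pi_dev = rng.uniform(0.0, 1.0, (nX, nA))
Q_dev = rng.uniform(0.0, 1.0, (nX, nA, nX))
Q_dev /= Q_dev.sum(axis=2, keepdims=True)  # make each row a distribution

# iterate the contraction to its fixed point v(alpha)
v = np.zeros(nX)
for _ in range(500):
    v = best_reply_operator(v, Pi_dev, Q_dev, delta)
```

Since the operator is a contraction with modulus \(\delta\), the iteration converges geometrically; monotonicity follows because raising continuation payoffs can only raise the best-reply value in every state.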
The fixed point condition (38) of the value determination step and the policy improvement step (20) imply that \(v^{r}=C_{i}^{\alpha ^{r}}(v^{r})\ge C_{i}^{\alpha ^{r+1}}(v^{r})\). We first establish that if
then we have \(v_{i}^{r}=\hat{v}_{i}\). For a proof by contradiction, assume that condition (46) holds for some r but that there exists a policy \(\hat{\alpha }\) such that \(v(\alpha ^{r})\nleq v(\hat{\alpha }),\) i.e., \(\hat{\alpha }\) leads in at least some state x to a strictly lower best-reply payoff for player i than \(\alpha ^{r}\). By (45) this would imply \(C_{i}^{\alpha ^{r}}(v^{r})\nless C_{i}^{\hat{\alpha }}(v^{r})\). This means that \(\hat{\alpha }\) must also be a solution to the policy improvement step and since (46) holds, we then must have
However, (43) then implies that \(v(\alpha ^{r})=v(\hat{\alpha })\), which contradicts the assumption \(v(\alpha ^{r})\nleq v(\hat{\alpha })\). Thus, if the algorithm stops in a round R, we indeed have \(v^{R}=\hat{v}_{i}\).
If the algorithm does not stop in round r, it must be the case that \(v^{r}=C_{i}^{\alpha ^{r}}(v^{r})>C_{i}^{\alpha ^{r+1}}(v^{r})\). (44) then directly implies the monotonicity result \(v^{r}>v^{r+1}\). The algorithm always stops in a finite number of rounds since the number of policies is finite and there are no cycles because of the monotonicity result. \(\square \)
Proof of Corollary 1:
Assume there exists a simple equilibrium with action plan \((\alpha _{2}^{k})_{k}\) and with optimal payment plan \((p^{k})_{k}\) such that joint payoffs are U (which implies that \(p_{1}^{e}+p_{2}^{e}=0)\) and punishment payoffs are \(v_{1}\) and \(\bar{v}_{2}.\) Define
The conditions (PC-k) imply that \(t^{k}(x,y,x')\ge 0\) and
Moreover, the conditions (AC-k),
imply that \(\alpha _{2}^{k}(x)\in \arg \max _{\tilde{a}}\pi _{2}(x,\tilde{a})+E[t^{k}|x,\tilde{a}]\).
For the other direction, assume that there exist payments \(t^{k}(x,y,x')\) as in the Corollary and define
and
as well as \(p_{1}^{e}=-p_{2}^{e}\), such that \(U^{e}(x)=U(x)\) and (BC-e) hold by definition, and \(\tilde{u}_{2}^{e}(x)=u_{2}^{e}(x).\) Moreover, (PC-e) for the agent holds because
and for the principal because
Further define
and \(p_{1}^{2}=-p_{2}^{2},\) such that \(u_{2}^{2}(x)=\bar{v}_{2}\) and (BC-2) and (PC-2) for the agent hold by definition, and for the principal because \(U(x')\ge v_{1}(x')+\bar{v}_{2}\). Finally, define
and
such that \(u_{1}^{1}(x)=v_{1}(x)\) and (PC-1) for the agent hold by definition, and for the principal for the same reason as (PC-e). The condition (BC-1) then holds as well, since it is equivalent to \(U(x')-v_{1}(x')-\bar{v}_{2}\ge \frac{1-\delta }{\delta }t^{1}(x,y,x')\). The action constraints follow from condition (24) since the payments were defined such that the agent’s continuation payoff is equal to \(\frac{(1-\delta )}{\delta }t\) plus a constant.
In case of perfect monitoring, any deviation is punished by withholding transfers while following prescribed play is rewarded by the maximum payment. If there is perfect monitoring in state x, the condition therefore is
which can be rearranged to equal condition (25). \(\square \)
Rights and permissions
About this article
Cite this article
Goldlücke, S., Kranz, S. Discounted stochastic games with voluntary transfers. Econ Theory 66, 235–263 (2018). https://doi.org/10.1007/s00199-017-1060-1
Keywords
- Dynamic games
- Relational contracting
- Monetary transfers
- Computation
- Imperfect public monitoring
- Perfect public equilibria