
Stability and cooperative solution in stochastic games

Published in Theory and Decision.

Abstract

This paper analyses the principles of stable cooperation in stochastic games. Starting from the non-cooperative version of a discounted, non-zero-sum stochastic game, we build its cooperative form and find the cooperative solution. We then analyse the conditions under which this solution is stable. The principles of stability include subgame consistency, strategic stability and irrational-behaviour proofness of the cooperative solution. Finally, we discuss the existence of a stable cooperative solution and consider a class of stochastic games for which the cooperative solution is found and the principles of stable cooperation are checked.


Notes

  1. From now on, we use the notation \(\eta _{i}\) if player i uses the stationary strategy in the game. When a player i uses a behaviour strategy (not necessarily stationary), we use the notation \(\varphi _{i}\).

  2. Without loss of generality, we may find the maximum in Eq. (3) over the set of pure actions of coalition N.

  3. The existence of the minmax value of a two-player discounted stochastic game is proved by Shapley (1953a).

  4. In Eq. (6), the maximum in \(\min _{\hat{\eta }_{N\backslash S}}\max _{\eta _{S}}\sum _{i\in S}E_{i}^{\omega }(\eta _{S},\eta _{N\backslash S})\) is found over the set of pure strategies of coalition S, while the minimum in \(\max _{\hat{\eta }_{S}}\min _{\eta _{N\backslash S}}\sum _{i\in S}E_{i}^{\omega }(\eta _{S},\eta _{N\backslash S})\) is found over the set of pure strategies of coalition \(N\backslash S\).

  5. The property of superadditivity is not needed and is often omitted in cooperative game theory, because in real life there are a lot of motivations to consider both profitable and non-profitable coalitions. As Aumann and Dreze (1974, p. 233) note, there are arguments for superadditivity that are quite persuasive, but, as they also note, superadditivity is quite problematic in some economic applications.

  6. We suppose that the allocation mechanism gives a unique imputation for each subgame. However, the principle of subgame consistency may be extended to the case in which the imputation is a set.

  7. See Petrosjan and Danilov (1979) and Baranova and Petrosjan (2006).

  8. If the solution is the Shapley value, the nucleolus or another single-valued solution.

  9. Ehtamo and Hamalainen (1989) model this discrete-time effect in differential games.

  10. The strict definition of the behaviour strategy is given in the Appendix in the proof of Proposition 3.

  11. Things change for subgame perfection. In this case, we need to prove that Eq. (13) holds for all possible histories and all stages. Therefore, we need to determine the strategy of a player even if more than one player deviates. Strategy (44), p. 31, defines the behaviour of the player given any history.

  12. In the case of multiple Nash equilibria, one of them should be chosen for the realisation of the punishment. Notice that this can be implemented because players use correlated strategies.

  13. Note that it is possible to formulate an analogous condition for repeated games.

  14. Notice that the actions of the players from coalition \(N\backslash z\) are correlated.

References

  • Amir, R. (2003). Stochastic games in economics: The lattice-theoretic approach. In A. Neyman & S. Sorin (Eds.), Stochastic games and applications (pp. 443–453). NATO Science Series C: Mathematical and Physical Sciences. Netherlands: Springer.


  • Aumann, R. J. (1974). Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1, 67–96.


  • Aumann, R. J., & Dreze, J. H. (1974). Cooperative games with coalition structure. International Journal of Game Theory, 3, 217–237.


  • Aumann, R. J., & Peleg, B. (1960). Von Neumann–Morgenstern solutions to cooperative games without side payments. Bulletin of the American Mathematical Society, 66, 173–179.


  • Baranova, E. M., & Petrosjan, L. A. (2006). Cooperative stochastic games in stationary strategies. Game Theory and Applications, 11, 7–17.


  • Dutta, P. (1995). A folk theorem for stochastic games. Journal of Economic Theory, 66, 1–32.


  • Ehtamo, H., & Hamalainen, R. P. (1989). Incentive strategies and equilibria for dynamic games with delayed information. Journal of Optimization Theory and Applications, 63, 355–369.


  • Fink, A. M. (1964). Equilibrium in a stochastic n-person game. Journal of Science of the Hiroshima University, Series A–I, 28, 89–93.


  • Herings, P. J., & Peeters, R. J. A. P. (2004). Stationary equilibria in stochastic games: Structure, selection, and computation. Journal of Economic Theory, 118, 32–60.


  • Kohlberg, E., & Neyman, A. (2015). The cooperative solution of stochastic games. Harvard Business School. Working Paper, No. 15-071, March 2015.

  • Parilina, E. M. (2015). Stable cooperation in stochastic games. Automation and Remote Control, 76, 1111–1122.


  • Parilina, E., & Zaccour, G. (2015). Node-consistent core for games played over event trees. Automatica, 53, 304–311.


  • Petrosjan, L. A. (1977). Stability of the solutions in differential games with several players. Vestnik Leningradskogo Universiteta, Matematika Mekhanika Astronomiya, 19, 46–52. (in Russian).

  • Petrosjan, L. A. (1993). Strongly time consistent differential optimality principles. Vestnik Sankt-Peterburgskogo Universiteta, Ser. 1, Matematika Mekhanika Astronomiya, 4, 35–40.

  • Petrosjan, L. A. (1998). Semicooperative games. Vestnik Sankt-Peterburgskogo Universiteta, Ser. 1, Matematika Mekhanika Astronomiya, 2, 62–69. (in Russian).

  • Petrosjan, L. A. (2006). Cooperative stochastic games. Advances in Dynamic Games, Annals of the International Society of Dynamic Games, 8, 139–146.


  • Petrosjan, L. A., & Danilov, N. N. (1979). Stability of the solutions in nonantagonistic differential games with transferable payoffs. Vestnik Leningrad. Univ. Mat. Mekh. Astronom., 1, 52–59. (in Russian).


  • Petrosjan, L. A., & Zenkevich, N. A. (2015). Conditions for sustainable cooperation. Automation and Remote Control, 76(10), 1894–1904.


  • Schmeidler, D. (1969). The nucleolus of a characteristic function game. SIAM Journal of Applied Mathematics, 17(6), 1163–1170.


  • Shapley, L. S. (1953a). Stochastic games. Proceedings of the National Academy of Sciences of the USA, 39, 1095–1100.


  • Shapley, L. S. (1953b). A value for n-person games. In H. W. Kuhn & A. W. Tucker (Eds.), Contributions to the Theory of Games II, Annals of Mathematics Studies (Vol. 28, pp. 307–317). Princeton: Princeton University Press.


  • Takahashi, M. (1964). Stochastic games with infinitely many strategies. Journal of Science of the Hiroshima University, Series A–I, 28, 95–99.


  • von Neumann, J., & Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton: Princeton University Press.


  • Xu, N., & Veinott, A., Jr. (2013). Sequential stochastic core of a cooperative stochastic programming game. Operations Research Letters, 41, 430–435.


  • Yeung, D. W. K. (2006). An irrational-behavior-proof condition in cooperative differential games. International Game Theory Review, 8, 739–744.



Corresponding author

Correspondence to Alessandro Tampieri.

Additional information

We are grateful to Arsen Palestini, Leon Petrosyan, Pierre M. Picard, Giuseppe Pignataro, Artem Sedakov, the Associate Editor of Theory and Decision and two anonymous referees for helpful comments. A previous version of this paper was presented at Game Theory and Management Conference (St. Petersburg, 2011). The usual disclaimer applies. The work of the first author was supported by Russian Science Foundation (Project 17-11-01079).

Appendix


1.1 Calculations of players’ payoffs in stationary strategies

We denote a non-cooperative stochastic subgame in stationary strategies with initial state \(\omega \in \Omega \) by \(G^{\omega }\). Since the set of states \(\Omega \) is finite, there are \({\bar{\omega }}\) subgames \(G^{\omega _{1}},\ldots ,G^{{\bar{\omega }}}\), with initial states \(\omega _{1},\ldots ,{\bar{\omega }}\), respectively.

A player’s utility function is then defined as the expectation of the player’s discounted payoff in G. Let \(E_{i}^{\omega }(\eta )\) be the expected payoff of player i in subgame \(G^{\omega }\) when a profile \(\eta =(\eta _{1},\ldots ,\eta _{n})\) in stationary strategies is adopted. The vectorial form of the expected payoffs is \(E_{i}(\eta )=(E_{i}^{\omega _{1}}(\eta ),\ldots ,E_{i}^{{\bar{\omega }}}(\eta ))\).

Hence, a player i’s indirect utility function in the subgame \(G^{\omega }\) satisfies the following recurrent equation:

$$\begin{aligned} E_{i}^{\omega }(\eta )=K_{i}^{\omega }(a^{\omega })+\delta \sum \limits _{\omega ^{\prime }\in \Omega }p(\omega ^{\prime }|\omega ,a^{\omega })E_{i}^{\omega ^{\prime }}(\eta ), \end{aligned}$$
(26)

where \(\eta =(\eta _{1},\ldots ,\eta _{n})\) is the stationary strategy profile such as \(\eta _{i}(\omega )=a_{i}^{\omega }\in \Delta (A_{i}^{\omega })\), \(\omega \in \Omega \), \(i\in N\), and \(a^{\omega }=(a_{1}^{\omega },\ldots ,a_{n}^{\omega })\) for each \(\omega \in \Omega \), \(i\in N\).

We now define the \({\bar{\omega }}\times {\bar{\omega }}\)-matrix of transition probabilities in G, which is a function of the strategy profile \(\eta \):

$$\begin{aligned} \Pi (\eta )= \begin{pmatrix} p(\omega _{1}|\omega _{1},a^{\omega _{1}}) &{} \ldots &{} p({\bar{\omega }}|\omega _{1},a^{\omega _{1}}) \\ p(\omega _{1}|\omega _{2},a^{\omega _{2}}) &{} \ldots &{} p({\bar{\omega }}|\omega _{2},a^{\omega _{2}}) \\ \cdots &{} \cdots &{} \cdots \\ p(\omega _{1}|{\bar{\omega }},a^{{\bar{\omega }}}) &{} \ldots &{} p({\bar{\omega }}|{\bar{\omega }},a^{{\bar{\omega }}}) \end{pmatrix} , \end{aligned}$$
(27)

where the element in the jth row and the \(j^{\prime }\)th column is the probability of transition from the jth state to the \(j^{\prime }\)th state. Given (27), we can rewrite Eq. (26) in matrix form as follows:

$$\begin{aligned} E_{i}(\eta )=K_{i}(a)+\delta \Pi (\eta )E_{i}(\eta ), \end{aligned}$$
(28)

where \(K_{i}(a)=(K_{i}^{\omega _{1}}(a^{\omega _{1}}),\ldots ,K_{i}^{\bar{ \omega }}(a^{{\bar{\omega }}}))\). Equation (28) is equivalent to:

$$\begin{aligned} E_{i}(\eta )=\left( {\mathbb {I}}-\delta \Pi (\eta )\right) ^{-1}K_{i}(a), \end{aligned}$$

where \({\mathbb {I}}\) is an identity matrix of size \({\bar{\omega }}\times {\bar{\omega }}\). The matrix \(\left( {\mathbb {I}}-\delta \Pi (\eta )\right) ^{-1}\) always exists for \(\delta \in (0,1)\), since \(\Pi (\eta )\) is a stochastic matrix and hence the spectral radius of \(\delta \Pi (\eta )\) is at most \(\delta <1\). The expected payoff of player i in the game G in stationary strategies is then:

$$\begin{aligned} \overline{E}_{i}(\eta )=\pi _{0}E_{i}(\eta )=\pi _{0}\left( {\mathbb {I}} -\delta \Pi (\eta )\right) ^{-1}K_{i}(a). \end{aligned}$$
(29)
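As a hedged numerical illustration of Eqs. (26)–(29), the following sketch works through a toy two-state example; the discount factor, transition probabilities, stage payoffs and initial distribution are all invented for this sketch, not taken from the paper:

```python
import numpy as np

# Toy example (all numbers invented): two states, one player,
# a fixed stationary profile eta inducing these quantities.
delta = 0.9                            # discount factor, delta in (0,1)
Pi = np.array([[0.7, 0.3],             # transition matrix Pi(eta), Eq. (27)
               [0.4, 0.6]])
K_i = np.array([5.0, 2.0])             # stage payoffs K_i(a), one per state
pi0 = np.array([0.5, 0.5])             # initial distribution over states

# Eq. (28): E_i = (I - delta*Pi)^{-1} K_i. The matrix is invertible
# because the spectral radius of delta*Pi is at most delta < 1.
E_i = np.linalg.solve(np.eye(2) - delta * Pi, K_i)
E_bar = pi0 @ E_i                      # expected payoff in G, Eq. (29)

# Sanity check: E_i satisfies the recurrence (26)/(28).
assert np.allclose(E_i, K_i + delta * Pi @ E_i)
```

The linear solve replaces the explicit inverse in (29); the final assertion confirms that the computed vector satisfies the recurrence (26).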

1.2 Proof of Proposition 1

The first condition of PDP is equivalent to the following equation:

$$\begin{aligned} \sum \limits _{i\in N}\beta _{i}^{\omega }=\sum \limits _{i\in N}K_{i}^{\omega }(a^{\omega *}), \end{aligned}$$
(30)

where \(a^{\omega *}\) is the action profile adopted under the cooperative profile \(\eta ^{*}\) in state \(\omega \). It is easy to show that \(\beta _{i}\) from (11) satisfies (30): summing (11) over the players gives \(\sum _{i\in N}\beta _{i}=({\mathbb {I}}-\delta \Pi (\eta ^{*}))\sum _{i\in N}\sigma _{i}=({\mathbb {I}}-\delta \Pi (\eta ^{*}))V(N)\), and with V(N) from (5), Eq. (30) holds.

To verify the second condition of PDP, consider the expected total payoff of player i, denoted by \(B_{i}\), under the new transfer \(\beta _{i}^{\omega }\) in state \(\omega \in \Omega \). The recurrent equation for \(B_{i}\) is given by:

$$\begin{aligned} B_{i}^{\omega }=\beta _{i}^{\omega }+\delta \sum \limits _{\omega ^{\prime }\in \Omega }p(\omega ^{\prime }|\omega ,a^{\omega *})B_{i}^{\omega ^{\prime }}, \end{aligned}$$

or, in vectorial form:

$$\begin{aligned} B_{i}=\beta _{i}+\delta \Pi (\eta ^{*})B_{i}, \end{aligned}$$
(31)

where \(B_{i}=(B_{i}^{\omega _{1}},\ldots ,B_{i}^{{\bar{\omega }}})\). Equation (31) is equivalent to:

$$\begin{aligned} B_{i}=\left( {\mathbb {I}}-\delta \Pi (\eta ^{*})\right) ^{-1}\beta _{i}. \end{aligned}$$
(32)

Given the second condition of PDP and Eq. (32) we obtain:

$$\begin{aligned} \sigma _{i}=\left( {\mathbb {I}}-\delta \Pi (\eta ^{*})\right) ^{-1}\beta _{i}, \end{aligned}$$
(33)

where \(\sigma _{i}=(\sigma _{i}^{\omega _{1}},\ldots ,\sigma _{i}^{\bar{ \omega }})\), \((\sigma _{1}^{\omega },\ldots ,\sigma _{n}^{\omega })=\sigma ^{\omega }\in \Sigma ^{\omega }\). Equation (33) can be rewritten equivalently as:

$$\begin{aligned} \beta _{i}=({\mathbb {I}}-\delta \Pi (\eta ^{*}))\sigma _{i}. \end{aligned}$$
(34)

Finally, Eq. (11) is equivalent to:

$$\begin{aligned} \sigma _{i}=\beta _{i}+\delta \Pi (\eta ^{*})\sigma _{i}. \end{aligned}$$
(35)

The second term on the right-hand side of (35) is the expected value of the transfers from the next stage onwards. Suppose that the imputation for each subgame is chosen following the same allocation rule that has been chosen by the players at the beginning of the game. If players maintain the cooperative strategy profile \(\eta ^{*}\), then the expected payoff of player i with the new transfers is equal to the expected value of the corresponding imputation component in the cooperative stochastic game \({G}_\mathrm{c}\).
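The chain (30)–(35) can be checked numerically. A minimal sketch, assuming a toy two-state, two-player game with invented imputation components and cooperative transition matrix:

```python
import numpy as np

# Toy two-state, two-player example (all numbers invented).
delta = 0.9
Pi_star = np.array([[0.8, 0.2],
                    [0.5, 0.5]])       # Pi(eta*) under cooperation
# Imputation components sigma_i: one row per player, one column per state.
sigma = np.array([[10.0, 8.0],
                  [ 6.0, 7.0]])

M = np.eye(2) - delta * Pi_star
beta = sigma @ M.T                     # Eq. (34): beta_i = (I - delta Pi*) sigma_i

# Second PDP condition, Eqs. (32)-(33): discounted transfers give back sigma_i.
B = np.linalg.solve(M, beta.T).T
assert np.allclose(B, sigma)

# First PDP condition, Eq. (30): summing (34) over players gives
# sum_i beta_i = (I - delta Pi*) sum_i sigma_i, with sum_i sigma_i = V(N) here.
V_N = sigma.sum(axis=0)
assert np.allclose(beta.sum(axis=0), M @ V_N)
```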

1.3 Proof of Proposition 2

At the beginning of the game, players choose the following solution: \( \overline{\sigma }=(\overline{\sigma }_{1},\ldots ,\overline{\sigma }_{n})\in \overline{\Sigma }\), where \(\overline{\sigma }_{i}=\pi _{0}\sigma _{i}\), \(\sigma _{i}=(\sigma _{i}^{\omega _{1}},\ldots ,\sigma _{i}^{{\bar{\omega }}})\), \((\sigma _{1}^{\omega },\ldots ,\sigma _{n}^{\omega })=\sigma ^{\omega }\in \Sigma ^{\omega }\). A cooperative strategy profile is \(\eta ^{*}\). Consider the \(\sigma \)-regularisation of game G determined by Definition 7; by Proposition 1, the set of transfers \(\{\beta _{i}\}_{i\in N}\) defined by (11) is a PDP. To prove that the \(\sigma \)-regularisation of the game G satisfies the principle of subgame consistency, we need to calculate the discounted payoffs in every subgame of the game \(G_{\sigma }\) when the cooperative strategy profile \(\eta ^{*}\) is adopted. Consider any subgame \(G_{\sigma }^{\omega }\) starting from state \(\omega \in \Omega \). The discounted payoff of player i in this subgame is:

$$\begin{aligned} E_{i}^{\omega }(\eta ^{*})=\beta _{i}^{\omega }+\delta \sum _{\omega ^{\prime }\in \Omega }p(\omega ^{\prime }|\omega ,a^{\omega *})E_{i}^{\omega ^{\prime }}(\eta ^{*}), \end{aligned}$$
(36)

where \(E_{i}(\eta ^{*})=(E_{i}^{\omega _{1}}(\eta ^{*}),\ldots ,E_{i}^{{\bar{\omega }}}(\eta ^{*}))\) and \(E_{i}^{\omega }(\eta ^{*})\) is the discounted payoff of player i in the subgame \(G_{\sigma }^{\omega }\) starting from state \(\omega \) when players adopt \(\eta ^{*}\). Equation (36) can be rewritten in vector form:

$$\begin{aligned} E_{i}(\eta ^{*})=\beta _{i}+\delta \Pi (\eta ^{*})E_{i}(\eta ^{*}), \end{aligned}$$

or

$$\begin{aligned} E_{i}(\eta ^{*})=\left( {\mathbb {I}}-\delta \Pi (\eta ^{*})\right) ^{-1}\beta _{i}. \end{aligned}$$

Since \(\beta _{i}\) satisfies (11), we obtain

$$\begin{aligned} E_{i}(\eta ^{*})=\left( {\mathbb {I}}-\delta \Pi (\eta ^{*})\right) ^{-1}\left( {\mathbb {I}}-\delta \Pi (\eta ^{*})\right) \sigma _{i}=\sigma _{i}. \end{aligned}$$

This equation proves that the \(\sigma \)-regularisation of game G satisfies the principle of subgame consistency.

1.4 Proof of Proposition 3

We determine the behaviour strategy profile \({\widehat{\varphi }}=(\widehat{ \varphi }_{1},\ldots ,{\widehat{\varphi }}_{n})\) where strategies \(\widehat{ \varphi }_{i}\), \(i\in N\) are:

$$\begin{aligned} {\widehat{\varphi }}_{i}(h(k))= {\left\{ \begin{array}{ll} a_{i}^{\omega *}, &{} \text {if }\omega (k)=\omega ,h(k)\subset h^{*}; \\ {\hat{a}}_{i}^{\omega }(z), &{} \text {if }\omega (k)=\omega , \text { and }\exists \,\,l\in [1,k-1], \\ &{} z\in N, i\ne z: h(l)\subset h^{*},\text { and} \\ &{} (\omega (l),a(l))\notin h^{*},\text { but} \\ &{} (\omega (l),(a_{z}^{*}(l),a_{N\backslash z}(l))\in h^{*}; \\ \text {any} &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$
(37)

where \(a_{i}^{\omega *}\) corresponds to the player i’s cooperative action, while \({\hat{a}}_{i}^{\omega }(z)\in \Delta (A_{i}^{\omega }) \) is the player i’s punishment that, together with actions \({\hat{a}}_{i^{\prime }}^{\omega }(z)\in \Delta (A_{i^{\prime }}^{\omega })\), of the players \( i^{\prime }\ne i\), \(i^{\prime }\in N\backslash z\), forms the action (either in pure or mixed strategies) of coalition \(N\backslash z\) against player z.Footnote 14 The proof of the proposition follows from the folk theorem for stochastic games (Dutta 1995) using the structure of the behaviour strategy (37). Notice that we do not define the reaction of players when they observe the deviations of more than one player. This is because we focus here on the Nash equilibrium (not subgame perfect). When more than one player deviates, the player chooses any strategy from the player’s set of strategies. We now prove that \({\widehat{\varphi }}(\cdot )=({\widehat{\varphi }} _{1}(\cdot ),\ldots ,{\widehat{\varphi }}_{n}(\cdot ))\) determined in (37) is an NE in the stochastic game \(G_{\sigma }\). Given strategy (37) and provided that all players do not deviate from a cooperative strategy profile \(\eta ^{*} \), the discounted payoff of player i in the subgame \(G_{\sigma }^{\omega }\), \(\omega \in \Omega \), is:

$$\begin{aligned} E_{i}^{\omega }({\widehat{\varphi }})=E_{i}^{\omega }(\eta ^{*}). \end{aligned}$$

Let \(E_{i}({\widehat{\varphi }})\) be equal to the vector \((E_{i}^{\omega _{1}}( {\widehat{\varphi }}),\ldots ,E_{i}^{{\bar{\omega }}}({\widehat{\varphi }})).\) Then for any player \(i\in N,\) the next equation holds:

$$\begin{aligned} E_{i}({\widehat{\varphi }})=({\mathbb {I}}-\delta \Pi (\eta ^{*}))^{-1}\beta _{i}. \end{aligned}$$
(38)

Consider next the profile of strategies \(({\varphi }_{z},{\widehat{\varphi }}_{N\backslash z})\), in which some player z deviates from strategy \({\widehat{\varphi }}_{z}\). For any k, there exists \(l\in [1,k-1]\) such that \(h(l)\subset h^{*}\), but \((\omega (k),a(k))\notin h^{*}\) and \((\omega (k),(a_{z}^{*}(k),a_{N\backslash z}(k)))\in h^{*}\). Without loss of generality, we set \(\omega (k)=\omega \). In other words, the first individual deviation of player z occurs at stage k. We are now able to determine the total payoff of player z in the game \(G_{\sigma }\) with strategy profile \((\varphi _{z},{\widehat{\varphi }}_{N\backslash z})\) by

$$\begin{aligned} \overline{E}_{z}^{\sigma }(\varphi _{z},{\widehat{\varphi }}_{N\backslash z})=\pi _{0}E_{z}^{\sigma }(\varphi _{z},{\widehat{\varphi }}_{N\backslash z}), \end{aligned}$$

where

$$\begin{aligned} E_{z}^{\sigma }(\varphi _{z},{\widehat{\varphi }}_{N\backslash z})=E_{z}^{\sigma ,[1,k-1]}(\varphi _{z},{\widehat{\varphi }}_{N\backslash z})+\delta ^{k-1}\Pi ^{k-1}(\varphi _{z},{\widehat{\varphi }}_{N\backslash z})E_{z}^{\sigma ,[k,\infty )}(\varphi _{z},{\widehat{\varphi }}_{N\backslash z}).\nonumber \\ \end{aligned}$$
(39)

The first term in the right hand side of (39) is the expected payoff of player z in the first \(k-1\) stages of the game \(G_{\sigma }\), the second term is the expected payoff of player z in the subgame of \( G_{\sigma }\) beginning from stage k, where \(E_{z}^{\sigma ,[k,\infty )}(\varphi _{z},{\widehat{\varphi }}_{N\backslash z})\) is the vector \( (E_{z}^{\sigma ,1}(\varphi _{z},{\widehat{\varphi }}_{N\backslash z}),\ldots ,E_{z}^{\sigma ,{\bar{\omega }}}(\varphi _{z},{\widehat{\varphi }}_{N\backslash z})),\) with \(E_{z}^{\sigma ,\omega }(\varphi _{z},{\widehat{\varphi }} _{N\backslash z})\) being the player z’s expected payoff in the regularised subgame \(G_{\sigma }^{\omega }\) beginning at state \(\omega \). Since there are no deviations from a cooperative strategy profile \(\eta ^{*}\) up to stage \(k-1\), the following equalities hold:

$$\begin{aligned} E_{z}^{\sigma ,[1,k-1]}(\varphi _{z},{\widehat{\varphi }}_{N\backslash z})= & {} E_{z}^{\sigma ,[1,k-1]}(\eta ^{*}),\\ \Pi ^{k-1}(\varphi _{z},{\widehat{\varphi }}_{N\backslash z})= & {} \Pi ^{k-1}(\eta ^{*}). \end{aligned}$$

We now find the discounted payoff of player z in the subgame \(G_{\sigma }^{\omega }\) beginning at stage k, when state \(\omega (k)\) is equal to \(\omega \). The following formula holds:

$$\begin{aligned} E_{z}^{\sigma ,\omega }(\varphi _{z},{\widehat{\varphi }}_{N\backslash z})=K_{z}^{\omega }({\hat{a}}_{z}^{\omega },a_{N\backslash z}^{\omega *})+\delta \sum \limits _{\omega ^{\prime }\in \Omega }p(\omega ^{\prime }|\omega ,({\hat{a}}_{z}^{\omega },a_{N\backslash z}^{\omega *}))V^{\omega ^{\prime }}\left( \{z\}\right) , \end{aligned}$$
(40)

where \({\hat{a}}_{z}^{\omega }\in \Delta (A_{z}^{\omega })\). Players from the coalition \(N\setminus z\) punish player z by playing the strategies which hold player z down to her minmax payoff, according to the definition of strategy profile \({\widehat{\varphi }}\). In (40), the value of the characteristic function \(V^{\omega ^{\prime }}(\{z\})\) is determined by (6). Since the expected payoffs of player z under the strategy profiles \({\widehat{\varphi }}\) and \(({\varphi }_{z},{\widehat{\varphi }}_{N\backslash z})\) coincide up to stage \(k-1\), a deviation may increase player z’s payoff only at the expense of the expected payoff in the subgame \(G_{\sigma }^{\omega }\), \(\omega \in \Omega \). In particular, the strategy profile \(({\varphi }_{z},{\widehat{\varphi }}_{N\backslash z})\) ensures the following expected payoff of player z from stage k:

$$\begin{aligned} F(\{z\})=\max \limits _{{\hat{a}}_{z}^{\omega }\in \Delta (A_{z}^{\omega })}\left\{ K_{z}^{\omega }({\hat{a}}_{z}^{\omega },a_{N\backslash z}^{\omega *})+\delta \sum \limits _{\omega ^{\prime }\in \Omega }p(\omega ^{\prime }|\omega ,({\hat{a}}_{z}^{\omega },a_{N\backslash z}^{\omega *}))V^{\omega ^{\prime }}\left( \{z\}\right) \right\} {.} \end{aligned}$$
(41)

According to the definition of PDP, the expected payoff of player z in the regularised subgame \(G_{\sigma }^{\omega }\) with a profile of strategies \( {\widehat{\varphi }}(\cdot )\) can be found from:

$$\begin{aligned} E_{z}^{\sigma }({\widehat{\varphi }})=({\mathbb {I}}-\delta \Pi (\eta ^{*}))^{-1}\beta _{z}=\sigma _{z}, \end{aligned}$$
(42)

where \(E_{z}^{\sigma }({\widehat{\varphi }})=(E_{z}^{\sigma ,\omega _{1}}({\widehat{\varphi }}),\ldots ,E_{z}^{\sigma ,{\bar{\omega }}}({\widehat{\varphi }}))\). Taking into account (14), together with (41), (42) and the above discussion, we get

$$\begin{aligned} E_{z}^{\sigma }({\widehat{\varphi }})\geqslant E_{z}^{\sigma }({\varphi }_{z}, {\widehat{\varphi }}_{N\backslash z}), \end{aligned}$$

which is satisfied when inequality

$$\begin{aligned} \sigma _{z}=\left( {\mathbb {I}}-\delta \Pi \left( \eta ^{*}\right) \right) ^{-1}\beta _{z}\geqslant F(\{z\}) \end{aligned}$$
(43)

is true. In this case, a player is not willing to deviate from the cooperative strategy profile in any subgame of the \(\sigma \)-regularisation of game G. Thus, the behaviour strategy profile (37) is an NE in the \(\sigma \)-regularisation of game G. The discounted payoff of player i in the game \(G_{\sigma }\) with profile of strategies \({\widehat{\varphi }}\) is equal to \(\overline{\sigma }_{i}\), where \(\overline{\sigma }_{i}=\pi _{0}\sigma _{i}\), while \(\sigma _{i}=(\sigma _{i}^{\omega _1},\ldots ,\sigma _{i}^{{\bar{\omega }}})\) consists of the ith components of the imputations \(\sigma ^{\omega _1}\), \(\ldots \), \(\sigma ^{{\bar{\omega }}}\) derived from the cooperative subgames \(G^{\omega _1}\), \(\ldots \), \(G^{{\bar{\omega }}}\), respectively.
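Condition (43) can be checked state by state. A minimal sketch for one state, with invented payoffs, transition rows and minmax values (since the objective in (41) is linear in the mixed action, scanning pure deviations suffices):

```python
import numpy as np

# Toy check of the stability condition (43) in one state omega
# (all numbers invented). Player z has two pure deviating actions a_hat;
# the other players stick to the cooperative action a*_{N\z}.
delta = 0.9
K_z = np.array([4.0, 6.0])             # K_z(a_hat, a*_{N\z}) per deviation
# p(omega' | omega, (a_hat, a*_{N\z})): one row per deviation.
P_dev = np.array([[0.6, 0.4],
                  [0.1, 0.9]])
V_z = np.array([3.0, 2.0])             # minmax values V^{omega'}({z}), Eq. (6)
sigma_z_omega = 10.0                   # player z's imputation component in omega

# Eq. (41): best one-shot deviation followed by minmax punishment.
F_z = np.max(K_z + delta * P_dev @ V_z)

# Condition (43): deviation is unprofitable in state omega.
assert sigma_z_omega >= F_z
```

With these numbers the deviation value is 7.89, below the imputation component 10, so the cooperative action is sustained in this state.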

1.5 Proof of Proposition 4

The proof is similar to that of Proposition 3, using the structure of the “new” strategy profile. Define this behaviour strategy profile as \({\tilde{\varphi }}=({\tilde{\varphi }}_{1},\ldots ,{\tilde{\varphi }}_{n})\), where strategies \({\tilde{\varphi }}_{i}\), \(i\in N\), are:

$$\begin{aligned} {\tilde{\varphi }}_{i}(h(k))= {\left\{ \begin{array}{ll} a_{i}^{\omega *}, &{} \text {if }\quad \omega (k)=\omega ,h(k)\subset h^{*}; \\ {\hat{a}}^{\omega ,ne}_i, &{} \text {if }\quad h(k)\nsubseteq h^{*}, \end{array}\right. } \end{aligned}$$
(44)

where \({\hat{a}}^{\omega ,ne}_i\in \Delta (A_i^{\omega })\) is player i’s punishment, which can be either in pure or mixed strategies. Notice that, if a multi-player deviation is observed in the history, all players implement \(\hat{\eta }^{ne}\).

1.6 Proof of Proposition 5

In (37) we do not determine the behaviour of the players when they observe deviations by more than one player. There is no need to define a behaviour strategy profile for this case, because in this part we focus on the concept of Nash equilibrium. When more than one player deviates, a player chooses any strategy from the corresponding set.

In what follows, we show that condition (17) is sufficient for inequality (16) to hold for any \(k=1,2,\ldots \). The proof is by mathematical induction. First, we rewrite (16) for \(k=1\): transforming (17) by means of the definition of \(\sigma _{i}\) and PDP (33), we get

$$\begin{aligned} V(\{i\})\leqslant \beta _{i}+\delta \Pi (\eta ^{*})V(\{i\}). \end{aligned}$$
(45)

Suppose that (17) implies (16) for \(k=l\). Rewriting (16) for \(k=l\) yields:

$$\begin{aligned} V(\{i\})\leqslant \beta _{i}+\cdots +\delta ^{l-1}\Pi ^{l-1}(\eta ^{*})\beta _{i}+\delta ^{l}\Pi ^{l}(\eta ^{*})V(\{i\}). \end{aligned}$$
(46)

We adopt the same procedure for \(k=l+1\). Inequality (16) for \( k=l+1 \) is:

$$\begin{aligned} V(\{i\})\leqslant \beta _{i}+\cdots +\delta ^{l}\Pi ^{l}(\eta ^{*})\beta _{i}+\delta ^{l+1}\Pi ^{l+1}(\eta ^{*})V(\{i\}). \end{aligned}$$
(47)

Next, we need to prove that, if (17) holds, then (16) holds for \(k=l+1\). After transformation, the right-hand side of (47) is:

$$\begin{aligned} \beta _{i}+\delta \Pi (\eta ^{*})\left\{ \beta _{i}+\delta \Pi (\eta ^{*})\beta _{i}+\cdots +\delta ^{l-1}\Pi ^{l-1}(\eta ^{*})\beta _{i}+\delta ^{l}\Pi ^{l}(\eta ^{*})V(\{i\})\right\} . \end{aligned}$$

Taking into account (46), the expression in brackets is not less than \(V(\{i\})\). Therefore, the right-hand side of (47) is not less than \(\beta _{i}+\delta \Pi (\eta ^{*})V(\{i\})\). From Eqs. (11) and (17), we get (16) for \(k=l+1\), which proves the proposition.
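The induction argument can also be illustrated numerically: whenever the one-step condition (45) holds, the k-step inequality (16) holds for every k. A sketch with invented numbers chosen to satisfy (45):

```python
import numpy as np

# Numeric illustration (toy numbers) that the one-step condition
#   V({i}) <= beta_i + delta * Pi(eta*) V({i})        (Eq. (45))
# propagates by induction to the k-step inequality (16) for every k.
delta = 0.9
Pi_star = np.array([[0.8, 0.2],
                    [0.5, 0.5]])
beta_i = np.array([2.0, 1.5])
V_i = np.array([5.0, 4.0])             # V({i}), one component per state

# One-step condition (45), which these numbers satisfy.
assert np.all(V_i <= beta_i + delta * Pi_star @ V_i + 1e-12)

# k-step inequality (16): V({i}) <= sum_{t<k} (delta Pi*)^t beta_i
#                                   + (delta Pi*)^k V({i}).
acc, M = np.zeros(2), np.eye(2)
for k in range(1, 51):
    acc = acc + M @ beta_i             # adds (delta Pi*)^{k-1} beta_i
    M = delta * (Pi_star @ M)          # M becomes (delta Pi*)^k
    assert np.all(V_i <= acc + M @ V_i + 1e-9)
```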

Rights and permissions

Reprints and permissions

About this article


Cite this article

Parilina, E.M., Tampieri, A. Stability and cooperative solution in stochastic games. Theory Decis 84, 601–625 (2018). https://doi.org/10.1007/s11238-017-9619-7

