Evolution of cooperation in stochastic games

  • Letter

From Nature


Social dilemmas occur when incentives for individuals are misaligned with group interests [1–7]. According to the ‘tragedy of the commons’, these misalignments can lead to overexploitation and collapse of public resources. The resulting behaviours can be analysed with the tools of game theory [8]. The theory of direct reciprocity [9–15] suggests that repeated interactions can alleviate such dilemmas, but previous work has assumed that the public resource remains constant over time. Here we introduce the idea that the public resource is instead changeable and depends on the strategic choices of individuals. An intuitive scenario is that cooperation increases the public resource, whereas defection decreases it. Thus, cooperation allows the possibility of playing a more valuable game with higher payoffs, whereas defection leads to a less valuable game. We analyse this idea using the theory of stochastic games [16–19] and evolutionary game theory. We find that the dependence of the public resource on previous interactions can greatly enhance the propensity for cooperation. For these results, the interaction between reciprocity and payoff feedback is crucial: neither repeated interactions in a constant environment nor single interactions in a changing environment yield similar cooperation rates. Our framework shows which feedbacks between exploitation and environment, whether naturally occurring or designed, help to overcome social dilemmas.
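
The central idea, that the actions chosen in one round determine which game is played in the next, can be illustrated with a minimal Python sketch. It assumes that each state is a donation game (a cooperator pays a cost c to give the co-player a benefit b) and that players find themselves in state 1 only after mutual cooperation. The values b1 = 2.0, b2 = 1.2 and c = 1 are the defaults quoted in the legend of Extended Data Fig. 1; the transition rule, the function names and the restriction to state-based policies are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a two-state stochastic game (illustrative assumptions only).
B = {1: 2.0, 2: 1.2}   # benefit of cooperation in state 1 and in state 2
C = 1.0                # cost of cooperation

def payoffs(state, a1, a2):
    """Donation-game payoffs; actions are 1 (cooperate) or 0 (defect)."""
    b = B[state]
    return b * a2 - C * a1, b * a1 - C * a2

def next_state(a1, a2):
    """Assumed transition rule: state 1 only after mutual cooperation."""
    return 1 if (a1 == 1 and a2 == 1) else 2

def play(policy1, policy2, rounds=100, state=1):
    """Average per-round payoffs when both players use fixed policies.

    A policy maps the current state to an action; history-dependent
    strategies are deliberately left out of this sketch.
    """
    total1 = total2 = 0.0
    for _ in range(rounds):
        a1, a2 = policy1(state), policy2(state)
        p1, p2 = payoffs(state, a1, a2)
        total1, total2 = total1 + p1, total2 + p2
        state = next_state(a1, a2)
    return total1 / rounds, total2 / rounds

always_cooperate = lambda state: 1
always_defect = lambda state: 0

print(play(always_cooperate, always_cooperate))  # players stay in state 1
print(play(always_cooperate, always_defect))     # players drop to state 2
```

Under these assumptions, a defector who exploits a cooperator earns the high benefit b1 only once; afterwards both players are confined to the less valuable game in state 2.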

Fig. 1: In stochastic games, the decisions made by players in one round determine the game that will be played next round.
Fig. 2: Stochastic games can promote cooperation even if all individual games favour defection.
Fig. 3: Probabilistic transitions maximize cooperation in three different stochastic games.
Fig. 4: Strong immediate feedback maximizes cooperation.

References

  1. Lloyd, W. F. Two Lectures on the Checks to Population (Oxford Univ. Press, Oxford, 1833).

  2. Hardin, G. The tragedy of the commons. Science 162, 1243–1248 (1968).

  3. Trivers, R. L. The evolution of reciprocal altruism. Q. Rev. Biol. 46, 35–57 (1971).

  4. Axelrod, R. The Evolution of Cooperation (Basic Books, New York, NY, 1984).

  5. Ostrom, E. Governing the Commons: The Evolution of Institutions for Collective Action (Cambridge Univ. Press, Cambridge, 1990).

  6. Nowak, M. A. Five rules for the evolution of cooperation. Science 314, 1560–1563 (2006).

  7. Van Lange, P. A. M., Balliet, D., Parks, C. D. & Van Vugt, M. Social Dilemmas – The Psychology of Human Cooperation (Oxford Univ. Press, Oxford, 2015).

  8. Sigmund, K. The Calculus of Selfishness (Princeton Univ. Press, Princeton, 2010).

  9. Nowak, M. & Sigmund, K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature 364, 56–58 (1993).

  10. Hauert, C. & Schuster, H. G. Effects of increasing the number of players and memory size in the iterated prisoner’s dilemma: a numerical approach. Proc. R. Soc. Lond. B 264, 513–519 (1997).

  11. Killingback, T. & Doebeli, M. The continuous prisoner’s dilemma and the evolution of cooperation through reciprocal altruism with variable investment. Am. Nat. 160, 421–438 (2002).

  12. Szolnoki, A., Perc, M. & Szabó, G. Phase diagrams for three-strategy evolutionary prisoner’s dilemma games on regular graphs. Phys. Rev. E 80, 056104 (2009).

  13. Grujić, J., Cuesta, J. A. & Sánchez, A. On the coexistence of cooperators, defectors and conditional cooperators in the multiplayer iterated Prisoner’s Dilemma. J. Theor. Biol. 300, 299–308 (2012).

  14. García, J. & van Veelen, M. In and out of equilibrium I: evolution of strategies in repeated games with discounting. J. Econ. Theory 161, 161–189 (2016).

  15. Hilbe, C., Chatterjee, K. & Nowak, M. A. Partners and rivals in direct reciprocity. Nat. Hum. Behav. (2018).

  16. Shapley, L. S. Stochastic games. Proc. Natl Acad. Sci. USA 39, 1095–1100 (1953).

  17. Neyman, A. & Sorin, S. (eds) Stochastic Games and Applications (Kluwer Academic Press, Dordrecht, 2003).

  18. Mertens, J. F. & Neyman, A. Stochastic games. Int. J. Game Theory 10, 53–66 (1981).

  19. Mertens, J. F. & Neyman, A. Stochastic games have a value. Proc. Natl Acad. Sci. USA 79, 2145–2146 (1982).

  20. Rand, D. G. & Nowak, M. A. Human cooperation. Trends Cogn. Sci. 17, 413–425 (2013).

  21. Ledyard, J. O. in The Handbook of Experimental Economics (eds Kagel, J. H. & Roth, A. E.) 111–194 (Princeton Univ. Press, Princeton, 1995).

  22. Milinski, M., Sommerfeld, R. D., Krambeck, H.-J., Reed, F. A. & Marotzke, J. The collective-risk social dilemma and the prevention of simulated dangerous climate change. Proc. Natl Acad. Sci. USA 105, 2291–2294 (2008).

  23. Alur, R., Henzinger, T. & Kupferman, O. Alternating-time temporal logic. J. Assoc. Comput. Mach. 49, 672–713 (2002).

  24. Miltersen, P. B. & Sorensen, T. B. A near-optimal strategy for a heads-up no-limit Texas hold’em poker tournament. In Proc. 6th International Joint Conference on Autonomous Agents and Multiagent Systems 191 (ACM, 2007).

  25. Ashcroft, P., Altrock, P. M. & Galla, T. Fixation in finite populations evolving in fluctuating environments. J. R. Soc. Interface 11, 20140663 (2014).

  26. Gokhale, C. S. & Hauert, C. Eco-evolutionary dynamics of social dilemmas. Theor. Popul. Biol. 111, 28–42 (2016).

  27. Hauert, C., Holmes, M. & Doebeli, M. Evolutionary games and population dynamics: maintenance of cooperation in public goods games. Proc. R. Soc. Lond. B 273, 2565–2570 (2006); corrigendum 273, 3131–313 (2006).

  28. Weitz, J. S., Eksin, C., Paarporn, K., Brown, S. P. & Ratcliff, W. C. An oscillating tragedy of the commons in replicator dynamics with game-environment feedback. Proc. Natl Acad. Sci. USA 113, E7518–E7525 (2016).

  29. Tavoni, A., Schlüter, M. & Levin, S. The survival of the conformist: social pressure and renewable resource management. J. Theor. Biol. 299, 152–161 (2012).

  30. Traulsen, A., Nowak, M. A. & Pacheco, J. M. Stochastic dynamics of invasion and fixation. Phys. Rev. E 74, 011909 (2006).

  31. Neyman, A. Continuous-time stochastic games. Games Econ. Behav. 104, 92–130 (2017).

  32. Nowak, M. A. & Sigmund, K. The evolution of stochastic strategies in the prisoner’s dilemma. Acta Appl. Math. 20, 247–265 (1990).

  33. Ohtsuki, H. & Iwasa, Y. The leading eight: social norms that can maintain cooperation by indirect reciprocity. J. Theor. Biol. 239, 435–444 (2006).

  34. Stewart, A. J. & Plotkin, J. B. Collapse of cooperation in evolving games. Proc. Natl Acad. Sci. USA 111, 17558–17563 (2014).

  35. Pinheiro, F. L., Vasconcelos, V. V., Santos, F. C. & Pacheco, J. M. Evolution of all-or-none strategies in repeated public goods dilemmas. PLOS Comput. Biol. 10, e1003945 (2014).

  36. Akin, E. in Ergodic Theory, Advances in Dynamics (ed. Assani, I.) 77–107 (de Gruyter, Berlin, 2016).

  37. Hilbe, C., Martinez-Vaquero, L. A., Chatterjee, K. & Nowak, M. A. Memory-n strategies of direct reciprocity. Proc. Natl Acad. Sci. USA 114, 4715–4720 (2017).

  38. Stewart, A. J. & Plotkin, J. B. Small groups and long memories promote cooperation. Sci. Rep. 6, 26889 (2016).

  39. Reiter, J. G., Hilbe, C., Rand, D. G., Chatterjee, K. & Nowak, M. A. Crosstalk in concurrent repeated games impedes direct reciprocity and requires stronger levels of forgiveness. Nat. Commun. 9, 555 (2018).

  40. Fudenberg, D. & Imhof, L. A. Imitation processes with small mutations. J. Econ. Theory 131, 251–262 (2006).


Acknowledgements

This work was supported by the European Research Council Start Grant 279307: Graph Games (to K.C.), Austrian Science Fund (FWF) grant P23499-N23 (to K.C.), FWF NFN grant S11407-N23 Rigorous Systems Engineering/Systematic Methods in Systems Engineering (to K.C.), Office of Naval Research Grant N00014-16-1-2914 (to M.A.N.) and the John Templeton Foundation (M.A.N.). C.H. acknowledges support from the ISTFELLOW programme.

Reviewer information

Nature thanks A. Neyman and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Authors and Affiliations



All authors conceived the study, performed the analysis, discussed the results and wrote the manuscript.

Corresponding authors

Correspondence to Christian Hilbe, Krishnendu Chatterjee or Martin A. Nowak.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Our findings are robust with respect to parameter changes.

To test the robustness of our findings, we consider the stochastic game introduced in Fig. 2a and independently vary several key parameters. a, b, When we vary the benefit of cooperation in state 1, we find that the advantage of the stochastic game is most pronounced when this benefit is intermediate, 1.5 ≤ b1 ≤ 2.5. This conclusion holds independently of whether individuals use pure strategies only (a) or stochastic ones (b). c–f, We obtain similar results when we vary the error rate ε (c), the strength of selection β (d), the discount factor δ (e) and the mutation rate μ (f). In all cases, we observe that stochastic games yield a cooperation premium, provided that errors are sufficiently rare, selection is sufficiently strong, players give sufficient weight to future payoffs and mutations are comparably rare. Solid lines indicate exact results in the limit of rare mutations, whereas square symbols and dashed lines represent simulation results (see Supplementary Information for details). Filled circles highlight the results obtained for the parameters in Fig. 2a. As default parameters, we used the same values as in Fig. 2a: N = 100, b1 = 2.0, b2 = 1.2, c = 1, β = 1, ε = 0.001, δ → 1 and μ → 0.
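
To make the roles of the population size N and the selection strength β more concrete, the following Python sketch computes the probability that a single mutant strategy takes over a resident population. The legend does not state the update rule, so the pairwise-comparison process with a Fermi imitation function used here is an assumption, as are the helper name `fixation_probability` and the four payoff arguments.

```python
import math

def fixation_probability(a, b, c, d, N=100, beta=1.0):
    """Fixation probability of one mutant in a resident population of size N.

    Hypothetical payoff inputs: a = mutant vs mutant, b = mutant vs resident,
    c = resident vs mutant, d = resident vs resident.
    Assumes a pairwise-comparison process with a Fermi imitation function.
    """
    total, product = 1.0, 1.0
    for j in range(1, N):
        # Average payoffs when j mutants are present (no self-interaction).
        pi_mutant = ((j - 1) * a + (N - j) * b) / (N - 1)
        pi_resident = (j * c + (N - j - 1) * d) / (N - 1)
        product *= math.exp(-beta * (pi_mutant - pi_resident))
        total += product
    return 1.0 / total

print(fixation_probability(1, 1, 1, 1))            # neutral mutant: 1/N
print(fixation_probability(1, 1, 1, 1, beta=0.0))  # no selection: also 1/N
```

In the limit of rare mutations (μ → 0) mentioned in the legend, such fixation probabilities are typically the building blocks for the long-run abundance of each strategy; how the authors combine them is described in their Supplementary Information and is not reproduced here.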

Extended Data Fig. 2 Whether cooperation evolves in two-player games depends critically on the form of the environmental feedback.

Keeping the game parameters fixed at the values used in Fig. 2a, we explored how the evolution of cooperation depends on the underlying transition structure of the stochastic game in the limit of rare mutations (see Supplementary Information). a–h, We calculated the selection–mutation equilibrium for all possible stochastic games with two states when transitions are state-independent and deterministic. i, Overall, six of the eight transition structures lead players to spend more time in the more profitable state 1, in which mutual cooperation has a higher benefit. j, However, cooperation evolves in only two out of these six transition structures. These two structures have in common that mutual cooperation always leads to the beneficial state 1, whereas mutual defection leads to the detrimental state 2. Thus, cooperation is most likely to evolve if the environmental feedback itself incentivizes mutual cooperation and disincentivizes mutual defection. The transitions after unilateral defection have a less prominent role.
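
The count of eight transition structures can be reproduced with a short enumeration. The sketch assumes that transitions are symmetric, so that the next state depends only on whether two, one or none of the players cooperated; the dictionary keys are illustrative labels.

```python
from itertools import product

# Next state (1 or 2) after mutual cooperation (CC), unilateral defection (CD)
# and mutual defection (DD). Assumption: only the number of cooperators matters.
structures = [
    {"CC": s_cc, "CD": s_cd, "DD": s_dd}
    for s_cc, s_cd, s_dd in product((1, 2), repeat=3)
]

# Structures in which mutual cooperation leads to state 1 and mutual
# defection leads to state 2, as singled out in the legend.
favourable = [s for s in structures if s["CC"] == 1 and s["DD"] == 2]

print(len(structures))  # 8
print(len(favourable))  # 2
for s in favourable:
    print(s)
```

The two remaining structures differ only in the state reached after unilateral defection, which is consistent with the legend's remark that these transitions play a less prominent role.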

Extended Data Fig. 3 Analysis of the evolving strategies suggests that the evolution of cooperation hinges on the success of WSLS.

Here, we consider all state-invariant and deterministic stochastic games with two states and two players. a–h, For each of the eight possible cases, we recorded the evolving cooperation rate (lower plots) and the relative abundance of each pure memory-one strategy (upper plots) for different values of b1. For clarity, we depict only two memory-one strategies explicitly, All D (the strategy that prescribes to always defect) and WSLS. The colour-shaded bars on top of the upper plots show parameter regimes in which either All D or WSLS is most abundant among all 16 strategies. In four of the eight cases, we observe that full cooperation evolves as the benefit to cooperation in state 1 approaches b1 = 3. These are exactly the cases in which mutual cooperation leads players towards the more beneficial state 1. Moreover, in these four cases the upper plots show that cooperation emerges owing to the success of WSLS, which is the predominant strategy whenever cooperation prevails. Except for the value of b1, all other parameter values are the same as in Extended Data Fig. 2.
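
The 16 pure memory-one strategies, and the way WSLS (win-stay, lose-shift) restores cooperation after a single defection, can be sketched as follows. The encoding is an assumption made for illustration: a strategy maps the previous round's outcome, seen as (own action, co-player's action), to the next action, and WSLS is taken in its standard form of cooperating after mutual cooperation and after mutual defection.

```python
from itertools import product

# A pure memory-one strategy: action (1 = cooperate, 0 = defect) after each
# outcome of the previous round, written from the focal player's view as
# (own action, co-player's action).
OUTCOMES = [("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")]
all_strategies = [dict(zip(OUTCOMES, acts)) for acts in product((0, 1), repeat=4)]

ALL_D = dict(zip(OUTCOMES, (0, 0, 0, 0)))
WSLS = dict(zip(OUTCOMES, (1, 0, 0, 1)))  # cooperate after CC and after DD

def run(strategy1, strategy2, start, rounds):
    """Deterministic play (no errors) from a given starting outcome."""
    history = [start]
    for _ in range(rounds):
        own1, own2 = history[-1]
        a1 = "C" if strategy1[(own1, own2)] else "D"
        a2 = "C" if strategy2[(own2, own1)] else "D"
        history.append((a1, a2))
    return history

print(len(all_strategies))             # 16
print(run(WSLS, WSLS, ("D", "C"), 2))  # one defection, then recovery
```

Two WSLS players who start from a unilateral defection pass through one round of mutual defection and then return to mutual cooperation, whereas a WSLS player facing All D alternates between cooperating and defecting.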

Extended Data Fig. 4 Effect of transitions on cooperation in four-player public-goods games.

We also explored the effect of different transition structures for stochastic games between multiple players (with a public-goods game being played in each state). State 1 is again more beneficial because r1 > r2, but to be in state 1 there must be a minimum number k of cooperators in the previous round. a–f, For a four-player public-goods game, there are six possible monotonic configurations of the stochastic game because k can be any number from 0 (players always move to first state) to 5 (players never move to first state). h, There is a non-monotonic relationship between the six transition structures and the time spent in the more beneficial state 1. g, The evolving cooperation rate becomes maximal when any deviation from mutual cooperation leads players to state 2 (e). Parameters are as in Fig. 2b, but with the multiplication factor in the first state fixed to r1 = 2 and selection strength β = 1; to derive exact results, we considered the limit of rare mutations μ → 0 (see Supplementary Information for details).
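
The threshold rule described above can be written down in a few lines of Python. The linear public-goods payoff used here (contributions of size c are multiplied by r and shared equally among all players) is the conventional form and is assumed rather than taken from the legend; the function names are illustrative.

```python
def public_goods_payoffs(actions, r, c=1.0):
    """Payoffs in a linear public-goods game (assumed form).

    actions: list of 1 (contribute) or 0 (withhold), one entry per player.
    Contributions are multiplied by r and shared equally among all players.
    """
    n = len(actions)
    share = r * c * sum(actions) / n
    return [share - c * a for a in actions]

def next_state(actions, k):
    """State 1 if at least k players cooperated in the previous round, else state 2."""
    return 1 if sum(actions) >= k else 2

group = [1, 1, 1, 0]  # three cooperators, one defector
print(public_goods_payoffs(group, r=2.0))        # [0.5, 0.5, 0.5, 1.5]
print([next_state(group, k) for k in range(6)])  # [1, 1, 1, 1, 2, 2]
```

With four players, k = 0 keeps the group in state 1 regardless of behaviour, k = 5 can never be met, and k = 4 corresponds to the case in which any deviation from full cooperation leads to state 2.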

Extended Data Fig. 5 WSLS sustains cooperation in multiplayer public-goods games.

This figure is analogous to Extended Data Fig. 3 for the case of multiplayer interactions. Again, we show evolving cooperation rates and the relative abundance of All D and WSLS for the six state-independent and deterministic games in which transitions are monotonic. In five of these games, cooperation emerges once the multiplication factor r1 becomes sufficiently large. In all of those, WSLS is the most abundant strategy when cooperation evolves. Except for r1, all parameters are the same as in Extended Data Fig. 4.

Extended Data Fig. 6 Probabilistic transitions can further enhance cooperation.

a, Here, we explore in more detail the stochastic game introduced in Fig. 3a (see Supplementary Information for details), in which any defection always leads to state 2. After mutual cooperation in state 1, players remain in state 1 with certainty. After mutual cooperation in state 2, players move towards state 1 with probability q. b, Calculating the cooperation rate in the selection–mutation equilibrium in the limit of rare mutations shows that the highest cooperation rate is achieved for intermediate values of q. c, We recorded the abundance of all 32 memory-one strategies in the selection–mutation equilibrium. The most abundant strategy is either All D (for small values of q, as indicated by the red squares), WSLS (for small but positive values of q, green circles) or AWSLS (for all other values of q, yellow triangles; AWSLS is a more ambitious variant of WSLS, see Supplementary Information, section 4.1). d, To estimate the time that it takes each resident strategy to be invaded, we randomly introduced other mutant strategies and recorded how long it took until a mutant successfully fixed (that is, the number of independent mutant strategies introduced before the mutant strategy was adopted by the whole population). To obtain a reliable estimate, we performed 10,000 runs for each resident strategy. e, f, In addition, we recorded which strategy eventually reaches fixation if the resident applies either All D or WSLS when q = 1. Parameters: b1 = 1.9, b2 = 1.4, c = 1, β = 1, N = 100.
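
The transition rule in panel a can be expressed as a probability of playing the next round in state 1. The sketch below encodes that rule and computes how likely two persistent cooperators, starting in state 2, are to have reached state 1 within a given number of rounds. The function names are illustrative, and the calculation ignores errors and strategy dynamics.

```python
def prob_state1(state, a1, a2, q):
    """Probability that the next round is played in state 1.

    Actions are 1 (cooperate) or 0 (defect); q is the transition probability
    from state 2 to state 1 after mutual cooperation.
    """
    if not (a1 and a2):
        return 0.0                   # any defection leads to state 2
    return 1.0 if state == 1 else q  # mutual cooperation

def prob_recovered(q, rounds):
    """Chance that two persistent cooperators starting in state 2 have
    reached state 1 within the given number of rounds."""
    p_still_state2 = 1.0
    for _ in range(rounds):
        p_still_state2 *= 1.0 - prob_state1(2, 1, 1, q)
    return 1.0 - p_still_state2

for q in (0.1, 0.5, 1.0):
    print(q, prob_recovered(q, rounds=5))
```

Small values of q make the return to the beneficial state slow, while q = 1 makes it immediate; the legend reports that intermediate values yield the highest cooperation rate in the evolutionary analysis.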

Extended Data Fig. 7 Players benefit from a small endogenous risk that the game stops early.

a, We consider the stochastic game in Fig. 3b, in which players remain in state 1 after cooperation, but move towards state 2 with transition probability q if one of the players defects. In state 2, no profitable interactions are possible. All results are discussed in detail in Supplementary Information; here we provide a summary. b, According to our evolutionary simulations, a higher transition probability leads to more cooperation. c, However, a higher probability q also makes players move to the second state if one of them defected merely owing to an error; hence, the dependence of payoffs on q is non-monotonic. d, e, When q is small, Grim is the predominant strategy. Players with this strategy cooperate until one of the players defects; from then on, they defect forever. As q increases, WSLS strategies take over. As q → 1, unconditional cooperation becomes most successful. f, For the given parameter values, a homogeneous Grim population achieves only one-third of the maximum payoff possible, because any error leads to relentless defection. The other three strategies result in the maximum payoff b1 − c for q = 0, but this payoff decreases with q. Parameters: b1 = 2, c = 1, δ = 0.999, ε = 0.001, β = 1, N = 100.
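
The statement in panel f that a homogeneous Grim population reaches only about one-third of the maximum payoff can be checked with a back-of-the-envelope calculation. The sketch assumes that payoffs are averaged with normalised discount weights (1 − δ)δ^t, that mutual cooperation pays b1 − c per round until the first implementation error by either player, and that all later rounds contribute nothing. This is a simplification for illustration, not the exact computation from the Supplementary Information.

```python
def grim_payoff_estimate(b=2.0, c=1.0, delta=0.999, eps=0.001):
    """Approximate normalised discounted payoff in a homogeneous Grim population.

    Simplification: mutual cooperation pays b - c per round until the first
    implementation error by either player; later rounds are counted as zero.
    """
    survive = (1.0 - eps) ** 2  # neither player errs in a given round
    # (1 - delta) * sum_t (delta * survive)**t * (b - c), a geometric series
    return (1.0 - delta) * (b - c) / (1.0 - delta * survive)

estimate = grim_payoff_estimate()
print(estimate / (2.0 - 1.0))  # fraction of the maximum payoff b1 - c, close to 1/3
```

With δ = 0.999 and ε = 0.001, the expected time until the game is discounted away is of the same order as the expected time until the first error, which is why a substantial share of the cooperative payoff is lost.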

Extended Data Fig. 8 Immediate environmental feedback enhances cooperation.

a, We consider a state-dependent stochastic game with two players and three states. Mutual cooperation always leads players to move to a superior state (or to remain in the most beneficial state s1). Similarly, mutual defection always leads to an inferior state (or players remain in the most detrimental state s3). After a unilateral defection, players remain in the same state. We consider four different versions of this game, depending on how quickly the payoffs decrease as players move towards an inferior state. b, Our numerical results show that an immediate negative response of the environment to defection is most favourable to the evolution of cooperation. c, As a consequence, the scenario with immediate consequences also yields the highest average payoffs once the benefit in state 1 exceeds a moderate threshold. d–g, On the level of evolving strategies, we find that an immediately responding environment is most favourable to the evolution of WSLS strategies and strongly selects against defecting strategies. Again, the coloured bars on top of each panel indicate the strategy that is most favoured by selection for the respective value of b1 (see Supplementary Information for all details). Parameters: c = 1; b1 varies from 1 to 3; b2 is equal to c, (b1 + c)/2 or b1; and b3 is equal to either c or b1 depending on the scenario considered (as depicted in a); N = 100, β = 1, δ → 1, ε = 0.001.

Extended Data Fig. 9 Cooperation in stochastic games requires that players take future payoff consequences into account.

We repeated the numerical computations in Extended Data Fig. 8 for various discount rates δ. When players focus entirely on the present (δ = 0), cooperation evolves in none of the four treatments. As players increasingly take future payoffs into account, cooperation rates increase. Immediate payoff feedback is most conducive to cooperation across all values of δ considered. Except for the discount rate, parameters are the same as in Extended Data Fig. 8, with b1 = 1.8.
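
The effect of δ can be illustrated by weighting a stream of round payoffs with powers of δ. The helper below and the two payoff streams are hypothetical: one stream stands for sustained mutual cooperation, the other for a single profitable defection followed by rounds without any payoff.

```python
def discounted_average(payoffs, delta):
    """Discounted average of a finite payoff stream.

    Round t receives weight delta**t, and weights are renormalised to sum
    to one. delta = 0 counts only the first round; delta = 1 gives the mean.
    """
    weights = [delta ** t for t in range(len(payoffs))]
    return sum(w * p for w, p in zip(weights, payoffs)) / sum(weights)

cooperate = [1.0] * 50       # hypothetical stream: steady mutual cooperation
defect = [2.0] + [0.0] * 49  # hypothetical stream: one gain, then nothing

for delta in (0.0, 0.5, 0.9):
    print(delta, discounted_average(cooperate, delta), discounted_average(defect, delta))
```

For δ = 0 the defecting stream looks better because only the first round counts; once future rounds carry enough weight, the cooperative stream comes out ahead, in line with the trend described in the legend.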

Extended Data Fig. 10 A systematic analysis of the expected game dynamics for different game payoffs.

Keeping the two-player game in state 2 fixed to the game in Fig. 2a, we varied the game that is played in state 1. We assume that payoffs in the first state are 1 (for mutual cooperation), S1 (for unilateral cooperation), T1 (for unilateral defection) and 0 (for mutual defection). Depending on T1 and S1, game 1 can be one of four different types: harmony game (HG), snowdrift game (SD), stag-hunt game (SH) or prisoner’s dilemma (PD); see Supplementary Information for details. For each of the eight possible state-independent transitions q, we systematically varied the temptation payoff T1 (x axis) and the sucker’s payoff S1 (y axis) in the first state (see Supplementary Information for details). For each combination of T1, S1 and q, we computed how often players cooperate in the selection–mutation equilibrium (left panels) and in what fraction of rounds they switch from one state to the other (right panels). a–c, e, Full cooperation can evolve when players find themselves in state 1 after mutual cooperation. d, f, Players learn to switch between states only when mutual cooperation leads to state 2 and mutual defection leads to state 1. g, h, In the remaining cases, players hardly cooperate. The payoffs in game 2 are the same as in Fig. 2a, that is, a prisoner’s dilemma with b2 = 1.2 and c = 1. For the evolutionary parameters we considered population size N = 100 and selection strength β = 1.
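
The four game types can be told apart from T1 and S1 alone once the payoffs for mutual cooperation and mutual defection are fixed at 1 and 0. The sketch below uses the conventional boundaries for symmetric two-player games; the exact boundaries used by the authors are given in their Supplementary Information, so the thresholds and the function name here are assumptions.

```python
def classify_game(T, S, R=1.0, P=0.0):
    """Classify a symmetric 2x2 game by its temptation T and sucker's payoff S."""
    if T > R and S < P:
        return "PD"    # prisoner's dilemma: defection dominates
    if T > R and S > P:
        return "SD"    # snowdrift: best to do the opposite of the co-player
    if T < R and S < P:
        return "SH"    # stag hunt: best to match the co-player
    if T < R and S > P:
        return "HG"    # harmony game: cooperation dominates
    return "boundary"  # T == R or S == P: degenerate case

print(classify_game(T=1.5, S=-0.5))  # PD
print(classify_game(T=0.5, S=0.5))   # HG
# Game 2 as a donation game with b2 = 1.2 and c = 1: R = 0.2, S = -1, T = 1.2, P = 0.
print(classify_game(T=1.2, S=-1.0, R=0.2, P=0.0))  # PD
```

The last call confirms that the game in state 2 is a prisoner's dilemma under the donation-game reading of b2 and c.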

Supplementary information

Supplementary Information

This file contains a Supplementary Discussion, Supplementary Table 1 and Supplementary References. Supplementary Table 1 provides several examples of memory-1 strategies of stochastic games.

Reporting Summary

About this article

Cite this article

Hilbe, C., Šimsa, Š., Chatterjee, K. et al. Evolution of cooperation in stochastic games. Nature 559, 246–249 (2018).

  • Springer Nature Limited