Abstract
Social dilemmas occur when incentives for individuals are misaligned with group interests^{1,2,3,4,5,6,7}. According to the ‘tragedy of the commons’, these misalignments can lead to overexploitation and collapse of public resources. The resulting behaviours can be analysed with the tools of game theory^{8}. The theory of direct reciprocity^{9,10,11,12,13,14,15} suggests that repeated interactions can alleviate such dilemmas, but previous work has assumed that the public resource remains constant over time. Here we introduce the idea that the public resource is instead changeable and depends on the strategic choices of individuals. An intuitive scenario is that cooperation increases the public resource, whereas defection decreases it. Thus, cooperation allows the possibility of playing a more valuable game with higher payoffs, whereas defection leads to a less valuable game. We analyse this idea using the theory of stochastic games^{16,17,18,19} and evolutionary game theory. We find that the dependence of the public resource on previous interactions can greatly enhance the propensity for cooperation. For these results, the interaction between reciprocity and payoff feedback is crucial: neither repeated interactions in a constant environment nor single interactions in a changing environment yield similar cooperation rates. Our framework shows which feedbacks between exploitation and environment—either naturally occurring or designed—help to overcome social dilemmas.
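The feedback described above can be illustrated with a minimal two-player, two-state sketch. It makes three assumptions. Each state hosts a donation game, in which a cooperator pays c so that the co-player receives b_s. The parameters are the default values b_{1} = 2.0, b_{2} = 1.2 and c = 1 listed in Extended Data Fig. 1. For illustration only, mutual cooperation leads to state 1 and any defection leads to state 2. All function names are hypothetical.

```python
BENEFIT = {1: 2.0, 2: 1.2}  # b_s: benefit of cooperation in state s
COST = 1.0                  # c: cost of cooperation (same in both states)


def round_payoffs(state, a1, a2):
    """Donation-game payoffs in the given state; actions are 'C' or 'D'."""
    b = BENEFIT[state]
    u1 = (b if a2 == "C" else 0.0) - (COST if a1 == "C" else 0.0)
    u2 = (b if a1 == "C" else 0.0) - (COST if a2 == "C" else 0.0)
    return u1, u2


def next_state(a1, a2):
    """Assumed feedback: mutual cooperation leads to state 1, any defection to state 2."""
    return 1 if (a1, a2) == ("C", "C") else 2


def play(actions1, actions2, start_state=1):
    """Play two fixed action sequences; return total payoffs and the states visited."""
    state, totals, visited = start_state, [0.0, 0.0], []
    for a1, a2 in zip(actions1, actions2):
        visited.append(state)
        u1, u2 = round_payoffs(state, a1, a2)
        totals[0] += u1
        totals[1] += u2
        state = next_state(a1, a2)
    return totals, visited


# Player 2 defects once, in the second round, and then cooperates again.
totals, visited = play("CCCCC", "CDCCC")
print(totals, visited)
```

The one-off defection earns player 2 the full benefit b_{1} immediately. It also moves both players into state 2 for the next round, where mutual cooperation yields only b_{2} − c.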
References
Lloyd, W. F. Two Lectures on the Checks to Population (Oxford Univ. Press, Oxford, 1833).
Hardin, G. The tragedy of the commons. Science 162, 1243–1248 (1968).
Trivers, R. L. The evolution of reciprocal altruism. Q. Rev. Biol. 46, 35–57 (1971).
Axelrod, R. The Evolution of Cooperation (Basic Books, New York, NY, 1984).
Ostrom, E. Governing the Commons: The Evolution of Institutions for Collective Action (Cambridge Univ. Press, Cambridge, 1990).
Nowak, M. A. Five rules for the evolution of cooperation. Science 314, 1560–1563 (2006).
Van Lange, P. A. M., Balliet, D., Parks, C. D. & Van Vugt, M. Social Dilemmas – The Psychology of Human Cooperation (Oxford Univ. Press, Oxford, 2015).
Sigmund, K. The Calculus of Selfishness (Princeton Univ. Press, Princeton, 2010).
Nowak, M. & Sigmund, K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature 364, 56–58 (1993).
Hauert, C. & Schuster, H. G. Effects of increasing the number of players and memory size in the iterated prisoner’s dilemma: a numerical approach. Proc. R. Soc. Lond. B 264, 513–519 (1997).
Killingback, T. & Doebeli, M. The continuous prisoner’s dilemma and the evolution of cooperation through reciprocal altruism with variable investment. Am. Nat. 160, 421–438 (2002).
Szolnoki, A., Perc, M. & Szabó, G. Phase diagrams for three-strategy evolutionary prisoner’s dilemma games on regular graphs. Phys. Rev. E 80, 056104 (2009).
Grujić, J., Cuesta, J. A. & Sánchez, A. On the coexistence of cooperators, defectors and conditional cooperators in the multiplayer iterated Prisoner’s Dilemma. J. Theor. Biol. 300, 299–308 (2012).
García, J. & van Veelen, M. In and out of equilibrium I: evolution of strategies in repeated games with discounting. J. Econ. Theory 161, 161–189 (2016).
Hilbe, C., Chatterjee, K. & Nowak, M. A. Partners and rivals in direct reciprocity. Nat. Hum. Behav. (2018).
Shapley, L. S. Stochastic games. Proc. Natl Acad. Sci. USA 39, 1095–1100 (1953).
Neyman, A. & Sorin, S. (eds) Stochastic Games and Applications (Kluwer Academic Press, Dordrecht, 2003).
Mertens, J. F. & Neyman, A. Stochastic games. Int. J. Game Theory 10, 53–66 (1981).
Mertens, J. F. & Neyman, A. Stochastic games have a value. Proc. Natl Acad. Sci. USA 79, 2145–2146 (1982).
Rand, D. G. & Nowak, M. A. Human cooperation. Trends Cogn. Sci. 17, 413–425 (2013).
Ledyard, J. O. in The Handbook of Experimental Economics (eds Kagel, J. H. & Roth, A. E.) 111–194 (Princeton Univ. Press, Princeton, 1995).
Milinski, M., Sommerfeld, R. D., Krambeck, H.-J., Reed, F. A. & Marotzke, J. The collective-risk social dilemma and the prevention of simulated dangerous climate change. Proc. Natl Acad. Sci. USA 105, 2291–2294 (2008).
Alur, R., Henzinger, T. & Kupferman, O. Alternating-time temporal logic. J. Assoc. Comput. Mach. 49, 672–713 (2002).
Miltersen, P. B. & Sorensen, T. B. A near-optimal strategy for a heads-up no-limit Texas Hold’em poker tournament. In Proc. 6th International Joint Conference on Autonomous Agents and Multiagent Systems 191 (ACM, 2007).
Ashcroft, P., Altrock, P. M. & Galla, T. Fixation in finite populations evolving in fluctuating environments. J. R. Soc. Interface 11, 20140663 (2014).
Gokhale, C. S. & Hauert, C. Eco-evolutionary dynamics of social dilemmas. Theor. Popul. Biol. 111, 28–42 (2016).
Hauert, C., Holmes, M. & Doebeli, M. Evolutionary games and population dynamics: maintenance of cooperation in public goods games. Proc. R. Soc. Lond. B 273, 2565–2570 (2006); corrigendum 273, 3131–313 (2006).
Weitz, J. S., Eksin, C., Paarporn, K., Brown, S. P. & Ratcliff, W. C. An oscillating tragedy of the commons in replicator dynamics with game-environment feedback. Proc. Natl Acad. Sci. USA 113, E7518–E7525 (2016).
Tavoni, A., Schlüter, M. & Levin, S. The survival of the conformist: social pressure and renewable resource management. J. Theor. Biol. 299, 152–161 (2012).
Traulsen, A., Nowak, M. A. & Pacheco, J. M. Stochastic dynamics of invasion and fixation. Phys. Rev. E 74, 011909 (2006).
Neyman, A. Continuous-time stochastic games. Games Econ. Behav. 104, 92–130 (2017).
Nowak, M. A. & Sigmund, K. The evolution of stochastic strategies in the prisoner’s dilemma. Acta Appl. Math. 20, 247–265 (1990).
Ohtsuki, H. & Iwasa, Y. The leading eight: social norms that can maintain cooperation by indirect reciprocity. J. Theor. Biol. 239, 435–444 (2006).
Stewart, A. J. & Plotkin, J. B. Collapse of cooperation in evolving games. Proc. Natl Acad. Sci. USA 111, 17558–17563 (2014).
Pinheiro, F. L., Vasconcelos, V. V., Santos, F. C. & Pacheco, J. M. Evolution of all-or-none strategies in repeated public goods dilemmas. PLOS Comput. Biol. 10, e1003945 (2014).
Akin, E. in Ergodic Theory, Advances in Dynamics (ed. Assani, I.) 77–107 (de Gruyter, Berlin, 2016).
Hilbe, C., Martinez-Vaquero, L. A., Chatterjee, K. & Nowak, M. A. Memory-n strategies of direct reciprocity. Proc. Natl Acad. Sci. USA 114, 4715–4720 (2017).
Stewart, A. J. & Plotkin, J. B. Small groups and long memories promote cooperation. Sci. Rep. 6, 26889 (2016).
Reiter, J. G., Hilbe, C., Rand, D. G., Chatterjee, K. & Nowak, M. A. Crosstalk in concurrent repeated games impedes direct reciprocity and requires stronger levels of forgiveness. Nat. Commun. 9, 555 (2018).
Fudenberg, D. & Imhof, L. A. Imitation processes with small mutations. J. Econ. Theory 131, 251–262 (2006).
Acknowledgements
This work was supported by the European Research Council Start Grant 279307: Graph Games (to K.C.), Austrian Science Fund (FWF) grant P23499-N23 (to K.C.), FWF NFN grant S11407-N23 Rigorous Systems Engineering/Systematic Methods in Systems Engineering (to K.C.), Office of Naval Research Grant N00014-16-1-2914 (to M.A.N.) and the John Templeton Foundation (to M.A.N.). C.H. acknowledges support from the ISTFELLOW programme.
Reviewer information
Nature thanks A. Neyman and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Author information
Contributions
All authors conceived the study, performed the analysis, discussed the results and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Our findings are robust with respect to parameter changes.
To test the robustness of our findings, we consider the stochastic game introduced in Fig. 2a and independently vary several key parameters. a, b, When we vary the benefit of cooperation in state 1, we find that the advantage of the stochastic game is most pronounced when this benefit is intermediate, 1.5 ≤ b_{1} ≤ 2.5. This conclusion holds independently of whether individuals use pure strategies only (a) or stochastic ones (b). c–f, We obtain similar results when we vary the error rate ε (c), the strength of selection β (d), the discount factor δ (e) and the mutation rate μ (f). In all cases, we observe that stochastic games yield a cooperation premium, provided that errors are sufficiently rare, selection is sufficiently strong, players give sufficient weight to future payoffs and mutations are comparatively rare. Solid lines indicate exact results in the limit of rare mutations, whereas square symbols and dashed lines represent simulation results (see Supplementary Information for details). Filled circles highlight the results obtained for the parameters in Fig. 2a. As default parameters, we used the same values as in Fig. 2a: N = 100, b_{1} = 2.0, b_{2} = 1.2, c = 1, β = 1, ε = 0.001, δ → 1 and μ → 0.
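The selection strength β and the population size N enter through the imitation process. The sketch below assumes a pairwise-comparison (Fermi) process of the kind studied in ref.^{34}. Under that process, the probability that a single mutant takes over a resident population has a closed form in terms of the four pairwise payoffs. The function name and the payoff values in the usage line are hypothetical. The paper's own expressions are in its Supplementary Information.

```python
import math


def fixation_probability(a_mm, a_mr, a_rm, a_rr, n=100, beta=1.0):
    """Probability that one mutant takes over a resident population of size n.

    a_xy is the payoff of an x-player against a y-player (m = mutant, r = resident).
    Assumes a pairwise-comparison process with Fermi imitation probability
    1 / (1 + exp(-beta * payoff_difference)), for which
    rho = 1 / (1 + sum_k prod_{j <= k} exp(-beta * (pi_m(j) - pi_r(j)))).
    """
    total, product = 1.0, 1.0
    for j in range(1, n):  # j = current number of mutants
        pi_m = ((j - 1) * a_mm + (n - j) * a_mr) / (n - 1)  # mutant's average payoff
        pi_r = (j * a_rm + (n - j - 1) * a_rr) / (n - 1)    # resident's average payoff
        product *= math.exp(-beta * (pi_m - pi_r))
        total += product
    return 1.0 / total


# Hypothetical payoffs: the mutant earns 1 against everyone, the resident earns 0.
print(fixation_probability(1.0, 1.0, 0.0, 0.0))
```

When β = 0, every ratio in the product equals 1, and the fixation probability reduces to the neutral value 1/N. This is a useful sanity check before plugging in the long-run payoffs of actual strategies.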
Extended Data Fig. 2 Whether cooperation evolves in two-player games depends critically on the form of the environmental feedback.
Keeping the game parameters fixed at the values used in Fig. 2a, we explored how the evolution of cooperation depends on the underlying transition structure of the stochastic game in the limit of rare mutations (see Supplementary Information). a–h, We calculated the selection–mutation equilibrium for all possible stochastic games with two states when transitions are state-independent and deterministic. i, Overall, six of the eight transition structures lead players to spend more time in the more profitable state 1, in which mutual cooperation has a higher benefit. j, However, cooperation evolves in only two out of these six transition structures. These two structures have in common that mutual cooperation always leads to the beneficial state 1, whereas mutual defection leads to the detrimental state 2. Thus, cooperation is most likely to evolve if the environmental feedback itself incentivizes mutual cooperation and disincentivizes mutual defection. The transitions after unilateral defection have a less prominent role.
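The count of eight transition structures can be reproduced by enumeration. We assume that a state-independent, deterministic structure assigns a next state (1 or 2) to each of three outcomes of the previous round: mutual cooperation, unilateral defection and mutual defection. We also assume that the two unilateral outcomes CD and DC are treated alike. The sketch then picks out the two structures singled out in panel j.

```python
from itertools import product

# Outcome classes of the previous round; CD and DC are merged (assumed symmetric).
OUTCOMES = ("CC", "CD/DC", "DD")

# Each structure maps every outcome to a next state, giving 2**3 = 8 structures.
structures = [dict(zip(OUTCOMES, targets)) for targets in product((1, 2), repeat=3)]

# Feedback that rewards mutual cooperation and punishes mutual defection.
aligned = [s for s in structures if s["CC"] == 1 and s["DD"] == 2]

print(len(structures))  # 8
for s in aligned:
    print(s)
```

The two aligned structures differ only in where unilateral defection leads. This matches the observation that these transitions have a less prominent role.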
Extended Data Fig. 3 Analysis of the evolving strategies suggests that the evolution of cooperation hinges on the success of WSLS.
Here, we consider all state-invariant and deterministic stochastic games with two states and two players. a–h, For each of the eight possible cases, we recorded the evolving cooperation rate (lower plots) and the relative abundance of each pure memory-one strategy (upper plots) for different values of b_{1}. For clarity, we depict only two memory-one strategies explicitly: All D (the strategy that always defects) and WSLS (win-stay, lose-shift). The colour-shaded bars on top of the upper plots show parameter regimes in which either All D or WSLS is most abundant among all 16 strategies. In four of the eight cases, we observe that full cooperation evolves as the benefit to cooperation in state 1 approaches b_{1} = 3. These are exactly the cases in which mutual cooperation leads players towards the more beneficial state 1. Moreover, in these four cases the upper plots show that cooperation emerges owing to the success of WSLS, which is the predominant strategy whenever cooperation prevails. Except for the value of b_{1}, all other parameter values are the same as in Extended Data Fig. 2.
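The 16 pure memory-one strategies can be encoded as 4-tuples (p_CC, p_CD, p_DC, p_DD). Each entry gives the probability of cooperating after each outcome of the previous round, listed from the focal player's perspective. In this encoding, WSLS is (1, 0, 0, 1) and All D is (0, 0, 0, 0). The sketch below computes long-run payoffs in the limit δ → 1 from the stationary distribution of the Markov chain over action profiles, with execution errors ε. Several assumptions are ours and are not necessarily the paper's implementation. Strategies ignore the current state. Only mutual cooperation leads to state 1. Each round is a donation game with b_{1} = 2.0, b_{2} = 1.2 and c = 1.

```python
# Long-run (delta -> 1) payoffs of memory-one strategies in a two-state stochastic game.
PROFILES = [(1, 1), (1, 0), (0, 1), (0, 0)]  # (own, co-player): CC, CD, DC, DD; 1 = cooperate
WSLS = (1, 0, 0, 1)  # cooperate after CC or DD, defect after CD or DC
ALLD = (0, 0, 0, 0)
BENEFIT = {1: 2.0, 2: 1.2}
COST = 1.0


def next_state(profile):
    return 1 if profile == (1, 1) else 2  # assumed: only mutual cooperation leads to state 1


def with_error(p, eps):
    return (1 - eps) * p + eps * (1 - p)  # intended action flipped with probability eps


def transition_matrix(p, q, eps):
    q_view = (q[0], q[2], q[1], q[3])  # the co-player sees CD and DC with roles swapped
    rows = []
    for i in range(4):
        x, y = with_error(p[i], eps), with_error(q_view[i], eps)
        rows.append([(x if a1 else 1 - x) * (y if a2 else 1 - y) for a1, a2 in PROFILES])
    return rows


def solve(a, rhs):
    """Gauss-Jordan elimination with partial pivoting (small dense systems only)."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def stationary(rows):
    """Solve pi M = pi with sum(pi) = 1; unique because eps > 0 makes the chain irreducible."""
    n = len(rows)
    a = [[rows[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
    a[-1] = [1.0] * n
    return solve(a, [0.0] * (n - 1) + [1.0])


def long_run_payoff(p, q, eps=0.001):
    """Average per-round payoff of a p-player against a q-player."""
    rows = transition_matrix(p, q, eps)
    pi = stationary(rows)
    total = 0.0
    for i, prev in enumerate(PROFILES):
        s = next_state(prev)  # the previous round's actions fix this round's state
        for j, (a1, a2) in enumerate(PROFILES):
            total += pi[i] * rows[i][j] * (BENEFIT[s] * a2 - COST * a1)
    return total


print(round(long_run_payoff(WSLS, WSLS), 3))
print(round(long_run_payoff(WSLS, ALLD), 3), round(long_run_payoff(ALLD, WSLS), 3))
```

Under these assumptions, a WSLS population stays close to the maximum b_{1} − c, because after an error it recovers mutual cooperation within two rounds. Against All D, WSLS alternates between cooperating and defecting.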
Extended Data Fig. 4 Effect of transitions on cooperation in four-player public-goods games.
We also explored the effect of different transition structures for stochastic games between multiple players (with a public-goods game being played in each state). State 1 is again more beneficial because r_{1} > r_{2}, but to be in state 1 there must be a minimum number k of cooperators in the previous round. a–f, For a four-player public-goods game, there are six possible monotonic configurations of the stochastic game because k can be any number from 0 (players always move to the first state) to 5 (players never move to the first state). g, The evolving cooperation rate becomes maximal when any deviation from mutual cooperation leads players to state 2 (e). h, There is a non-monotonic relationship between the six transition structures and the time spent in the more beneficial state 1. Parameters are as in Fig. 2b, but with the multiplication factor in the first state fixed to r_{1} = 2 and selection strength β = 1; to derive exact results, we considered the limit of rare mutations μ → 0 (see Supplementary Information for details).
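The threshold rule can be sketched in a few lines. We assume the standard linear public-goods game: each cooperator contributes c, and the pot is multiplied by r_s and shared equally among the four players. The value r_{1} = 2 is from this caption. The value of r_{2} is not given here, so the 1.2 below is a hypothetical placeholder. If panels a–f run through k = 0, …, 5 in order, the structure highlighted in panel g corresponds to k = 4.

```python
N_PLAYERS = 4
COST = 1.0
R = {1: 2.0, 2: 1.2}  # r_1 = 2 from the caption; r_2 = 1.2 is a hypothetical placeholder


def next_state(num_cooperators, k):
    """State 1 requires at least k cooperators in the previous round."""
    return 1 if num_cooperators >= k else 2


def pgg_payoff(cooperates, num_cooperators, state):
    """Equal share of the multiplied pot minus one's own contribution."""
    pot = R[state] * COST * num_cooperators
    return pot / N_PLAYERS - (COST if cooperates else 0.0)


# k = 0 always leads to state 1; k = 5 never does, because at most four players cooperate.
for k in range(6):
    print(k, [next_state(n, k) for n in range(N_PLAYERS + 1)])
```

With k = 4, a single defector among three cooperators earns more in the current round but sends the whole group to state 2.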
Extended Data Fig. 5 WSLS sustains cooperation in multiplayer public-goods games.
This figure is analogous to Extended Data Fig. 3 for the case of multiplayer interactions. Again, we show evolving cooperation rates and the relative abundance of All D and WSLS for the six state-independent and deterministic games in which transitions are monotonic. In five of these games, cooperation emerges once the multiplication factor r_{1} becomes sufficiently large. In all of those, WSLS is the most abundant strategy when cooperation evolves. Except for r_{1}, all parameters are the same as in Extended Data Fig. 4.
Extended Data Fig. 6 Probabilistic transitions can further enhance cooperation.
a, Here, we explore in more detail the stochastic game introduced in Fig. 3a (see Supplementary Information for details), in which any defection always leads to state 2. After mutual cooperation in state 1, players remain in state 1 with certainty. After mutual cooperation in state 2, players move towards state 1 with probability q. b, Calculating the cooperation rate in the selection–mutation equilibrium in the limit of rare mutations shows that the highest cooperation rate is achieved for intermediate values of q. c, We recorded the abundance of all 32 memory-one strategies in the selection–mutation equilibrium. The most abundant strategy is either All D (for small values of q, as indicated by the red squares), WSLS (for small but positive values of q, green circles) or AWSLS (for all other values of q, yellow triangles; AWSLS is a more ambitious variant of WSLS, see Supplementary Information, section 4.1). d, To estimate the time that it takes each resident strategy to be invaded, we randomly introduced other mutant strategies and recorded how long it took until a mutant successfully fixed (that is, the number of independent mutant strategies introduced before the mutant strategy was adopted by the whole population). To obtain a reliable estimate, we performed 10,000 runs for each resident strategy. e, f, In addition, we recorded which strategy eventually reaches fixation if the resident applies either All D or WSLS when q = 1. Parameters: b_{1} = 1.9, b_{2} = 1.4, c = 1, β = 1, N = 100.
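The invasion-time estimate of panel d can be sketched as a repeated experiment. We assume that each introduced mutant is drawn uniformly from the candidate strategies and fixes independently with a known probability. The probabilities below are hypothetical, not values from the paper. Under these assumptions, the number of mutants introduced is geometrically distributed with mean 1/ρ̄, where ρ̄ is the average fixation probability. This gives a quick check on the simulated average over 10,000 runs.

```python
import random


def mean_mutants_until_fixation(fixation_probs, runs=10_000, seed=1):
    """Average number of mutants introduced until one of them takes over.

    fixation_probs: fixation probability of each candidate mutant strategy
    against the current resident (hypothetical values supplied by the caller).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        count = 0
        while True:
            count += 1
            mutant = rng.randrange(len(fixation_probs))  # mutant drawn uniformly
            if rng.random() < fixation_probs[mutant]:   # this mutant fixes
                break
        total += count
    return total / runs


# Hypothetical fixation probabilities of three mutant strategies against one resident.
probs = [0.01, 0.05, 0.09]
estimate = mean_mutants_until_fixation(probs)
print(estimate, 1 / (sum(probs) / len(probs)))  # simulation vs. geometric mean 1/rho_bar
```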
Extended Data Fig. 7 Players benefit from a small endogenous risk that the game stops early.
a, We consider the stochastic game in Fig. 3b, in which players remain in state 1 after cooperation, but move towards state 2 with transition probability q if one of the players defects. In state 2, no profitable interactions are possible. All results are discussed in detail in Supplementary Information; here we provide a summary. b, According to our evolutionary simulations, a higher transition probability leads to more cooperation. c, However, a higher probability q also makes players move to the second state if one of them defected merely owing to an error; hence, the dependence of payoffs on q is non-monotonic. d, e, When q is small, Grim is the predominant strategy. Players with this strategy cooperate until one of the players defects; from then on, they defect forever. As q increases, WSLS strategies take over. As q → 1, unconditional cooperation becomes most successful. f, For the given parameter values, a homogeneous Grim population achieves only one-third of the maximum payoff possible, because any error leads to relentless defection. The other three strategies result in the maximum payoff b_{1} − c for q = 0, but this payoff decreases with q. Parameters: b_{1} = 2, c = 1, δ = 0.999, ε = 0.001, β = 1, N = 100.
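The one-third figure for Grim can be checked with a back-of-the-envelope calculation. We make a simplifying assumption of our own: every round after the first error is ignored, because Grim players then defect and earn only terms of order ε. Under that assumption, round t (counting from 0) is still mutual cooperation with probability (1 − ε)^{2(t+1)}. The normalized discounted payoff is then (b_{1} − c)(1 − δ)x/(1 − δx) with x = (1 − ε)^2.

```python
DELTA = 0.999  # discount factor
EPS = 0.001    # error rate per action
B1, C = 2.0, 1.0


def grim_payoff_closed_form(delta=DELTA, eps=EPS):
    """Normalized discounted payoff, counting only rounds before the first error."""
    x = (1 - eps) ** 2  # probability that neither player errs in a given round
    return (B1 - C) * (1 - delta) * x / (1 - delta * x)


def grim_payoff_series(delta=DELTA, eps=EPS, rounds=100_000):
    """The same quantity as a truncated sum over rounds, as a check on the closed form."""
    x = (1 - eps) ** 2
    return (B1 - C) * (1 - delta) * sum((delta * x) ** t * x for t in range(rounds))


ratio = grim_payoff_closed_form() / (B1 - C)  # fraction of the maximum payoff b_1 - c
print(round(ratio, 3))
```

With δ = 0.999 and ε = 0.001, the expected time to the first error (about 500 rounds) is comparable to the effective horizon 1/(1 − δ) = 1,000 rounds. This is why roughly a third of the maximum survives.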
Extended Data Fig. 8 Immediate environmental feedback enhances cooperation.
a, We consider a state-dependent stochastic game with two players and three states. Mutual cooperation always leads players to move to a superior state (or to remain in the most beneficial state s_{1}). Similarly, mutual defection always leads to an inferior state (or players remain in the most detrimental state s_{3}). After a unilateral defection, players remain in the same state. We consider four different versions of this game, depending on how quickly the payoffs decrease as players move towards an inferior state. b, Our numerical results show that an immediate negative response of the environment to defection is most favourable to the evolution of cooperation. c, As a consequence, the scenario with immediate consequences also yields the highest average payoffs once the benefit in state 1 exceeds a moderate threshold. d–g, On the level of evolving strategies, we find that an immediately responding environment is most favourable to the evolution of WSLS strategies and strongly selects against defecting strategies. Again, the coloured bars on top of each panel indicate the strategy that is most favoured by selection for the respective value of b_{1} (see Supplementary Information for all details). Parameters: c = 1; b_{1} varies from 1 to 3; b_{2} is equal to c, (b_{1} + c)/2 or b_{1}; and b_{3} is equal to either c or b_{1} depending on the scenario considered (as depicted in a); N = 100, β = 1, δ → 1, ε = 0.001.
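The deterministic transition rule described in panel a can be written as a small function. This sketches only the transitions. The payoff vectors of the four scenarios are omitted, and the function names are our own. The example trajectory shows that climbing back from the worst state takes as many rounds of mutual cooperation as the descent took rounds of mutual defection.

```python
BEST, WORST = 1, 3  # s_1 (most beneficial) to s_3 (most detrimental)


def next_state(state, a1, a2):
    """Transition rule of the three-state game; actions are 'C' or 'D'."""
    if a1 == a2 == "C":
        return max(BEST, state - 1)   # move to a superior state, or stay in s_1
    if a1 == a2 == "D":
        return min(WORST, state + 1)  # move to an inferior state, or stay in s_3
    return state                      # unilateral defection: remain in the same state


def trajectory(start, profiles):
    """States visited when the given action profiles are played in sequence."""
    states = [start]
    for a1, a2 in profiles:
        states.append(next_state(states[-1], a1, a2))
    return states


history = [("D", "D"), ("D", "D"), ("D", "D"), ("C", "D"), ("C", "C"), ("C", "C")]
print(trajectory(1, history))  # [1, 2, 3, 3, 3, 2, 1]
```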
Extended Data Fig. 9 Cooperation in stochastic games requires that players take future payoff consequences into account.
We repeated the numerical computations in Extended Data Fig. 8 for various discount factors δ. When players focus entirely on the present (δ = 0), cooperation evolves in none of the four treatments. As players increasingly take future payoffs into account, cooperation rates increase. Immediate payoff feedback is most conducive to cooperation across all values of δ considered. Except for the discount factor, parameters are the same as in Extended Data Fig. 8, with b_{1} = 1.8.
Extended Data Fig. 10 A systematic analysis of the expected game dynamics for different game payoffs.
Keeping the two-player game in state 2 fixed to the game in Fig. 2a, we varied the game that is played in state 1. We assume that payoffs in the first state are 1 (for mutual cooperation), S_{1} (for unilateral cooperation), T_{1} (for unilateral defection) and 0 (for mutual defection). Depending on T_{1} and S_{1}, game 1 can be one of four different types: harmony game (HG), snowdrift game (SD), stag-hunt game (SH) or prisoner’s dilemma (PD); see Supplementary Information for details. For each of the eight possible state-independent transitions q, we systematically varied the temptation payoff T_{1} (x axis) and the sucker’s payoff S_{1} (y axis) in the first state (see Supplementary Information for details). For each combination of T_{1}, S_{1} and q, we computed how often players cooperate in the selection–mutation equilibrium (left panels) and in what fraction of rounds they switch from one state to the other (right panels). a–c, e, Full cooperation can evolve when players find themselves in state 1 after mutual cooperation. d, f, Players learn to switch between states only when mutual cooperation leads to state 2 and mutual defection leads to state 1. g, h, In the remaining cases, players hardly cooperate. The payoffs in game 2 are the same as in Fig. 2a: a prisoner’s dilemma with b_{2} = 1.2 and c = 1. For the evolutionary parameters we considered population size N = 100 and selection strength β = 1.
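With the payoffs normalized to 1 for mutual cooperation and 0 for mutual defection, the four game types can be read off from T_{1} and S_{1}. We assume the usual classification by whether T_{1} exceeds 1 and whether S_{1} exceeds 0. The paper's own definitions are in its Supplementary Information, and treating ties as a separate "boundary" case is our choice. For example, a donation game with b_{1} = 2 and c = 1 has payoffs 1, −1, 2 and 0 and falls in the prisoner's dilemma region.

```python
def game_type(t1, s1):
    """Classify the state-1 game with R = 1 (mutual cooperation) and P = 0 (mutual defection)."""
    if t1 < 1 and s1 > 0:
        return "HG"  # harmony game: cooperation dominates
    if t1 > 1 and s1 > 0:
        return "SD"  # snowdrift game: best response is the opposite of the co-player's action
    if t1 < 1 and s1 < 0:
        return "SH"  # stag-hunt game: best response is to match the co-player's action
    if t1 > 1 and s1 < 0:
        return "PD"  # prisoner's dilemma: defection dominates
    return "boundary"  # ties such as t1 == 1 or s1 == 0 are left unclassified here


print(game_type(2.0, -1.0))  # donation game with b_1 = 2, c = 1
```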
Supplementary information
Supplementary Information
This file contains a Supplementary Discussion, Supplementary Table 1 and Supplementary References. Supplementary Table 1 provides several examples of memory-1 strategies of stochastic games.
About this article
Cite this article
Hilbe, C., Šimsa, Š., Chatterjee, K. et al. Evolution of cooperation in stochastic games. Nature 559, 246–249 (2018). https://doi.org/10.1038/s41586-018-0277-x
DOI: https://doi.org/10.1038/s41586-018-0277-x
© Springer Nature Limited