Oscillatory evolution of collective behavior in evolutionary games played with reinforcement learning

  • Original paper
Nonlinear Dynamics

Abstract

Large-scale cooperation underpins the evolution of ecosystems and human society, and the collective behaviors that emerge from the self-organization of multi-agent systems are key to understanding it. As artificial intelligence (AI) now pervades almost every branch of science, it is of great interest to ask what new insights into collective behavior a multi-agent AI system can offer. Here, we introduce a typical reinforcement learning (RL) algorithm, Q-learning, into evolutionary game dynamics. Agents pursue optimal actions through introspection, that is, by learning from their own experience, rather than through external mechanisms such as the birth–death or imitation processes of the traditional evolutionary game (EG). We numerically investigate the prevalence of cooperation in a general \(2\times 2\) game setting. We find that, unexpectedly, the level of cooperation in the multi-agent AI system is in most cases comparable to that in the traditional EG. In snowdrift games with RL, however, we reveal that cooperation emerges explosively in the form of periodic oscillations, and we study how the payoff structure affects their emergence. Finally, we show that periodic oscillations can also be observed in other EGs played with the RL algorithm, such as the rock–paper–scissors game. Our results offer a reference point for understanding, from an AI perspective, the emergence of cooperation and oscillatory behavior in nature and society.
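The idea of embedding Q-learning in a 2×2 evolutionary game can be illustrated with a minimal sketch. Everything specific here is an assumption not stated in the abstract: the payoff parameterization (R = 1, P = 0, free S and T), each agent's state being its own previous action, ε-greedy exploration, random pairing in a well-mixed population, and all parameter values. The names `payoff`, `QAgent`, and `simulate` are hypothetical, and this is not the authors' implementation.

```python
import random

# Action encoding: 0 = cooperate (C), 1 = defect (D).
C, D = 0, 1


def payoff(a, b, S, T):
    """Payoff to a player choosing action a against an opponent choosing b.

    Assumed parameterization of a general 2x2 game: R = 1, P = 0, with the
    sucker's payoff S and the temptation T left free. Snowdrift-type games
    have T > 1 and S > 0; prisoner's-dilemma-type games have T > 1 and S < 0.
    """
    table = [[1.0, S], [T, 0.0]]  # rows: own action, columns: opponent action
    return table[a][b]


class QAgent:
    """Tabular Q-learner; its state is assumed to be its own previous action."""

    def __init__(self, alpha, gamma, epsilon, rng):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng
        self.q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
        self.state = rng.choice((C, D))

    def act(self):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice((C, D))
        row = self.q[self.state]
        if row[C] == row[D]:
            return self.rng.choice((C, D))  # break ties at random
        return C if row[C] > row[D] else D

    def learn(self, action, reward):
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
        # where the next state s' is the action just taken (assumed state space).
        next_state = action
        target = reward + self.gamma * max(self.q[next_state])
        self.q[self.state][action] += self.alpha * (target - self.q[self.state][action])
        self.state = next_state


def simulate(n_agents=100, rounds=2000, S=0.5, T=1.5,
             alpha=0.1, gamma=0.9, epsilon=0.02, seed=0):
    """Return the fraction of cooperators in each round of a well-mixed population."""
    if n_agents % 2:
        raise ValueError("random pairing assumes an even number of agents")
    rng = random.Random(seed)
    agents = [QAgent(alpha, gamma, epsilon, rng) for _ in range(n_agents)]
    history = []
    for _ in range(rounds):
        actions = [agent.act() for agent in agents]
        order = list(range(n_agents))
        rng.shuffle(order)
        # Each agent plays one randomly chosen partner and learns from its own payoff.
        for i, j in zip(order[::2], order[1::2]):
            agents[i].learn(actions[i], payoff(actions[i], actions[j], S, T))
            agents[j].learn(actions[j], payoff(actions[j], actions[i], S, T))
        history.append(actions.count(C) / n_agents)
    return history
```

For example, `simulate(S=0.5, T=1.5)` returns the per-round cooperator fraction for a snowdrift-type payoff. Inspecting that series over long runs is one way to look for periodic bursts of cooperation, although whether this toy setup reproduces them depends on the assumed state space and parameters.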

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8



Acknowledgements

Si-Ping Zhang is supported by grants from the National Natural Science Foundation of China (Grant Nos. 11975178, 61431012). Li Chen and Ji-Qiang Zhang are supported by the National Natural Science Foundation of China under Grant No. 61703257.

Author information


Corresponding author

Correspondence to Ji-Qiang Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1943 KB)

About this article

Cite this article

Zhang, SP., Zhang, JQ., Chen, L. et al. Oscillatory evolution of collective behavior in evolutionary games played with reinforcement learning. Nonlinear Dyn 99, 3301–3312 (2020). https://doi.org/10.1007/s11071-019-05398-4

  • DOI: https://doi.org/10.1007/s11071-019-05398-4
