Abstract
Large-scale cooperation underpins the evolution of ecosystems and of human society, and the collective behaviors that emerge from self-organization in multi-agent systems are key to understanding it. As artificial intelligence (AI) spreads through almost all branches of science, it is of great interest to see what new insights into collective behavior can be obtained from a multi-agent AI system. Here, we introduce a typical reinforcement learning (RL) algorithm, Q-learning, into evolutionary game dynamics, where agents pursue the optimal action through introspection rather than through outward-looking processes such as the birth–death or imitation processes of the traditional evolutionary game (EG). We investigate the prevalence of cooperation numerically for a general \(2\times 2\) game setting. We find that, unexpectedly, the prevalence of cooperation in the multi-agent AI system is at the same level as in the traditional EG in most cases. However, in snowdrift games with RL, explosive cooperation appears in the form of periodic oscillation, and we study the impact of the payoff structure on its emergence. Finally, we show that periodic oscillation can also be observed in other EGs with the RL algorithm, such as the rock–paper–scissors game. Our results offer a reference point for understanding the emergence of cooperation and oscillatory behaviors in nature and society from the perspective of AI.
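To make the setup concrete, the following is a minimal sketch of Q-learning agents playing a \(2\times 2\) snowdrift game. It is an illustration under stated assumptions, not the model used in the article: the payoff parametrization (R=1, S=1-r, T=1+r, P=0), the choice of an agent's own previous action as its state, random pairing in each round, the epsilon-greedy rule, and all parameter values and function names (`snowdrift_payoff`, `choose`, `simulate`) are hypothetical.

```python
import random

# Actions: 0 = cooperate (C), 1 = defect (D).
# Assumed snowdrift payoffs for the row player: R=1, S=1-r, T=1+r, P=0.
def snowdrift_payoff(r):
    return [[1.0, 1.0 - r], [1.0 + r, 0.0]]

def choose(q_row, epsilon, rng):
    """Epsilon-greedy choice between the two actions for one state."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    return 0 if q_row[0] >= q_row[1] else 1

def simulate(n_agents=100, rounds=2000, r=0.5,
             alpha=0.1, gamma=0.9, epsilon=0.02, seed=0):
    """Return the fraction of cooperators in each round."""
    rng = random.Random(seed)
    payoff = snowdrift_payoff(r)
    # q[agent][state][action]; the state is the agent's own last action
    # (an assumption made for this sketch).
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_agents)]
    state = [rng.randrange(2) for _ in range(n_agents)]
    history = []
    for _ in range(rounds):
        actions = [choose(q[i][state[i]], epsilon, rng)
                   for i in range(n_agents)]
        order = list(range(n_agents))
        rng.shuffle(order)
        # Pair agents at random; with an odd population one agent sits out.
        for k in range(0, n_agents - 1, 2):
            a, b = order[k], order[k + 1]
            for me, other in ((a, b), (b, a)):
                reward = payoff[actions[me]][actions[other]]
                s, act = state[me], actions[me]
                s_next = act
                # Standard Q-learning update toward the bootstrapped target.
                target = reward + gamma * max(q[me][s_next])
                q[me][s][act] += alpha * (target - q[me][s][act])
                state[me] = s_next
        history.append(actions.count(0) / n_agents)
    return history

if __name__ == "__main__":
    trace = simulate(n_agents=100, rounds=500, r=0.5)
    print("mean cooperation over last 100 rounds:", sum(trace[-100:]) / 100)
```

Plotting the returned series against the round index is one way to look for oscillations of the kind described above; whether they appear in this simplified sketch depends on the assumed parameters.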
Acknowledgements
Si-Ping Zhang is supported by grants from the National Natural Science Foundation of China (Grant Nos. 11975178, 61431012). Li Chen and Ji-Qiang Zhang are supported by the National Natural Science Foundation of China under Grant No. 61703257.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Zhang, SP., Zhang, JQ., Chen, L. et al. Oscillatory evolution of collective behavior in evolutionary games played with reinforcement learning. Nonlinear Dyn 99, 3301–3312 (2020). https://doi.org/10.1007/s11071-019-05398-4
DOI: https://doi.org/10.1007/s11071-019-05398-4