# Collective behavior of artificial intelligence population: transition from optimization to game


## Abstract

Collective behavior in resource allocation systems has attracted much attention, because the efficiency of such a system depends closely on the self-organized processes of the multiple agents that compose it. As artificial intelligence (AI) is now adopted ubiquitously for decision making in many settings, it becomes crucial to understand what emerges in a multi-agent AI system for resource allocation and how we can intervene in its collective behavior, given our experience of the unexpected outcomes that collective behavior can induce. Here, we introduce a reinforcement learning (RL) algorithm into minority game (MG) dynamics, in which agents learn according to one typical RL scheme, Q-learning. We investigate the dynamical behavior of the system numerically and analytically for different game settings, combining two different types of agents to mimic diverse situations. We find that, after short-term training, the multi-agent AI system adopting the Q-learning algorithm relaxes to the optimal solution of the game. Moreover, one striking phenomenon is the transition of the interaction mechanism from self-organized *optimization* to *game* as the fraction of RL agents \(\eta_{q}\) is tuned. The critical curve for the transition between the two mechanisms in the phase diagram is obtained analytically. The adaptability of the population of AI agents to a time-varying environment is also discussed. To gain further understanding of these phenomena, a theoretical framework based on a mean-field approximation is also developed. Our findings from this simplified multi-agent AI system may shed new light on how reconciliation and optimization can be bred in the coming era of AI.
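As a rough illustration of the setup described above, the following minimal sketch simulates a minority game in which a fraction `eta_q` of the agents learn by Q-learning. The abstract does not specify the states, actions, rewards, or parameter values, so everything here is an assumption made for illustration: two resources, a state equal to the agent's previous choice, a reward of 1 on the minority side, epsilon-greedy exploration, non-RL agents that choose at random, and the hypothetical names `simulate_minority_game` and `variance_per_agent`.

```python
import random

def simulate_minority_game(n_agents=101, eta_q=1.0, steps=2000,
                           alpha=0.9, gamma=0.9, epsilon=0.02, seed=0):
    """Toy minority game in which a fraction eta_q of agents use Q-learning.

    Assumptions (not taken from the source): two resources labelled 0 and 1,
    an agent's state is the resource it chose last round, its action is the
    resource it picks next, the reward is 1 on the minority side and 0
    otherwise, and non-RL agents pick a resource uniformly at random.
    """
    rng = random.Random(seed)
    n_q = round(eta_q * n_agents)          # number of Q-learning agents
    # q[i][state][action] for each Q-learning agent, initialised to zero
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_q)]
    state = [rng.randrange(2) for _ in range(n_agents)]
    attendance = []                        # agents on resource 1, per round

    for _ in range(steps):
        actions = []
        for i in range(n_agents):
            if i < n_q and rng.random() >= epsilon:
                row = q[i][state[i]]
                # Greedy choice, with ties broken at random
                if row[0] == row[1]:
                    a = rng.randrange(2)
                else:
                    a = 0 if row[0] > row[1] else 1
            else:
                a = rng.randrange(2)       # exploration step or non-RL agent
            actions.append(a)

        n1 = sum(actions)
        # With an odd n_agents one side is always the strict minority
        minority = 1 if n1 < n_agents - n1 else 0
        for i in range(n_q):
            s, a = state[i], actions[i]
            reward = 1.0 if a == minority else 0.0
            # Standard Q-learning update; the next state is the resource just chosen
            target = reward + gamma * max(q[i][a])
            q[i][s][a] += alpha * (target - q[i][s][a])
        state = actions
        attendance.append(n1)
    return attendance

def variance_per_agent(attendance, n_agents):
    """Mean squared deviation of attendance from N/2, divided by N."""
    half = n_agents / 2
    return sum((a - half) ** 2 for a in attendance) / (len(attendance) * n_agents)
```

Evaluating `variance_per_agent(simulate_minority_game(eta_q=x), 101)` for several values of `x` gives a simple way to explore how the mix of agent types affects allocation efficiency in this toy setting. It is not expected to reproduce the paper's quantitative results.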

## Keywords

Self-organized processes, Resource allocation, Artificial intelligence, Minority game, Reinforcement learning

## Notes

### Acknowledgements

We thank Prof. Ying-Cheng Lai, Richong Zhang, Liang Huang and Dr. Xu-sheng Liu for helpful discussions. This work was supported by NSFC Nos. 11275003, 11575072, 61431012 and 11475074, the Science and Technology Coordination Innovation Project of Shaanxi Province (2016KTCQ01-45), and the Fundamental Research Funds for the Central Universities No. lzujbky-2016-123. ZGH gratefully acknowledges the support of K. C. Wong Education Foundation.

### Compliance with ethical standards

### Conflicts of interest

The authors declare that they have no conflict of interest.
