Journal of Global Optimization, Volume 73, Issue 1, pp 171–192

A Bayesian optimization approach to find Nash equilibria

  • Victor Picheny
  • Mickael Binois
  • Abderrahmane Habbal


Game theory now finds a broad range of applications in engineering and machine learning. However, in a derivative-free, expensive black-box context, very few algorithmic solutions are available for finding game equilibria. Here, we propose a novel Gaussian-process-based approach for solving games in this context. We follow a classical Bayesian optimization framework, with sequential sampling decisions based on acquisition functions. Two strategies are proposed, based either on the probability of achieving equilibrium or on the stepwise uncertainty reduction paradigm. Practical and numerical aspects are discussed in order to enhance scalability and reduce computation time. Our approach is evaluated on several synthetic game problems with varying numbers of players and decision space dimensions. We show that equilibria can be found reliably for a fraction of the cost (in terms of black-box evaluations) compared to classical, derivative-based algorithms. The method is implemented in the R package GPGame, available on CRAN.
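To make the notion of equilibrium concrete, the following minimal Python sketch enumerates pure-strategy Nash equilibria of a two-player game on a finite grid of strategies. It is not the Gaussian-process method of the paper and does not use GPGame: it assumes that both cost tables are fully known and that each player minimizes its own cost. The function name `pure_nash_equilibria` and the toy quadratic costs are hypothetical and chosen for illustration only.

```python
import numpy as np


def pure_nash_equilibria(cost_1, cost_2):
    """Return all index pairs (i, j) that are pure Nash equilibria.

    cost_1[i, j] and cost_2[i, j] are the costs paid by players 1 and 2
    when player 1 plays strategy i and player 2 plays strategy j.
    Both players are assumed to minimize their own cost.
    """
    # Player 1 best-responds over rows i for each fixed column j.
    best_1 = cost_1 == cost_1.min(axis=0, keepdims=True)
    # Player 2 best-responds over columns j for each fixed row i.
    best_2 = cost_2 == cost_2.min(axis=1, keepdims=True)
    # An equilibrium is a pair where neither player can improve alone.
    return [(int(i), int(j)) for i, j in np.argwhere(best_1 & best_2)]


# Hypothetical toy game: each player picks a value on a grid in [0, 1].
x = np.linspace(0.0, 1.0, 11)
X1, X2 = np.meshgrid(x, x, indexing="ij")
cost_1 = (X1 - X2) ** 2          # player 1 wants to match player 2
cost_2 = (X2 - (1.0 - X1)) ** 2  # player 2 wants to play 1 - x1

equilibria = pure_nash_equilibria(cost_1, cost_2)
print(equilibria)  # [(5, 5)], i.e. x1 = x2 = 0.5
```

In the expensive black-box setting described above, filling both cost tables exhaustively is exactly what one cannot afford, which is why a surrogate model and sequential sampling are used instead.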


Keywords: Game theory; Gaussian processes; Stepwise uncertainty reduction



The authors acknowledge inspiration from the Lorentz Center Workshop “SAMCO-Surrogate Model Assisted Multicriteria Optimization”, held at Leiden University, Feb 29–March 4, 2016. Mickael Binois is grateful for support from National Science Foundation Grant DMS-1521702.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Victor Picheny (1)
  • Mickael Binois (2)
  • Abderrahmane Habbal (3)

  1. MIAT, Université de Toulouse, INRA, Castanet-Tolosan, France
  2. The University of Chicago Booth School of Business, Chicago, USA
  3. Inria, CNRS, LJAD, UMR 7351, Université Côte d’Azur, Parc Valrose, Nice, France
