
Mathematical Programming, Volume 173, Issue 1–2, pp. 465–507

Learning in games with continuous action sets and unknown payoff functions

  • Panayotis Mertikopoulos
  • Zhengyuan Zhou
Full Length Paper Series A

Abstract

This paper examines the convergence of no-regret learning in games with continuous action sets. For concreteness, we focus on learning via “dual averaging”, a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then “mirror” the output back to their action sets. In terms of feedback, we assume that players can only estimate their payoff gradients up to a zero-mean error with bounded variance. To study the convergence of the induced sequence of play, we introduce the notion of variational stability, and we show that stable equilibria are locally attracting with high probability whereas globally stable equilibria are globally attracting with probability 1. We also discuss some applications to mixed-strategy learning in finite games, and we provide explicit estimates of the method’s convergence speed.
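The dual averaging template described in the abstract can be illustrated with a short simulation. The sketch below is not the authors' code and rests on our own assumptions. Every player keeps a score vector y_i, adds its observed payoff gradient (the true gradient plus zero-mean Gaussian noise) with step size γ_n = 1/n, and plays x_i = Q(y_i). The mirror map Q is either the Euclidean projection onto a box (continuous actions) or the logit (softmax) map onto the simplex (mixed strategies in a finite game). The quadratic game, the Prisoner's Dilemma payoffs, the noise level and all function names (`dual_averaging`, `box_mirror`, `logit_mirror`, `quad_grads`, `pd_grads`, `is_variationally_stable`) are hypothetical. The last helper checks, on a grid, the inequality ⟨v(x), x − x*⟩ < 0 for x ≠ x*, which we take as the working form of variational stability for this toy example.

```python
import math
import random


def box_mirror(y, lo=0.0, hi=1.0):
    """Euclidean mirror map on the box [lo, hi]^d.

    argmax_x {<y, x> - ||x||^2 / 2} over the box is the Euclidean projection
    of y, i.e. a coordinatewise clip.
    """
    return [min(max(yk, lo), hi) for yk in y]


def logit_mirror(y):
    """Entropic mirror map on the simplex: softmax(y), shifted by max(y) to avoid overflow."""
    m = max(y)
    w = [math.exp(yk - m) for yk in y]
    total = sum(w)
    return [wk / total for wk in w]


def dual_averaging(payoff_grads, mirror, dims, steps, noise_sd=0.5, seed=0):
    """Simultaneous dual averaging for all players with noisy gradient feedback.

    payoff_grads(x) returns one payoff-gradient list per player at the joint
    profile x. Each player observes its gradient plus zero-mean Gaussian noise
    (an assumed noise model), adds it with step gamma_n = 1/n to a score
    vector y_i, and plays x_i = mirror(y_i).
    """
    rng = random.Random(seed)
    y = [[0.0] * d for d in dims]
    x = [mirror(yi) for yi in y]
    for n in range(1, steps + 1):
        gamma = 1.0 / n  # sum of gamma_n diverges, sum of gamma_n^2 converges
        # All players' gradients are evaluated at the same profile x (simultaneous play).
        for i, g in enumerate(payoff_grads(x)):
            v_hat = [gk + rng.gauss(0.0, noise_sd) for gk in g]
            y[i] = [yk + gamma * vk for yk, vk in zip(y[i], v_hat)]
        x = [mirror(yi) for yi in y]
    return x


# Hypothetical 2-player concave game on [0, 1]: u_i(x) = x_i * (1 - 0.5 * x_j) - x_i**2.
# Its gradient field v satisfies <v(x) - v(x'), x - x'> < 0 for x != x',
# and its unique Nash equilibrium is (0.4, 0.4).
def quad_grads(x):
    (x1,), (x2,) = x
    return [[1.0 - 2.0 * x1 - 0.5 * x2], [1.0 - 2.0 * x2 - 0.5 * x1]]


# Hypothetical finite game: Prisoner's Dilemma with strategies (C, D).
# PD[s][t] is a player's payoff for playing s against t; D strictly dominates C.
PD = [[3.0, 0.0], [5.0, 1.0]]


def pd_grads(x):
    """Payoff vector of each player's pure strategies against the opponent's mix."""
    p1, p2 = x
    return [[sum(PD[s][t] * p2[t] for t in range(2)) for s in range(2)],
            [sum(PD[s][t] * p1[t] for t in range(2)) for s in range(2)]]


def is_variationally_stable(grads, x_star, grid=21):
    """Grid check that <v(x), x - x*> < 0 at every grid point x != x* of [0, 1]^2
    (two players with scalar actions only)."""
    for a in range(grid):
        for b in range(grid):
            x = [[a / (grid - 1)], [b / (grid - 1)]]
            if max(abs(x[0][0] - x_star[0]), abs(x[1][0] - x_star[1])) < 1e-12:
                continue  # equality is allowed only at x* itself
            v = grads(x)
            inner = sum(v[i][0] * (x[i][0] - x_star[i]) for i in range(2))
            if inner >= 0:
                return False
    return True


if __name__ == "__main__":
    x_cont = dual_averaging(quad_grads, box_mirror, dims=[1, 1], steps=20000)
    print("continuous game:", [round(xi[0], 3) for xi in x_cont])  # should be near 0.4
    x_mix = dual_averaging(pd_grads, logit_mirror, dims=[2, 2], steps=20000)
    print("weight on D:", [round(p[1], 4) for p in x_mix])  # should be near 1
    print("variationally stable:", is_variationally_stable(quad_grads, [0.4, 0.4]))
```

Swapping `box_mirror` for `logit_mirror` is the only change needed to go from continuous actions to mixed strategies. The schedule γ_n = 1/n is simply one choice with Σγ_n = ∞ and Σγ_n² < ∞, picked for this sketch.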

Keywords

Continuous games, Dual averaging, Variational stability, Fenchel coupling, Nash equilibrium

Mathematics Subject Classification

Primary: 91A26, 90C15. Secondary: 90C33, 68Q32


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2018

Authors and Affiliations

  1. CNRS, Inria, LIG, Univ. Grenoble Alpes, Grenoble, France
  2. Department of Electrical Engineering, Stanford University, Stanford, USA
