# Learning in games with continuous action sets and unknown payoff functions

## Abstract

This paper examines the convergence of no-regret learning in games with continuous action sets. For concreteness, we focus on learning via “dual averaging”, a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then “mirror” the output back to their action sets. In terms of feedback, we assume that players can only estimate their payoff gradients up to a zero-mean error with bounded variance. To study the convergence of the induced sequence of play, we introduce the notion of *variational stability*, and we show that stable equilibria are locally attracting with high probability whereas globally stable equilibria are globally attracting with probability 1. We also discuss some applications to mixed-strategy learning in finite games, and we provide explicit estimates of the method’s convergence speed.
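The dual averaging loop described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the game is a two-player Cournot duopoly with payoff $u_i(x) = x_i\,(a - b(x_1 + x_2)) - c\,x_i$ on the action set $[0, 5]$, the regularizer is Euclidean (so the "mirror" step reduces to clipping onto the interval), the step size is $1/n$, and the gradient noise is zero-mean Gaussian. With $a = 10$, $b = 1$, $c = 1$, the unique Nash equilibrium of this toy game is $x_1 = x_2 = (a - c)/(3b) = 3$.

```python
import random


def payoff_gradient(x, i, a=10.0, b=1.0, c=1.0):
    """Individual payoff gradient of player i in the (assumed) Cournot game.

    Payoff: u_i(x) = x_i * (a - b * (x_1 + x_2)) - c * x_i.
    """
    return a - c - b * (2.0 * x[i] + x[1 - i])


def mirror(y, lo=0.0, hi=5.0):
    """Euclidean mirror map: project the score y onto the interval [lo, hi]."""
    return min(hi, max(lo, y))


def dual_averaging(steps=20000, noise_std=1.0, seed=0):
    """Run dual averaging with noisy gradient feedback; return the final actions."""
    rng = random.Random(seed)
    y = [0.0, 0.0]               # aggregated gradient steps ("scores")
    x = [mirror(v) for v in y]   # actions played
    for n in range(1, steps + 1):
        gamma = 1.0 / n          # assumed step-size schedule
        # Each player only sees a gradient estimate with zero-mean error.
        noisy = [payoff_gradient(x, i) + rng.gauss(0.0, noise_std)
                 for i in range(2)]
        # Small step along the estimated gradient, taken in the score space...
        y = [y[i] + gamma * noisy[i] for i in range(2)]
        # ...then mirror the result back to the action set.
        x = [mirror(y[i]) for i in range(2)]
    return x


if __name__ == "__main__":
    print(dual_averaging())
```

In this toy example the scores `y` are never clipped, only the actions `x` are; this is what distinguishes dual averaging from projecting the iterate itself at every step.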

## Keywords

Continuous games, Dual averaging, Variational stability, Fenchel coupling, Nash equilibrium

## Mathematics Subject Classification

Primary 91A26, 90C15; Secondary 90C33, 68Q32