
A unified stochastic approximation framework for learning in games

Mathematical Programming, Series B (Full Length Paper)
Abstract

We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite). The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, the exponential/multiplicative weights algorithm for learning in finite games, optimistic and bandit variants of the above, etc. In addition to providing an integrated view of these algorithms, our framework further allows us to obtain several new convergence results, both asymptotic and in finite time, in both continuous and finite games. Specifically, we provide a range of criteria for identifying classes of Nash equilibria and sets of action profiles that are attracting with high probability, and we also introduce the notion of coherence, a game-theoretic property that includes strict and sharp equilibria, and which leads to convergence in finite time. Importantly, our analysis applies to both oracle-based and bandit, payoff-based methods—that is, when players only observe their realized payoffs.


Notes

  1. The authors thank S. Sorin for proposing this definition.

  2. Strictly speaking, the regularizer \( x \log x - x\) is not strongly convex over \(\mathbb {R}_{+}\) but it is strongly convex over any bounded subset of \(\mathbb {R}_{+}\)—and it can be made strongly convex over all of \(\mathbb {R}_{+}\) by adding a small quadratic penalty of the form \(\varepsilon x^{2} / 2\). This issue does not change the essence of our results, so we sidestep the details.

  3. In some cases, the index set may be enlarged to include all positive half-integers (\(n= 1/2,1,3/2,\dotsc \)).

  4. This formulation of (SPSA) is tailored to unconstrained problems. In this case, to ensure that the resulting gradient estimator remains bounded, it is customary to include an indicator of the form \({{\,\mathrm{\mathbb {1}}\,}}( \Vert \hat{X}_{n}\Vert \le R_{n})\) for some suitably chosen sequence \(R_{n}\rightarrow \infty \) [64]. This would lead to the same analysis but at the cost of heavier notation so, instead, we will assume that the players’ payoff functions are bounded when discussing (SPSA). For a detailed discussion of how to adapt (SPSA) in the presence of constraints, we refer the reader to Bravo et al. [9] who show that the relevant entries of Table 1 apply verbatim when \({\mathcal {X}}\) is compact.

  5. To see this, let \({\mathcal {K}}= L_{c_{0}}^{+}(\Phi )\) be a convex upper level set of \(\Phi \) in \({{\,\textrm{ri}\,}}{\mathcal {X}}\). Then, for all \(c\le c_{0}\) and all x with \(\Phi ( x) = c\), the segment \( x + \tau (p- x)\), \(\tau \in [0,1]\), is contained in \(L_{c}^{+}(\Phi ) \supseteq L_{c_{0}}^{+}(\Phi )\), so the function \(\phi (\tau ) = \Phi ( x + \tau (p- x))\) cannot have \(\phi '(0) < 0\). This implies that \(0 \le \left\langle \nabla \Phi ( x) \right. ,\left. p- x\right\rangle = \left\langle v( x) \right. ,\left. p- x\right\rangle \) for all \( x\in {\mathcal {X}}\setminus {\mathcal {K}}\), i.e.,  \({\mathcal {G}}\) is subcoercive.

  6. We are grateful to V. Boone for pointing out this simple argument.

  7. That such a function exists is an exercise in the construction of approximate identities, which we omit.

  8. A point \( y\in {\mathcal {Y}}\) is said to be attainable by \(Y_{n}\) if, for every neighborhood \({\mathcal {W}}\) of \(y\) in \({\mathcal {Y}}\) and for all \(n\ge 1\), we have \(\mathbb {P}(Y_{n'} \in {\mathcal {W}} \text { for some } n'\ge n) > 0\).

  9. For the general case, take \(E( y) = [h^{*}( y) - \inf h^{*}]^{-1}\).

  10. The case of mixed strategies dominated by mixed strategies requires heavier notation, so we do not treat it.

  11. As we explain in Appendix 1, the image \({{\,\textrm{im}\,}}Q\) of \(Q\) coincides with the prox-domain \({\mathcal {X}}_{h}= {{\,\textrm{dom}\,}}\partial h\) of \(h\). As such, a sufficient condition for \(Q\) to be surjective is for \(h\) to be Lipschitz continuous on \({\mathcal {X}}\).

References

  1. Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in Linear and Non-linear Programming. Stanford University Press, Redwood City (1958)


  2. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science (1995)

  3. Azizian, W., Iutzeler, F., Malick, J., Mertikopoulos, P.: On the rate of convergence of Bregman proximal methods in constrained variational inequalities. arXiv:2211.08043 (2022)

  4. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, New York (2017)


  5. Benaïm, M.: Vertex reinforced random walks and a conjecture of Pemantle. Ann. Probab. 25, 361–392 (1997)


  6. Benaïm, M.: Dynamics of stochastic approximation algorithms. In: Azéma, J., Émery, M., Ledoux, M., Yor, M. (eds.) Séminaire de Probabilités XXXIII. Lecture Notes in Mathematics, vol. 1709, pp. 1–68. Springer, Berlin (1999)

  7. Benaïm, M., Hirsch, M.W.: Asymptotic pseudotrajectories and chain recurrent flows, with applications. J. Dyn. Differ. Equ. 8(1), 141–176 (1996)


  8. Bervoets, S., Bravo, M., Faure, M.: Learning with minimal information in continuous games. Theor. Econ. 15, 1471–1508 (2020)


  9. Bravo, M., Leslie, D.S., Mertikopoulos, P.: Bandit learning in concave \({N}\)-person games. In: NeurIPS ’18: Proceedings of the 32nd International Conference of Neural Information Processing Systems (2018)

  10. Brown, G.W.: Iterative solutions of games by fictitious play. In: Koopmans, T.C. (ed.) Activity Analysis of Production and Allocation, pp. 374–376. Wiley (1951)

  11. Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, Cambridge (2006)


  12. Coucheney, P., Gaujal, B., Mertikopoulos, P.: Penalty-regulated dynamics and robust learning procedures in games. Math. Oper. Res. 40(3), 611–633 (2015)


  13. Daskalakis, C., Panageas, I.: Last-iterate convergence: Zero-sum games and constrained min–max optimization. In: ITCS ’19: Proceedings of the 10th Conference on Innovations in Theoretical Computer Science (2019)

  14. Daskalakis, C., Ilyas, A., Syrgkanis, V., Zeng, H.: Training GANs with optimism. In: ICLR ’18: Proceedings of the 2018 International Conference on Learning Representations (2018)

  15. Debreu, G.: A social equilibrium existence theorem. Proc. Natl. Acad. Sci. USA 38(10), 886–893 (1952)


  16. Duflo, M.: Cibles atteignables avec une probabilité positive d’après M. Benaïm, Mimeo (1997)

  17. Duvocelle, B., Mertikopoulos, P., Staudigl, M., Vermeulen, D.: Multi-agent online learning in time-varying games. Math. Oper. Res. 48(2), 914–941 (2023)


  18. Even-dar, E., Mansour, Y., Nadav, U.: On the convergence of regret minimization dynamics in concave games. In: STOC ’09: Proceedings of the 41st Annual ACM Symposium on the Theory of Computing, pp. 523–532. ACM, New York (2009)

  19. Flokas, L., Vlatakis-Gkaragkounis, E.V., Lianeas, T., Mertikopoulos, P., Piliouras, G.: No-regret learning and mixed Nash equilibria: they do not mix. In: NeurIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems (2020)

  20. Giannou, A., Vlatakis-Gkaragkounis, E.V., Mertikopoulos, P.: The convergence rate of regularized learning in games: From bandits and uncertainty to optimism and beyond. In: NeurIPS ’21: Proceedings of the 35th International Conference on Neural Information Processing Systems (2021)

  21. Giannou, A., Lotidis, K., Mertikopoulos, P., Vlatakis-Gkaragkounis, E.V.: On the convergence of policy gradient methods to Nash equilibria in general stochastic games. In: NeurIPS ’22: Proceedings of the 36th International Conference on Neural Information Processing Systems (2022)

  22. Gidel, G., Berard, H., Vignoud, G., Vincent, P., Lacoste-Julien, S.: A variational inequality perspective on generative adversarial networks. In: ICLR ’19: Proceedings of the 2019 International Conference on Learning Representations (2019)

  23. Hall, P., Heyde, C.C.: Martingale limit theory and its application. In: Probability and Mathematical Statistics. Academic Press, New York (1980)

  24. Hart, S., Mas-Colell, A.: Uncoupled dynamics do not lead to Nash equilibrium. Am. Econ. Rev. 93(5), 1830–1836 (2003)


  25. Hart, S., Mas-Colell, A.: Stochastic uncoupled dynamics and Nash equilibrium. Games Econ. Behav. 57, 286–303 (2006)


  26. Héliou, A., Cohen, J., Mertikopoulos, P.: Learning with bandit feedback in potential games. In: NIPS ’17: Proceedings of the 31st International Conference on Neural Information Processing Systems (2017)

  27. Héliou, A., Mertikopoulos, P., Zhou, Z.: Gradient-free online learning in continuous games with delayed rewards. In: ICML ’20: Proceedings of the 37th International Conference on Machine Learning (2020)

  28. Hiriart-Urruty, J.B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer, Berlin (2001)


  29. Hofbauer, J., Sandholm, W.H.: On the global convergence of stochastic fictitious play. Econometrica 70(6), 2265–2294 (2002)


  30. Hofbauer, J., Sigmund, K.: Evolutionary game dynamics. Bull. Am. Math. Soc. 40(4), 479–519 (2003)


  31. Hsieh, Y.G., Iutzeler, F., Malick, J., Mertikopoulos, P.: On the convergence of single-call stochastic extra-gradient methods. In: NeurIPS ’19: Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 6936–6946 (2019)

  32. Hsieh, Y.G., Iutzeler, F., Malick, J., Mertikopoulos, P.: Explore aggressively, update conservatively: stochastic extragradient methods with variable stepsize scaling. In: NeurIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems (2020)

  33. Hsieh, Y.P., Mertikopoulos, P., Cevher, V.: The limits of min-max optimization algorithms: Convergence to spurious non-critical sets. In: ICML ’21: Proceedings of the 38th International Conference on Machine Learning (2021)

  34. Juditsky, A., Nemirovski, A.S., Tauvel, C.: Solving variational inequalities with stochastic mirror-prox algorithm. Stoch. Syst. 1(1), 17–58 (2011)


  35. Kelly, F.P., Maulloo, A.K., Tan, D.K.H.: Rate control for communication networks: shadow prices, proportional fairness and stability. J. Oper. Res. Soc. 49(3), 237–252 (1998)


  36. Korpelevich, G.M.: The extragradient method for finding saddle points and other problems. Èkonom i Mat Metody 12, 747–756 (1976)


  37. Kushner, H.J., Yin, G.G.: Stochastic Approximation Algorithms and Applications. Springer-Verlag, New York (1997)


  38. Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2020)


  39. Leslie, D.S., Collins, E.J.: Individual \(Q\)-learning in normal form games. SIAM J. Control. Optim. 44(2), 495–514 (2005)


  40. Leslie, D.S., Collins, E.J.: Generalised weakened fictitious play. Games Econ. Behav. 56(2), 285–298 (2006)


  41. Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Inf. Comput. 108(2), 212–261 (1994)


  42. Maynard Smith, J., Price, G.R.: The logic of animal conflict. Nature 246, 15–18 (1973)


  43. Mertikopoulos, P., Sandholm, W.H.: Learning in games via reinforcement and regularization. Math. Oper. Res. 41(4), 1297–1324 (2016)


  44. Mertikopoulos, P., Zhou, Z.: Learning in games with continuous action sets and unknown payoff functions. Math. Program. 173(1–2), 465–507 (2019)


  45. Mertikopoulos, P., Papadimitriou, C.H., Piliouras, G.: Cycles in adversarial regularized learning. In: SODA ’18: Proceedings of the 29th annual ACM-SIAM Symposium on Discrete Algorithms (2018)

  46. Mertikopoulos, P., Lecouat, B., Zenati, H., Foo, C.S., Chandrasekhar, V., Piliouras, G.: Optimistic mirror descent in saddle-point problems: going the extra (gradient) mile. In: ICLR ’19: Proceedings of the 2019 International Conference on Learning Representations (2019)

  47. Mertikopoulos, P., Hallak, N., Kavis, A., Cevher, V.: On the almost sure convergence of stochastic gradient descent in non-convex problems. In: NeurIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems (2020)

  48. Monderer, D., Shapley, L.S.: Potential games. Games Econ. Behav. 14(1), 124–143 (1996)


  49. Nemirovski, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)


  50. Nesterov, Y.: Primal–dual subgradient methods for convex problems. Math. Program. 120(1), 221–259 (2009)


  51. Nevel’son, M.B., Khasminskii, R.Z.: Stochastic Approximation and Recursive Estimation. American Mathematical Society, Providence, RI (1976)


  52. Oliveira, T.R., Rodrigues, V.H.P., Krstić, M., Başar, T.: Nash equilibrium seeking with arbitrarily delayed player actions. In: CDC ’20: Proceedings of the 59th IEEE Annual Conference on Decision and Control (2020)

  53. Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)


  54. Popov, L.D.: A modification of the Arrow–Hurwicz method for search of saddle points. Math. Notes Acad. Sci. USSR 28(5), 845–848 (1980)


  55. Rakhlin, A., Sridharan, K.: Optimization, learning, and games with predictable sequences. In: NIPS ’13: Proceedings of the 27th International Conference on Neural Information Processing Systems (2013)

  56. Ratliff, L.J., Burden, S.A., Sastry, S.S.: On the characterization of local Nash equilibria in continuous games. IEEE Trans. Autom. Control 61(8), 2301–2307 (2016)


  57. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)


  58. Robinson, J.: An iterative method for solving a game. Ann. Math. 54, 296–301 (1951)


  59. Rosen, J.B.: Existence and uniqueness of equilibrium points for concave \({N}\)-person games. Econometrica 33(3), 520–534 (1965)


  60. Rosenthal, R.W.: A class of games possessing pure-strategy Nash equilibria. Int. J. Game Theory 2, 65–67 (1973)


  61. Samuelson, L., Zhang, J.: Evolutionary stability in asymmetric games. J. Econ. Theory 57, 363–391 (1992)


  62. Scutari, G., Facchinei, F., Palomar, D.P., Pang, J.S.: Convex optimization, game theory, and variational inequality theory in multiuser communication systems. IEEE Signal Process. Mag. 27(3), 35–49 (2010)


  63. Shalev-Shwartz, S., Singer, Y.: Convex repeated games and Fenchel duality. In: NIPS’ 06: Proceedings of the 19th Annual Conference on Neural Information Processing Systems, pp. 1265–1272. MIT Press (2006)

  64. Spall, J.C.: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Autom. Control 37(3), 332–341 (1992)


  65. Syrgkanis, V., Agarwal, A., Luo, H., Schapire, R.E.: Fast convergence of regularized learning in games. In: NIPS ’15: Proceedings of the 29th International Conference on Neural Information Processing Systems, pp. 2989–2997 (2015)

  66. Tatarenko, T., Kamgarpour, M.: Learning generalized Nash equilibria in a class of convex games. IEEE Trans. Autom. Control 64(4), 1426–1439 (2019)


  67. Tatarenko, T., Kamgarpour, M.: Learning Nash equilibria in monotone games. In: CDC ’19: Proceedings of the 58th IEEE Annual Conference on Decision and Control. https://doi.org/10.1109/CDC40024.2019.9029659 (2019b)

  68. Taylor, P.D., Jonker, L.B.: Evolutionary stable strategies and game dynamics. Math. Biosci. 40(1–2), 145–156 (1978)


  69. Tullock, G.: Efficient rent seeking. In: Buchanan, J.M., Tollison, R.D., Tullock, G. (eds.) Toward a Theory of the Rent-Seeking Society. Texas A&M University Press (1980)

  70. Vovk, V.G.: Aggregating strategies. In: COLT ’90: Proceedings of the 3rd Workshop on Computational Learning Theory, pp. 371–383 (1990)

  71. Zhang, R., Ren, Z., Li, N.: Gradient play in multi-agent Markov stochastic games: stationary points and convergence. arXiv:2106.00198 (2021)


Author information


Correspondence to Panayotis Mertikopoulos.


P. Mertikopoulos is grateful for financial support by the French National Research Agency (ANR) in the framework of the “Investissements d’avenir” program (ANR-15-IDEX-02), the LabEx PERSYVAL (ANR-11-LABX-0025-01), MIAI@Grenoble Alpes (ANR-19-P3IA-0003), and the bilateral ANR-NRF grant ALIAS (ANR-19-CE48-0018-01). This work has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No 725594-TIME-DATA), from the Swiss National Science Foundation (SNSF) under Grant number 200021-205011, and project MIS 5154714 of the National Recovery and Resilience Plan Greece 2.0 funded by the European Union under the NextGenerationEU Program. The authors are grateful to the associate editor and two anonymous referees for many insightful comments and remarks. The authors are likewise grateful to Victor Boone, Pierre-Louis Cauvin, Angeliki Giannou, Kyriakos Lotidis, Sylvain Sorin, and Manolis Vlatakis for many fruitful discussions. Part of this work was done while P. Mertikopoulos was visiting the Simons Institute for the Theory of Computing.

Appendices

A Regularizers and mirror maps

In this appendix we present some basic properties of the mirror map \(Q\). To state them, recall first that the subdifferential of \(h\) at \( x\in {\mathcal {X}}\) is defined as \( \partial h( x) {:}{=} \{ y\in {\mathcal {Y}}: h(x') \ge h( x) + \langle y, x'- x\rangle \;\text { for all } x'\in {\mathcal {V}} \}\), the domain of subdifferentiability of \(h\) is \({{\,\textrm{dom}\,}}\partial h{:}{=} \{ x\in {{\,\textrm{dom}\,}}h: \partial h( x)\ne \varnothing \}\), and the convex conjugate of \(h\) is defined as \(h^{*}( y) = \max _{ x\in {\mathcal {X}}} \{ \langle y, x\rangle - h( x) \}\) for all \( y\in {\mathcal {Y}}\). We then have the following basic results.

Lemma A.1

Let \(h\) be a regularizer on \({\mathcal {X}}\), and let \(Q:{\mathcal {Y}}\rightarrow {\mathcal {X}}\) be its induced mirror map. Then:

  1. \(Q\) is single-valued on \({\mathcal {Y}}\): in particular, for all \( x\in {\mathcal {X}}\), \( y\in {\mathcal {Y}}\), we have \( x = Q( y) \iff y \in \partial h( x)\).

  2. The prox-domain \({\mathcal {X}}_{h}{:}{=} {{\,\textrm{im}\,}}Q\) of \(h\) satisfies \({\mathcal {X}}_{h}= {{\,\textrm{dom}\,}}\partial h\) and, hence, \({{\,\textrm{ri}\,}}{\mathcal {X}}\subseteq {\mathcal {X}}_{h}\subseteq {\mathcal {X}}\).

  3. \(Q\) is \((1/K)\)-Lipschitz continuous and \(Q= \nabla h^{*}\).

  4. For all \( x\in {{\,\textrm{ri}\,}}{\mathcal {X}}\), we have \( y,y'\in \partial h( x)\) if and only if \(\langle y'- y, x'- x\rangle = 0\) for all \(x'\in {\mathcal {X}}\).
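To make Lemma A.1 concrete, here is a minimal numerical sketch (ours, not part of the original analysis) for the entropic regularizer \(h( x) = \sum _{\alpha } x_{\alpha }\log x_{\alpha }\) on the simplex, whose conjugate is \(h^{*}( y) = \log \sum _{\alpha } e^{y_{\alpha }}\) and whose induced mirror map \(Q = \nabla h^{*}\) is the logit (softmax) choice map; all helper names below are ours.

```python
import numpy as np

def h(x):
    """Entropic regularizer h(x) = sum_a x_a log x_a on the simplex (0 log 0 := 0)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.where(x > 0, x * np.log(x), 0.0)))

def h_conj(y):
    """Convex conjugate h*(y) = max_x {<y, x> - h(x)} = log sum_a exp(y_a)."""
    y = np.asarray(y, dtype=float)
    m = y.max()                                  # stabilized log-sum-exp
    return float(m + np.log(np.exp(y - m).sum()))

def Q(y):
    """Mirror map Q(y) = grad h*(y): the softmax/logit choice map."""
    y = np.asarray(y, dtype=float)
    z = np.exp(y - y.max())
    return z / z.sum()

rng = np.random.default_rng(0)
y = rng.normal(size=4)
x = Q(y)

# Item 1 of Lemma A.1: x = Q(y) iff y is a subgradient of h at x.  For the entropy,
# y differs from grad h(x) = 1 + log x by a multiple of the all-ones vector, which is
# orthogonal to the simplex; we spot-check the subgradient inequality on random points.
for _ in range(5):
    xp = rng.dirichlet(np.ones(4))
    assert h(xp) >= h(x) + y @ (xp - x) - 1e-10

# Item 3: Q = grad h*, checked by central finite differences.
eps = 1e-6
grad_fd = np.array([(h_conj(y + eps * e) - h_conj(y - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
print(np.allclose(grad_fd, x, atol=1e-5))   # True
```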

Our second basic result concerns the Fenchel coupling

$$\begin{aligned} F(p, y) = h(p) + h^{*}( y) - \langle y, p\rangle \quad \text {for } p\in {\mathcal {X}},\ y\in {\mathcal {Y}}. \end{aligned}$$
(A.1)

For our purposes, the most relevant properties of \(F\) are as follows:

Lemma A.2

For all \(p\in {\mathcal {X}}\) and all \( y,y'\in {\mathcal {Y}}\), we have:

$$\begin{aligned} \quad a)&\quad F(p, y) \ge 0 \quad \hbox { with equality if and only if}\ p= Q( y).&\end{aligned}$$
(A.2a)
$$\begin{aligned} \quad b)&\quad F(p, y) \ge \tfrac{1}{2} K\, \Vert Q( y) - p\Vert ^{2}.&\end{aligned}$$
(A.2b)
$$\begin{aligned} \quad c)&\quad F(p,y') \le F(p, y) + \left\langle y'- y \right. ,\left. Q( y) - p\right\rangle + \tfrac{1}{2K} \Vert y'- y\Vert _{*}^{2}. \end{aligned}$$
(A.2c)

In particular, if \(h(0) = 0\), we have

$$\begin{aligned} (K/2) \Vert Q( y)\Vert ^{2} \le h^{*}( y) \le -\min h+ \langle y, Q( y)\rangle + (2/K) \Vert y\Vert _{*}^{2} \quad \text {for all } y\in {\mathcal {Y}}. \end{aligned}$$
(A.3)
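As a quick numerical sanity check of Lemma A.2 (again our own illustration, in the entropic setup of the previous sketch, where the Fenchel coupling reduces to the Kullback–Leibler divergence and \(h\) is \(1\)-strongly convex with respect to the \(\ell ^{1}\)-norm, so \(K = 1\)):

```python
import numpy as np

rng = np.random.default_rng(1)

def Q(y):
    z = np.exp(y - y.max())
    return z / z.sum()

def fenchel(p, y):
    """F(p, y) = h(p) + h*(y) - <y, p>; for the entropy this equals KL(p || Q(y))."""
    h_p = np.sum(np.where(p > 0, p * np.log(p), 0.0))
    h_star = y.max() + np.log(np.exp(y - y.max()).sum())
    return h_p + h_star - y @ p

for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    y = rng.normal(scale=3.0, size=5)
    F = fenchel(p, y)
    # (A.2a): F >= 0, vanishing only at p = Q(y)
    assert F >= -1e-12
    # (A.2b) with K = 1 and the l1-norm (Pinsker's inequality)
    assert F >= 0.5 * np.abs(Q(y) - p).sum() ** 2 - 1e-12
print("checks passed")
```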

Variants of Lemmas A.1 and A.2 already exist in the literature (see e.g., [44] and references therein), so we do not provide a proof. Instead, we proceed below to show how the above extends to the setwise Fenchel coupling

$$\begin{aligned} F_{{\mathcal {S}}}( y) {:}{=} h^{*}( y) - h^{*}_{{\mathcal {S}}}( y) = \min _{p\in {\mathcal {S}}} \{ h(p) + h^{*}( y) - \left\langle y \right. ,\left. p\right\rangle \} = \min _{p\in {\mathcal {S}}} F(p, y)\qquad \end{aligned}$$
(A.4)

where \({\mathcal {S}}\) is a nonempty compact convex subset of \({\mathcal {X}}\) and \(h^{*}_{{\mathcal {S}}}( y) = \max _{ x\in {\mathcal {S}}} \{\langle y, x\rangle - h( x)\}\) denotes the convex conjugate of \(h\) relative to \({\mathcal {S}}\). The most important properties of \(F_{{\mathcal {S}}}\) are encoded in the following lemma.

Lemma A.3

With notation as above, we have:

  1. \(F_{{\mathcal {S}}}( y) \ge 0\) with equality if and only if \(Q( y) \in {\mathcal {S}}\). Moreover, under the reciprocity condition (R), we have \(F_{{\mathcal {S}}}( y) \rightarrow 0\) if and only if \(Q( y) \rightarrow {\mathcal {S}}\).

  2. \(F_{{\mathcal {S}}}\) is differentiable and \(\nabla F_{{\mathcal {S}}}( y) = Q( y) - Q_{{\mathcal {S}}}( y)\), where \(Q_{{\mathcal {S}}}( y) = {{\,\mathrm{arg\,max}\,}}_{ x\in {\mathcal {S}}} \{ \langle y, x\rangle - h( x) \}\).

  3. For all \(y,y'\in {\mathcal {Y}}\) we have \(\Vert \nabla F_{{\mathcal {S}}}(y') - \nabla F_{{\mathcal {S}}}( y)\Vert \le (2/K) \Vert y'- y\Vert _{*}\).

Proof

Since \({\mathcal {S}}\subseteq {\mathcal {X}}\), we have \(h^{*}_{{\mathcal {S}}} \le h^{*}\) by definition, and hence \(F_{{\mathcal {S}}} \ge 0\). Moreover, since the minimum in (A.4) must be attained in \({\mathcal {S}}\), we get \(F_{{\mathcal {S}}}( y) = 0\) if and only if \(h(p) + h^{*}( y) - \langle y, p\rangle = 0\) for some \(p\in {\mathcal {S}}\); by Lemma A.2, this occurs if and only if \(Q( y) = p\in {\mathcal {S}}\), so our first claim follows.

Moving forward, to show that \(F_{{\mathcal {S}}}( y)\rightarrow 0\) if and only if \(Q( y) \rightarrow {\mathcal {S}}\), let \( y_{n}\) be a sequence in \({\mathcal {Y}}\), and let \( x_{n} = Q( y_{n})\). For the “if” part, suppose that \( x_{n} \rightarrow {\mathcal {S}}\). Since \({\mathcal {S}}\) is compact, we may assume without loss of generality (by descending to a subsequence if necessary) that \( x_{n}\) converges to some \( x\in {\mathcal {S}}\). Observe now that (a) \(0 \le F_{{\mathcal {S}}}( y_{n}) \le F( x, y_{n})\) by the definition of \(F_{{\mathcal {S}}}\) as a minimum in (A.4); and (b) \(F( x, y_{n})\rightarrow 0\) by (R). Thus, by sandwiching, we conclude that \(F_{{\mathcal {S}}}( y_{n})\rightarrow 0\). Conversely, if \(F_{{\mathcal {S}}}( y_{n}) \rightarrow 0\), we may again assume by compactness (and by descending to a subsequence if necessary) that \( x_{n} = Q( y_{n})\) converges to some \(\hat{x}\in {\mathcal {X}}\). If \(\hat{x}\not \in {\mathcal {S}}\), then, by (R), we have \(\lim _{n\rightarrow \infty } F( x, y_{n}) > 0\) for all \( x\in {\mathcal {S}}\). Since \({\mathcal {S}}\) is compact and \(F( x, y)\) is continuous in \(x\), we conclude that \(\liminf _{n\rightarrow \infty } F_{{\mathcal {S}}}( y_{n}) > 0\), a contradiction which establishes our claim.

Our last two claims follow by applying Lemma A.1 to \(h\) and \(h+\delta _{{\mathcal {S}}}\) where \(\delta _{{\mathcal {S}}}\) denotes the convex indicator of \({\mathcal {S}}\). \(\square \)
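To illustrate the setwise coupling, the following sketch (ours) takes \({\mathcal {S}}\) to be a face of the simplex in the entropic setup, where \(F_{{\mathcal {S}}}( y) = h^{*}( y) - h^{*}_{{\mathcal {S}}}( y)\) has the closed form \(\log \sum _{\alpha } e^{y_{\alpha }} - \log \sum _{\alpha \in \text {supp}} e^{y_{\alpha }}\), and checks the gradient formula of Lemma A.3 by finite differences; the function names and the choice of face are ours.

```python
import numpy as np

def softmax(y):
    z = np.exp(y - y.max())
    return z / z.sum()

def F_S(y, support):
    """Setwise Fenchel coupling of the face S = {x in simplex : x_a = 0 for a not in support}
    under the entropic regularizer: F_S(y) = h*(y) - h*_S(y) = lse(y) - lse(y[support])."""
    lse = lambda v: v.max() + np.log(np.exp(v - v.max()).sum())
    return lse(y) - lse(y[support])

def Q_S(y, support):
    """Restricted mirror image Q_S(y) = argmax_{x in S} {<y, x> - h(x)}: softmax on the support."""
    x = np.zeros_like(y)
    x[support] = softmax(y[support])
    return x

rng = np.random.default_rng(2)
y = rng.normal(size=6)
support = np.array([0, 2, 5])          # the face spanned by these pure strategies

# Lemma A.3, item 2: grad F_S(y) = Q(y) - Q_S(y), checked by central finite differences.
eps = 1e-6
grad_fd = np.array([(F_S(y + eps * e, support) - F_S(y - eps * e, support)) / (2 * eps)
                    for e in np.eye(6)])
print(np.allclose(grad_fd, softmax(y) - Q_S(y, support), atol=1e-5))   # True
```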

The next properties we discuss concern the way that different regions of \({\mathcal {Y}}\) are mapped to \({\mathcal {X}}\) under \(Q\).

Lemma A.4

[43, Prop. A.1]. Let \(h\) be a regularizer on the simplex \({{\,\mathrm{{{\,\mathrm{\Delta }\,}}}\,}}(\mathcal {A}) \subseteq \mathbb {R}^{\mathcal {A}}\). If \( y_{\alpha } - y_{\beta } \rightarrow -\infty \), then \(Q_{\alpha }( y) \rightarrow 0\).

Lemma A.5

Let \(h\) be a regularizer on \({\mathcal {X}}\), let \( y_{n}\), \(n=1,2,\dotsc \) be a sequence in \({\mathcal {Y}}\), and fix some \( x\in {\mathcal {X}}\). If \(\left\langle y_{n} \right. ,\left. z\right\rangle \rightarrow -\infty \) for every nonzero \(z\in {{\,\textrm{TC}\,}}( x)\), we have \(Q( y_{n}) \rightarrow x\).

Proof

Assume to the contrary that \( x_{n} {:}{=} Q( y_{n})\) does not converge to \( x\), i.e., \(\limsup _{n} \Vert x_{n} - x\Vert > 0\). Then, given that \( y_{n} \in \partial h( x_{n})\), we get \( h( x) \ge h( x_{n}) + \langle y_{n}, x - x_{n}\rangle = h( x_{n}) - \langle y_{n}, z_{n}\rangle \Vert x_{n} - x\Vert , \) where we set \(z_{n} = ( x_{n} - x) / \Vert x_{n} - x\Vert \). If we further assume (by descending to a subsequence if needed) that \(z_{n}\) converges in the unit sphere of \(\Vert \cdot \Vert \), there exists some \(z\in {{\,\textrm{TC}\,}}( x)\) with \(\Vert z\Vert = 1\) and some \(\varepsilon >0\) such that \(\langle y_{n}, z_{n}\rangle \le (1+\varepsilon ) \langle y_{n}, z\rangle \) for all sufficiently large \(n\). Thus, since \(\langle y_{n}, z\rangle \rightarrow -\infty \) by assumption, taking the \(\limsup \) of the above estimate gives \(h( x) \ge \infty \), a contradiction which proves our claim. \(\square \)

Lemma A.6

Let \(h\) be a regularizer on a convex polytope \({\mathcal {P}}\) of \({\mathcal {V}}\), let \({\mathcal {S}}\) be a face of \({\mathcal {P}}\), and let \({\mathcal {Z}}= \{ z_{1},\dotsc ,z_{m} \}\) be a set of unit vectors of \({\mathcal {V}}\) such that every point \( x\in {\mathcal {P}}\setminus {\mathcal {S}}\) can be written as \( x = p+ \lambda z\) for some \(p\in {\mathcal {S}}\), \(z\in {\mathcal {Z}}\) and \(\lambda >0\). If \(\max _{z\in {\mathcal {Z}}} \left\langle y \right. ,\left. z\right\rangle \rightarrow -\infty \), then \(Q( y) \rightarrow {\mathcal {S}}\).

Proof

By the compactness of \({\mathcal {P}}\) (and descending to a subsequence if necessary), we may assume that \( x_{n} = Q( y_{n})\) converges to some \( x\in {\mathcal {P}}\). If \( x\notin {\mathcal {S}}\), there exist \(p\in {\mathcal {S}}\), \(z\in {\mathcal {Z}}\) and \(\lambda >0\) such that \( x = p+ \lambda z\). In turn, this gives \(h(p) \ge h( x_{n}) + \left\langle y_{n} \right. ,\left. p- x_{n}\right\rangle = h( x_{n}) - \left\langle y_{n} \right. ,\left. z_{n}\right\rangle \Vert x_{n} - p\Vert \) where we set \(z_{n} = ( x_{n} - p)/\Vert x_{n} - p\Vert \). Since \(z_{n} \rightarrow z\), taking \(n\rightarrow \infty \) yields \(h(p) \ge \infty \), a contradiction which shows that \( x = \lim x_{n} \in {\mathcal {S}}\), as claimed. \(\square \)

We conclude this appendix with the dynamical properties of the Fenchel coupling under (MD). The heavy lifting will be provided by the following simple lemma:

Lemma A.7

Let \( x_{}(t) = Q(y_{}(t))\) be an orbit of (MD). Then, for every nonempty closed convex subset \({\mathcal {S}}\) of \({\mathcal {X}}\), we have

$$\begin{aligned} {\dot{F}}_{{\mathcal {S}}}( y) = \left\langle v( x) \right. ,\left. x - x_{{\mathcal {S}}}\right\rangle \end{aligned}$$
(A.5)

where \( x_{{\mathcal {S}}} = Q_{{\mathcal {S}}}( y)\) denotes the mirror image of y on \({\mathcal {S}}\). In particular, if \({\mathcal {S}}= \{p\}\), we have \({\dot{F}}(p, y) = \left\langle v( x) \right. ,\left. x - p\right\rangle \).

Proof

Simply note that \({\dot{ y}} = v( x)\) along (MD) and that, by Lemma A.3, \(\nabla F_{{\mathcal {S}}}( y) = Q( y) - Q_{{\mathcal {S}}}( y) = x - x_{{\mathcal {S}}}\), so \({\dot{F}}_{{\mathcal {S}}}( y) = \langle {\dot{ y}}, \nabla F_{{\mathcal {S}}}( y)\rangle = \langle v( x), x - x_{{\mathcal {S}}}\rangle \). \(\square \)
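As a closing illustration of Lemma A.7 (our own toy example, assuming the convention \({\dot{ y}} = v( x)\), \( x = Q( y)\) for (MD) and the entropic mirror map), consider a single player on the simplex with \(v( x) = p- x\) for a fixed interior point \(p\), so that \(\langle v( x), x - p\rangle = -\Vert x - p\Vert ^{2} \le 0\); a forward-Euler discretization then exhibits the monotone decrease of \(F(p, y(t))\):

```python
import numpy as np

def softmax(y):
    z = np.exp(y - y.max())
    return z / z.sum()

def fenchel(p, y):
    """F(p, y) = h(p) + h*(y) - <y, p> for the entropic regularizer."""
    h_p = np.sum(np.where(p > 0, p * np.log(p), 0.0))
    h_star = y.max() + np.log(np.exp(y - y.max()).sum())
    return h_p + h_star - y @ p

p = np.array([0.5, 0.3, 0.2])             # target point; v(x) = p - x satisfies (B.7) for S = {p}
y = np.array([2.0, -1.0, 0.5])            # initial dual state
dt, T = 1e-3, 20.0

values = []
for _ in range(int(T / dt)):              # forward-Euler discretization of (MD): dy/dt = v(Q(y))
    x = softmax(y)
    y = y + dt * (p - x)
    values.append(fenchel(p, y))

values = np.array(values)
print(np.all(np.diff(values) <= 1e-9))    # F(p, y(t)) is (numerically) nonincreasing
print(np.abs(softmax(y) - p).max())       # x(t) approaches p
```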

B Omitted proofs and calculations

B.1 Error bounds for specific algorithms

Our aim in this appendix is to prove the bounds on the bias and magnitude of \({\hat{v}}_{n}\) reported in Proposition 3 and Table 1.

Proof of Proposition 3

We proceed on a method-by-method basis, starting with the oracle-based methods of Sect. 3.2, that is, Algorithms 1–4. For this, we will make free use of the fact that we can take \(M_{n}^{q} = 3^{q-1} (G^{q} + B_{n}^{q} + \sigma _{n}^{q})\) in (8); cf. the discussion after (6).

\(\blacktriangleright \)  Algorithm 1: Stochastic gradient ascent  For (SGA), we have \(U_{n} = {{\,\textrm{err}\,}}(X_{n};\theta _{n})\) and \(b_{n} = 0\), so our claim follows immediately from the stated assumptions for (SFO).

\(\blacktriangleright \)  Algorithm 2: Extra-gradient  For (EG), we have \({\hat{v}}_{n} = V(X_{n+1/2};\theta _{n+1/2})\) so . We thus get

(B.1)

and, analogously

(B.2)

so under (13), as claimed.

\(\blacktriangleright \)  Algorithm 3: Optimistic gradient  For (OG), we have again , so the same series of arguments as above gives

(B.3)

under (13) with \(q=\infty \). The noise term \(U_{n}\) can be bounded in exactly the same way, so we omit the calculations.

\(\blacktriangleright \)  Algorithm 4: Exponential/multiplicative weights  We consider two cases, based on the information available to the players. For the full information oracle (14a), we have \({\hat{v}}_{n} = v(X_{n})\), so \(b_{n} = U_{n} = 0\) by definition (i.e., the oracle is perfect). Otherwise, under the realization-based oracle (14b), we have \(\mathbb {E}[{\hat{v}}_{n} \mid {\mathcal {F}}_{n}] = v(X_{n})\) because \(\alpha _{n}\) is sampled according to \(X_{n}\). We thus get \(b_{n} = 0\) and \(U_{n} = {{\,\mathrm{{\mathcal {O}}}\,}}(1)\), which proves our assertion.

We now proceed with the payoff-based methods of Sect. 3.2, namely Algorithms 5–7.

\(\blacktriangleright \)  Algorithm 5: Single-point stochastic approximation  Since \(u_{i}\) is assumed bounded in the context of (SPSA), the bound for \(M_{n}\) follows trivially. As for the bias of (SPSA), it will be convenient to set \(V_{i}^{\delta }( x;w) = (d_{i}/\delta ) \, u_{i}( x+\delta w) \, w_{i}\) so, in obvious notation, \({\hat{v}}_{i,n} = V_{i}^{\delta _{n}}(X_{n};W_{n})\). Thus, if we fix a pivot point \( x\in {\mathcal {X}}\) and a query point \(\hat{x}= x+ \delta w\) for some \(w\in {\mathcal {E}}= \prod _{i} {\mathcal {E}}_{i}\), a first-order Taylor expansion of \(u_{i}\) with integral remainder gives

$$\begin{aligned} V_{i}^{\delta }( x;w)&= \frac{d_{i}}{\delta } u_{i}(\hat{x}) \cdot w_{i} = \frac{d_{i}}{\delta } u_{i}( x) \cdot w_{i} + \frac{d_{i}}{\delta } \left\langle \nabla u_{i}( x) \right. ,\left. z\right\rangle \cdot w_{i} \end{aligned}$$
(B.4a)
$$\begin{aligned}&+ \int _{0}^{1} \left\langle \nabla u_{i}( x+\tau z) - \nabla u_{i}( x) \right. ,\left. z\right\rangle \,d\tau \cdot w_{i} \end{aligned}$$
(B.4b)

where we set \(z= \hat{x}- x= \delta w\). Hence, if \(w\) is drawn uniformly at random from \({\mathcal {E}}\), taking expectations yields

(B.5)

where we used the fact that for all \(i\in \mathcal {N}\) and that \(w_{i}\) and \(w_{j}\) are independent for all \(i,j\in \mathcal {N}\), \(i\ne j\). As for the second term, Assumption 1 readily yields

(B.6)

Thus, by combining (B.5) and (B.6), we conclude that , which immediately yields the desired bound \(B_{n} = {{\,\mathrm{{\mathcal {O}}}\,}}(\delta _{n}) = {{\,\mathrm{{\mathcal {O}}}\,}}(1/n^{\ell _{\delta }})\) for (SPSA).
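For intuition on the (SPSA) bound above, the following self-contained check (ours; a single unconstrained player with a concave quadratic payoff and sphere-uniform perturbations, so not the exact multi-player setting of Algorithm 5) verifies numerically that the mean of the single-point estimator \(V^{\delta }( x;w) = (d/\delta )\, u( x+\delta w)\, w\) recovers \(\nabla u( x)\) up to an \({{\,\mathrm{{\mathcal {O}}}\,}}(\delta )\) bias and Monte-Carlo noise.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
M = rng.normal(size=(d, d))
A = -(M @ M.T) - np.eye(d)                  # u(x) = 1/2 x^T A x + b^T x is concave
b = rng.normal(size=d)

def u(X):                                   # payoff, evaluated row-wise on an (n, d) array
    return 0.5 * np.einsum('ni,ij,nj->n', X, A, X) + X @ b

def spsa_mean(x, delta, n):
    """Monte-Carlo mean of the single-point estimator V^delta(x; w) = (d/delta) u(x + delta w) w,
    with w drawn uniformly from the unit sphere."""
    w = rng.normal(size=(n, d))
    w /= np.linalg.norm(w, axis=1, keepdims=True)
    return (d / delta) * np.mean(u(x + delta * w)[:, None] * w, axis=0)

x = rng.normal(size=d)
grad = A @ x + b
for n in (10**4, 10**5, 10**6):
    err = np.linalg.norm(spsa_mean(x, 0.1, n) - grad)
    print(f"n = {n:>7d}   |mean estimate - grad u(x)| = {err:.3f}")
# The gap shrinks like the Monte-Carlo error O(1/(delta*sqrt(n))); the systematic
# bias of the estimator is O(delta) (and here even vanishes because u is quadratic).
```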

\(\blacktriangleright \)  Algorithm 6: Dampened gradient approximation  Recall that \({\hat{v}}_{i,n} = n\cdot \log ( 1 + (u_{i}(X_{n+1/2}) - u_{i}(X_{n})) W_{i,n} )\). Since \(u_{i}(X_{n+1/2}) - u_{i}(X_{n}) = (1/n) v_{i}(X_{n}) W_{i,n} + {{\,\mathrm{{\mathcal {O}}}\,}}(1/n^{2})\) by the definition of \(X_{n+1/2}\), expanding the logarithm readily yields \(B_{n} = {{\,\mathrm{{\mathcal {O}}}\,}}(1/n)\) and \(M_{n} = {{\,\mathrm{{\mathcal {O}}}\,}}(1)\). Our claim then follows as above.

\(\blacktriangleright \)  Algorithm 7: Exponential weights for exploration and exploitation  Since \(\hat{\alpha }_{n}\) is sampled according to \(\hat{X}_{n}\), we readily get \(\mathbb {E}[{\hat{v}}_{n} \mid {\mathcal {F}}_{n}] = v(\hat{X}_{n})\), so \(B_{n} = {{\,\mathrm{{\mathcal {O}}}\,}}(\Vert \hat{X}_{n} - X_{n}\Vert ) = {{\,\mathrm{{\mathcal {O}}}\,}}(\delta _{n}) = {{\,\mathrm{{\mathcal {O}}}\,}}(1/n^{\ell _{\delta }})\). Moreover, since \(\hat{X}_{i\alpha _{i},n} \ge \delta _{n}/A_{i}\), it follows that \(\Vert {\hat{v}}_{n}\Vert _{*} = {{\,\mathrm{{\mathcal {O}}}\,}}(1/\delta _{n}) = {{\,\mathrm{{\mathcal {O}}}\,}}(n^{\ell _{\delta }})\), and our proof is complete. \(\square \)
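To see where the two estimates above come from, here is a sketch (ours) of the standard importance-weighted estimator with explicit exploration that is consistent with the bounds in the proof: each player samples \(\hat{\alpha }_{i}\) from \(\hat{X}_{i} = (1-\delta ) X_{i} + \delta \,\text {unif}\) and sets \({\hat{v}}_{i\alpha } = \mathbb {1}\{\hat{\alpha }_{i} = \alpha \}\, u_{i}(\hat{\alpha })/\hat{X}_{i\alpha }\); the exact bookkeeping of Algorithm 7 may differ, and all names below are ours.

```python
import numpy as np

rng = np.random.default_rng(4)
A1, A2 = 3, 4                               # players' numbers of actions
U1 = rng.uniform(-1, 1, size=(A1, A2))      # player 1's payoffs: u_1(a_1, a_2) = U1[a_1, a_2]

X1 = rng.dirichlet(np.ones(A1))             # current mixed strategies X_n
X2 = rng.dirichlet(np.ones(A2))
delta = 0.1
Xh1 = (1 - delta) * X1 + delta / A1         # explored strategies, so that Xh_{i,a} >= delta / A_i
Xh2 = (1 - delta) * X2 + delta / A2

# Importance-weighted estimate for player 1: sample a profile from X_hat and set
# v_hat[a] = 1{a_1 = a} u_1(a_1, a_2) / Xh1[a]; each entry is O(1/delta) in magnitude.
n = 500_000
a1 = rng.choice(A1, size=n, p=Xh1)
a2 = rng.choice(A2, size=n, p=Xh2)
v_hat = np.zeros((n, A1))
v_hat[np.arange(n), a1] = U1[a1, a2] / Xh1[a1]

print(np.abs(v_hat.mean(axis=0) - U1 @ Xh2).max())   # conditional mean = v_1(X_hat), up to MC noise
print(np.abs(U1 @ (Xh2 - X2)).max())                 # systematic bias relative to v_1(X_n) is O(delta)
```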

B.2 Energy function derivations

Our aim in this last appendix is to prove the energy properties of the Fenchel coupling as stated in Lemmas 1 and 2. For concision, we will prove both as a special case of the following general result:

Proposition B.12

Let \({\mathcal {S}}\) be a nonempty compact convex subset of \({\mathcal {X}}\), and assume that there exists a neighborhood \({\mathcal {U}}\) of \({\mathcal {S}}\) such that

$$\begin{aligned} \left\langle v( x) \right. ,\left. x - p\right\rangle \le 0 \quad \hbox {for all}\quad x\in {\mathcal {U}},\quad p\in {\mathcal {S}}, \end{aligned}$$
(B.7)

with equality if and only if \( x\in {\mathcal {S}}\). If (R) holds and \(\varphi \) is defined as in (31), the function \(E:{\mathcal {Y}}\rightarrow \mathbb {R}\) given by

$$\begin{aligned} E( y) = \varphi (F_{{\mathcal {S}}}( y)) \quad \hbox {for all}\quad y\in {\mathcal {Y}}\end{aligned}$$
(B.8)

is a local energy function for \({\mathcal {S}}\) under (MD). In addition, if \({\mathcal {U}}= {\mathcal {X}}\), \(E\) is a global energy function for \({\mathcal {S}}\).

Proof

We will verify the requirements of Definition 5 in order.

  1. For the first requirement (Lipschitz continuity and smoothness), note that \(\nabla E( y) = \varphi '(F_{{\mathcal {S}}}( y)) \nabla F_{{\mathcal {S}}}( y)\), so, letting \( x = Q( y)\) and \( x_{{\mathcal {S}}} = Q_{{\mathcal {S}}}( y)\), Lemma A.3 yields

    $$\begin{aligned} \nabla E( y) = ( x - x_{{\mathcal {S}}}) \cdot {\left\{ \begin{array}{ll} 1 &{}\text {if}\quad F_{{\mathcal {S}}}( y) \le 1,\\ 1/\sqrt{F_{{\mathcal {S}}}( y)} &{}\text {otherwise.} \end{array}\right. } \end{aligned}$$
    (B.9)

    Furthermore, by Lemma A.2, we also have \(F(p, y) \ge (K/2) \Vert x - p\Vert ^{2}\) for all \(p\in {\mathcal {S}}\), so, by minimizing over \(p\in {\mathcal {S}}\), we get

    $$\begin{aligned} F_{{\mathcal {S}}}( y) \ge (K/2) {{\,\textrm{dist}\,}}( x,{\mathcal {S}})^{2} = (K/2) \Vert x - {{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x)\Vert ^{2} \end{aligned}$$
    (B.10)

    where \({{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x) {:}{=} {{\,\mathrm{arg\,min}\,}}_{p\in {\mathcal {S}}} \Vert x - p\Vert \). In turn, this gives \(\Vert x - {{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x)\Vert \le \sqrt{2/K}\) whenever \(F_{{\mathcal {S}}}( y) \le 1\), so we get

    $$\begin{aligned} \Vert \nabla E( y)\Vert = \Vert x - x_{{\mathcal {S}}}\Vert \le \Vert x - {{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x)\Vert + \Vert {{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x) - x_{{\mathcal {S}}}\Vert \le \sqrt{2/K} + {{\,\textrm{diam}\,}}({\mathcal {S}}) \end{aligned}$$
    (B.11)

    whenever \(F_{{\mathcal {S}}}( y) \le 1\). On the other hand, if \(F_{{\mathcal {S}}}( y) \ge 1\), we have

    (B.12)

    By the reciprocity condition (R), it follows that the set \(\{ x = Q( y): F_{{\mathcal {S}}}( y) \ge 1 \}\) is well-separated from \({\mathcal {S}}\), so \(\Vert x - {{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x)\Vert \) is bounded away from zero if \(F_{{\mathcal {S}}}( y) \ge 1\). Thus, by combining Eqs. (B.11) and (B.12), we conclude that \(\Vert \nabla E( y)\Vert \) is bounded. Finally, again by Lemma A.2, \(F( y)\) is \((1/K)\)-Lipschitz smooth, so Eq. 35—which is equivalent to the Lipschitz smoothness of \(E\)—follows immediately from the concavity of \(\varphi \).

  2. For the positive-definiteness requirement of Definition 5, note that Lemma A.3 and the reciprocity condition (R) yield \(Q( y) \rightarrow {\mathcal {S}}\) if and only if \(F_{{\mathcal {S}}}( y) \rightarrow 0\). Thus, given that \(\varphi (z) = z\) for small \(z\), the same will hold for \(E= \varphi \circ F_{{\mathcal {S}}}\), and our claim follows.

  3. Finally, for the Lyapunov properties of \(E\) under (MD), recall that Lemma A.7 gives \({\dot{F}}_{{\mathcal {S}}}( y) = \langle v( x), x - x_{{\mathcal {S}}}\rangle \), so

    $$\begin{aligned} {\dot{E}}( y) = \langle {\dot{ y}}, \nabla E( y)\rangle = \varphi '(F_{{\mathcal {S}}}( y))\, \langle v( x), x - x_{{\mathcal {S}}}\rangle < 0 \quad \text {whenever}\quad x \in {\mathcal {U}}\setminus {\mathcal {S}}\end{aligned}$$
    (B.13)

    where we used the defining property (B.7) of \({\mathcal {S}}\) (recall that \( x_{{\mathcal {S}}} \in {\mathcal {S}}\) by construction). Moving forward, by Lemma A.3, there exists some \(E_{+}>0\) such that the sublevel set \({\mathcal {D}}= \{ y\in {\mathcal {Y}}: F_{{\mathcal {S}}}( y) \le E_{+} \}\) is mapped to \({\mathcal {U}}\) under \(Q\), i.e., \(Q( y) \in {\mathcal {U}}\) whenever \(F_{{\mathcal {S}}}( y) \le E_{+}\). Thus, putting everything together, we conclude that \({\dot{E}}( y)\rightarrow 0\) if and only if \(F_{{\mathcal {S}}}( y)\rightarrow 0\), which implies that \(\sup \{ {\dot{E}}( y): E_{-}< E( y)< E_{+} \} < 0\) for all \(E_{-}\in (0,E_{+})\), and our proof is complete.

\(\square \)



Cite this article

Mertikopoulos, P., Hsieh, YP. & Cevher, V. A unified stochastic approximation framework for learning in games. Math. Program. 203, 559–609 (2024). https://doi.org/10.1007/s10107-023-02001-y

