
A unified stochastic approximation framework for learning in games

Mathematical Programming, Series B (Full Length Paper)
Abstract

We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite). The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, the exponential/multiplicative weights algorithm for learning in finite games, optimistic and bandit variants of the above, etc. In addition to providing an integrated view of these algorithms, our framework further allows us to obtain several new convergence results, both asymptotic and in finite time, in both continuous and finite games. Specifically, we provide a range of criteria for identifying classes of Nash equilibria and sets of action profiles that are attracting with high probability, and we also introduce the notion of coherence, a game-theoretic property that includes strict and sharp equilibria, and which leads to convergence in finite time. Importantly, our analysis applies to both oracle-based and bandit, payoff-based methods—that is, when players only observe their realized payoffs.


Notes

  1. The authors thank S. Sorin for proposing this definition.

  2. Strictly speaking, the regularizer \( x \log x - x\) is not strongly convex over \(\mathbb {R}_{+}\) but it is strongly convex over any bounded subset of \(\mathbb {R}_{+}\)—and it can be made strongly convex over all of \(\mathbb {R}_{+}\) by adding a small quadratic penalty of the form \(\varepsilon x^{2} / 2\). This issue does not change the essence of our results, so we sidestep the details.

  3. In some cases, the index set may be enlarged to include all positive half-integers (\(n= 1/2,1,3/2,\dotsc \)).

  4. This formulation of (SPSA) is tailored to unconstrained problems. In this case, to ensure that the resulting gradient estimator remains bounded, it is customary to include an indicator of the form \({{\,\mathrm{\mathbb {1}}\,}}( \Vert \hat{X}_{n}\Vert \le R_{n})\) for some suitably chosen sequence \(R_{n}\rightarrow \infty \) [64]. This would lead to the same analysis but at the cost of heavier notation so, instead, we will assume that the players’ payoff functions are bounded when discussing (SPSA). For a detailed discussion of how to adapt (SPSA) in the presence of constraints, we refer the reader to Bravo et al. [9] who show that the relevant entries of Table 1 apply verbatim when \({\mathcal {X}}\) is compact.

  5. To see this, let \({\mathcal {K}}= L_{c_{0}}^{+}(\Phi )\) be a convex upper level set of \(\Phi \) in \({{\,\textrm{ri}\,}}{\mathcal {X}}\). Then, for all \(c\le c_{0}\) and all x with \(\Phi ( x) = c\), the segment \( x + \tau (p- x)\), \(\tau \in [0,1]\), is contained in \(L_{c}^{+}(\Phi ) \supseteq L_{c_{0}}^{+}(\Phi )\), so the function \(\phi (\tau ) = \Phi ( x + \tau (p- x))\) cannot have \(\phi '(0) < 0\). This implies that \(0 \le \left\langle \nabla \Phi ( x) \right. ,\left. p- x\right\rangle = \left\langle v( x) \right. ,\left. p- x\right\rangle \) for all \( x\in {\mathcal {X}}\setminus {\mathcal {K}}\), i.e.,  \({\mathcal {G}}\) is subcoercive.

  6. We are grateful to V. Boone for pointing out this simple argument.

  7. That such a function exists is an exercise in the construction of approximate identities, which we omit.

  8. A point \( y\in {\mathcal {Y}}\) is said to be attainable by \(Y_{n}\) if, for every neighborhood \({\mathcal {W}}\) of \(y\) in \({\mathcal {Y}}\) and for all \(n\ge 1\), we have \(\mathbb {P}(Y_{n'} \in {\mathcal {W}} \text { for some } n'\ge n) > 0\).

  9. For the general case, take \(E( y) = [h^{*}( y) - \inf h^{*}]^{-1}\).

  10. The case of mixed strategies dominated by mixed strategies requires heavier notation, so we do not treat it.

  11. As we explain in Appendix 1, the image \({{\,\textrm{im}\,}}Q\) of \(Q\) coincides with the prox-domain \({\mathcal {X}}_{h}= {{\,\textrm{dom}\,}}\partial h\) of \(h\). As such, a sufficient condition for \(Q\) to be surjective is for \(h\) to be Lipschitz continuous on \({\mathcal {X}}\).

References

  1. Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in Linear and Non-linear Programming. Stanford University Press, Redwood City (1958)


  2. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science (1995)

  3. Azizian, W., Iutzeler, F., Malick, J., Mertikopoulos, P.: On the rate of convergence of Bregman proximal methods in constrained variational inequalities. arXiv:2211.08043 (2022)

  4. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, New York (2017)


  5. Benaïm, M.: Vertex reinforced random walks and a conjecture of Pemantle. Ann. Probab. 25, 361–392 (1997)


  6. Benaïm, M.: Dynamics of stochastic approximation algorithms. In: Azéma, J., Émery, M., Ledoux, M., Yor, M. (eds.) Séminaire de Probabilités XXXIII. Lecture Notes in Mathematics, vol. 1709, pp. 1–68. Springer, Berlin (1999)

  7. Benaïm, M., Hirsch, M.W.: Asymptotic pseudotrajectories and chain recurrent flows, with applications. J. Dyn. Differ. Equ. 8(1), 141–176 (1996)


  8. Bervoets, S., Bravo, M., Faure, M.: Learning with minimal information in continuous games. Theor. Econ. 15, 1471–1508 (2020)


  9. Bravo, M., Leslie, D.S., Mertikopoulos, P.: Bandit learning in concave \({N}\)-person games. In: NeurIPS ’18: Proceedings of the 32nd International Conference of Neural Information Processing Systems (2018)

  10. Brown, G.W.: Iterative solutions of games by fictitious play. In: Koopmans, T.C. (ed.) Activity Analysis of Production and Allocation, pp. 374–376. Wiley (1951)

  11. Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, Cambridge (2006)


  12. Coucheney, P., Gaujal, B., Mertikopoulos, P.: Penalty-regulated dynamics and robust learning procedures in games. Math. Oper. Res. 40(3), 611–633 (2015)


  13. Daskalakis, C., Panageas, I.: Last-iterate convergence: Zero-sum games and constrained min–max optimization. In: ITCS ’19: Proceedings of the 10th Conference on Innovations in Theoretical Computer Science (2019)

  14. Daskalakis, C., Ilyas, A., Syrgkanis, V., Zeng, H.: Training GANs with optimism. In: ICLR ’18: Proceedings of the 2018 International Conference on Learning Representations (2018)

  15. Debreu, G.: A social equilibrium existence theorem. Proc. Natl. Acad. Sci. USA 38(10), 886–893 (1952)


  16. Duflo, M.: Cibles atteignables avec une probabilité positive d’après M. Benaïm, Mimeo (1997)

  17. Duvocelle, B., Mertikopoulos, P., Staudigl, M., Vermeulen, D.: Multi-agent online learning in time-varying games. Math. Oper. Res. 48(2), 914–941 (2023)


  18. Even-dar, E., Mansour, Y., Nadav, U.: On the convergence of regret minimization dynamics in concave games. In: STOC ’09: Proceedings of the 41st Annual ACM Symposium on the Theory of Computing, pp. 523–532. ACM, New York (2009)

  19. Flokas, L., Vlatakis-Gkaragkounis, E.V., Lianeas, T., Mertikopoulos, P., Piliouras, G.: No-regret learning and mixed Nash equilibria: they do not mix. In: NeurIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems (2020)

  20. Giannou, A., Vlatakis-Gkaragkounis, E.V., Mertikopoulos, P.: The convergence rate of regularized learning in games: From bandits and uncertainty to optimism and beyond. In: NeurIPS ’21: Proceedings of the 35th International Conference on Neural Information Processing Systems (2021)

  21. Giannou, A., Lotidis, K., Mertikopoulos, P., Vlatakis-Gkaragkounis, E.V.: On the convergence of policy gradient methods to Nash equilibria in general stochastic games. In: NeurIPS ’22: Proceedings of the 36th International Conference on Neural Information Processing Systems (2022)

  22. Gidel, G., Berard, H., Vignoud, G., Vincent, P., Lacoste-Julien, S.: A variational inequality perspective on generative adversarial networks. In: ICLR ’19: Proceedings of the 2019 International Conference on Learning Representations (2019)

  23. Hall, P., Heyde, C.C.: Martingale limit theory and its application. In: Probability and Mathematical Statistics. Academic Press, New York (1980)

  24. Hart, S., Mas-Colell, A.: Uncoupled dynamics do not lead to Nash equilibrium. Am. Econ. Rev. 93(5), 1830–1836 (2003)


  25. Hart, S., Mas-Colell, A.: Stochastic uncoupled dynamics and Nash equilibrium. Games Econ. Behav. 57, 286–303 (2006)


  26. Héliou, A., Cohen, J., Mertikopoulos, P.: Learning with bandit feedback in potential games. In: NIPS ’17: Proceedings of the 31st International Conference on Neural Information Processing Systems (2017)

  27. Héliou, A., Mertikopoulos, P., Zhou, Z.: Gradient-free online learning in continuous games with delayed rewards. In: ICML ’20: Proceedings of the 37th International Conference on Machine Learning (2020)

  28. Hiriart-Urruty, J.B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer, Berlin (2001)


  29. Hofbauer, J., Sandholm, W.H.: On the global convergence of stochastic fictitious play. Econometrica 70(6), 2265–2294 (2002)


  30. Hofbauer, J., Sigmund, K.: Evolutionary game dynamics. Bull. Am. Math. Soc. 40(4), 479–519 (2003)


  31. Hsieh, Y.G., Iutzeler, F., Malick, J., Mertikopoulos, P.: On the convergence of single-call stochastic extra-gradient methods. In: NeurIPS ’19: Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 6936–6946 (2019)

  32. Hsieh, Y.G., Iutzeler, F., Malick, J., Mertikopoulos, P.: Explore aggressively, update conservatively: stochastic extragradient methods with variable stepsize scaling. In: NeurIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems (2020)

  33. Hsieh, Y.P., Mertikopoulos, P., Cevher, V.: The limits of min-max optimization algorithms: Convergence to spurious non-critical sets. In: ICML ’21: Proceedings of the 38th International Conference on Machine Learning (2021)

  34. Juditsky, A., Nemirovski, A.S., Tauvel, C.: Solving variational inequalities with stochastic mirror-prox algorithm. Stoch. Syst. 1(1), 17–58 (2011)


  35. Kelly, F.P., Maulloo, A.K., Tan, D.K.H.: Rate control for communication networks: shadow prices, proportional fairness and stability. J. Oper. Res. Soc. 49(3), 237–252 (1998)


  36. Korpelevich, G.M.: The extragradient method for finding saddle points and other problems. Èkonom i Mat Metody 12, 747–756 (1976)


  37. Kushner, H.J., Yin, G.G.: Stochastic Approximation Algorithms and Applications. Springer-Verlag, New York (1997)


  38. Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2020)


  39. Leslie, D.S., Collins, E.J.: Individual \(Q\)-learning in normal form games. SIAM J. Control. Optim. 44(2), 495–514 (2005)


  40. Leslie, D.S., Collins, E.J.: Generalised weakened fictitious play. Games Econ. Behav. 56(2), 285–298 (2006)


  41. Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Inf. Comput. 108(2), 212–261 (1994)


  42. Maynard Smith, J., Price, G.R.: The logic of animal conflict. Nature 246, 15–18 (1973)


  43. Mertikopoulos, P., Sandholm, W.H.: Learning in games via reinforcement and regularization. Math. Oper. Res. 41(4), 1297–1324 (2016)


  44. Mertikopoulos, P., Zhou, Z.: Learning in games with continuous action sets and unknown payoff functions. Math. Program. 173(1–2), 465–507 (2019)


  45. Mertikopoulos, P., Papadimitriou, C.H., Piliouras, G.: Cycles in adversarial regularized learning. In: SODA ’18: Proceedings of the 29th annual ACM-SIAM Symposium on Discrete Algorithms (2018)

  46. Mertikopoulos, P., Lecouat, B., Zenati, H., Foo, C.S., Chandrasekhar, V., Piliouras, G.: Optimistic mirror descent in saddle-point problems: going the extra (gradient) mile. In: ICLR ’19: Proceedings of the 2019 International Conference on Learning Representations (2019)

  47. Mertikopoulos, P., Hallak, N., Kavis, A., Cevher, V.: On the almost sure convergence of stochastic gradient descent in non-convex problems. In: NeurIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems (2020)

  48. Monderer, D., Shapley, L.S.: Potential games. Games Econ. Behav. 14(1), 124–143 (1996)


  49. Nemirovski, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)


  50. Nesterov, Y.: Primal–dual subgradient methods for convex problems. Math. Program. 120(1), 221–259 (2009)


  51. Nevel’son, M.B., Khasminskii, R.Z.: Stochastic Approximation and Recursive Estimation. American Mathematical Society, Providence, RI (1976)


  52. Oliveira, T.R., Rodrigues, V.H.P., Krstić, M., Başar, T.: Nash equilibrium seeking with arbitrarily delayed player actions. In: CDC ’20: Proceedings of the 59th IEEE Annual Conference on Decision and Control (2020)

  53. Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)


  54. Popov, L.D.: A modification of the Arrow–Hurwicz method for search of saddle points. Math. Notes Acad. Sci. USSR 28(5), 845–848 (1980)


  55. Rakhlin, A., Sridharan, K.: Optimization, learning, and games with predictable sequences. In: NIPS ’13: Proceedings of the 27th International Conference on Neural Information Processing Systems (2013)

  56. Ratliff, L.J., Burden, S.A., Sastry, S.S.: On the characterization of local Nash equilibria in continuous games. IEEE Trans. Autom. Control 61(8), 2301–2307 (2016)


  57. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)


  58. Robinson, J.: An iterative method for solving a game. Ann. Math. 54, 296–301 (1951)


  59. Rosen, J.B.: Existence and uniqueness of equilibrium points for concave \({N}\)-person games. Econometrica 33(3), 520–534 (1965)


  60. Rosenthal, R.W.: A class of games possessing pure-strategy Nash equilibria. Int. J. Game Theory 2, 65–67 (1973)


  61. Samuelson, L., Zhang, J.: Evolutionary stability in asymmetric games. J. Econ. Theory 57, 363–391 (1992)


  62. Scutari, G., Facchinei, F., Palomar, D.P., Pang, J.S.: Convex optimization, game theory, and variational inequality theory in multiuser communication systems. IEEE Signal Process. Mag. 27(3), 35–49 (2010)


  63. Shalev-Shwartz, S., Singer, Y.: Convex repeated games and Fenchel duality. In: NIPS’ 06: Proceedings of the 19th Annual Conference on Neural Information Processing Systems, pp. 1265–1272. MIT Press (2006)

  64. Spall, J.C.: Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Autom. Control 37(3), 332–341 (1992)


  65. Syrgkanis, V., Agarwal, A., Luo, H., Schapire, R.E.: Fast convergence of regularized learning in games. In: NIPS ’15: Proceedings of the 29th International Conference on Neural Information Processing Systems, pp. 2989–2997 (2015)

  66. Tatarenko, T., Kamgarpour, M.: Learning generalized Nash equilibria in a class of convex games. IEEE Trans. Autom. Control 64(4), 1426–1439 (2019)


  67. Tatarenko, T., Kamgarpour, M.: Learning Nash equilibria in monotone games. In: CDC ’19: Proceedings of the 58th IEEE Annual Conference on Decision and Control. https://doi.org/10.1109/CDC40024.2019.9029659 (2019b)

  68. Taylor, P.D., Jonker, L.B.: Evolutionary stable strategies and game dynamics. Math. Biosci. 40(1–2), 145–156 (1978)


  69. Tullock, G.: Efficient rent seeking. In: Buchanan, J.M., Tollison, R.D., Tullock, G. (eds.) Toward a Theory of the Rent-Seeking Society. Texas A&M University Press (1980)

  70. Vovk, V.G.: Aggregating strategies. In: COLT ’90: Proceedings of the 3rd Workshop on Computational Learning Theory, pp. 371–383 (1990)

  71. Zhang, R., Ren, Z., Li, N.: Gradient play in multi-agent Markov stochastic games: stationary points and convergence. arXiv:2106.00198 (2021)


Author information


Correspondence to Panayotis Mertikopoulos.


P. Mertikopoulos is grateful for financial support by the French National Research Agency (ANR) in the framework of the “Investissements d’avenir” program (ANR-15-IDEX-02), the LabEx PERSYVAL (ANR-11-LABX-0025-01), MIAI@Grenoble Alpes (ANR-19-P3IA-0003), and the bilateral ANR-NRF grant ALIAS (ANR-19-CE48-0018-01). This work has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No 725594-TIME-DATA), from the Swiss National Science Foundation (SNSF) under Grant number 200021-205011, and project MIS 5154714 of the National Recovery and Resilience Plan Greece 2.0 funded by the European Union under the NextGenerationEU Program. The authors are grateful to the associate editor and two anonymous referees for many insightful comments and remarks. The authors are likewise grateful to Victor Boone, Pierre-Louis Cauvin, Angeliki Giannou, Kyriakos Lotidis, Sylvain Sorin, and Manolis Vlatakis for many fruitful discussions. Part of this work was done while P. Mertikopoulos was visiting the Simons Institute for the Theory of Computing.

Appendices

A Regularizers and mirror maps

In this appendix we present some basic properties of the mirror map \(Q\). To state them, recall first that the subdifferential of \(h\) at \( x\in {\mathcal {X}}\) is defined as \( \partial h( x) {:}{=} \{ y\in {\mathcal {Y}}: h(x') \ge h( x) + \langle y, x'- x\rangle \;\text { for all } x'\in {\mathcal {V}} \}\), the domain of subdifferentiability of \(h\) is \({{\,\textrm{dom}\,}}\partial h{:}{=} \{ x\in {{\,\textrm{dom}\,}}h: \partial h( x)\ne \varnothing \}\), and the convex conjugate of \(h\) is defined as \(h^{*}( y) = \max _{ x\in {\mathcal {X}}} \{ \langle y, x\rangle - h( x) \}\) for all \( y\in {\mathcal {Y}}\). We then have the following basic results.

Lemma A.1

Let \(h\) be a regularizer on \({\mathcal {X}}\), and let \(Q:{\mathcal {Y}}\rightarrow {\mathcal {X}}\) be its induced mirror map. Then:

  1. \(Q\) is single-valued on \({\mathcal {Y}}\): in particular, for all \( x\in {\mathcal {X}}\), \( y\in {\mathcal {Y}}\), we have \( x = Q( y) \iff y \in \partial h( x)\).

  2. The prox-domain \({\mathcal {X}}_{h}{:}{=} {{\,\textrm{im}\,}}Q\) of \(h\) satisfies \({\mathcal {X}}_{h}= {{\,\textrm{dom}\,}}\partial h\) and, hence, \({{\,\textrm{ri}\,}}{\mathcal {X}}\subseteq {\mathcal {X}}_{h}\subseteq {\mathcal {X}}\).

  3. \(Q\) is \((1/K)\)-Lipschitz continuous and \(Q= \nabla h^{*}\).

  4. For all \( x\in {{\,\textrm{ri}\,}}{\mathcal {X}}\), we have \( y,y'\in \partial h( x)\) if and only if \(\langle y'- y, x'- x\rangle = 0\) for all \(x'\in {\mathcal {X}}\).
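To make Lemma A.1 concrete, here is a minimal numerical sketch (ours, not part of the original analysis) for the entropic regularizer \(h( x) = \sum _{\alpha } x_{\alpha }\log x_{\alpha }\) on the simplex, whose conjugate is \(h^{*}( y) = \log \sum _{\alpha } e^{y_{\alpha }}\) and whose induced mirror map \(Q = \nabla h^{*}\) is the logit (softmax) choice map; all helper names below are ours.

```python
import numpy as np

def h(x):
    """Entropic regularizer h(x) = sum_a x_a log x_a on the simplex (0 log 0 := 0)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.where(x > 0, x * np.log(x), 0.0)))

def h_conj(y):
    """Convex conjugate h*(y) = max_x {<y, x> - h(x)} = log sum_a exp(y_a)."""
    y = np.asarray(y, dtype=float)
    m = y.max()                                  # stabilized log-sum-exp
    return float(m + np.log(np.exp(y - m).sum()))

def Q(y):
    """Mirror map Q(y) = grad h*(y): the softmax/logit choice map."""
    y = np.asarray(y, dtype=float)
    z = np.exp(y - y.max())
    return z / z.sum()

rng = np.random.default_rng(0)
y = rng.normal(size=4)
x = Q(y)

# Item 1 of Lemma A.1: x = Q(y) iff y is a subgradient of h at x.  For the entropy,
# y differs from grad h(x) = 1 + log x by a multiple of the all-ones vector, which is
# orthogonal to the simplex; we spot-check the subgradient inequality on random points.
for _ in range(5):
    xp = rng.dirichlet(np.ones(4))
    assert h(xp) >= h(x) + y @ (xp - x) - 1e-10

# Item 3: Q = grad h*, checked by central finite differences.
eps = 1e-6
grad_fd = np.array([(h_conj(y + eps * e) - h_conj(y - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
print(np.allclose(grad_fd, x, atol=1e-5))   # True
```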

Our second basic result concerns the Fenchel coupling

$$\begin{aligned} F(p, y) = h(p) + h^{*}( y) - \langle y, p\rangle \quad \text {for } p\in {\mathcal {X}},\ y\in {\mathcal {Y}}. \end{aligned}$$
(A.1)

For our purposes, the most relevant properties of \(F\) are as follows:

Lemma A.2

For all \(p\in {\mathcal {X}}\) and all \( y,y'\in {\mathcal {Y}}\), we have:

$$\begin{aligned} \quad a)&\quad F(p, y) \ge 0 \quad \hbox { with equality if and only if}\ p= Q( y).&\end{aligned}$$
(A.2a)
$$\begin{aligned} \quad b)&\quad F(p, y) \ge \tfrac{1}{2} K\, \Vert Q( y) - p\Vert ^{2}.&\end{aligned}$$
(A.2b)
$$\begin{aligned} \quad c)&\quad F(p,y') \le F(p, y) + \left\langle y'- y \right. ,\left. Q( y) - p\right\rangle + \tfrac{1}{2K} \Vert y'- y\Vert _{*}^{2}. \end{aligned}$$
(A.2c)

In particular, if \(h(0) = 0\), we have

$$\begin{aligned} (K/2) \Vert Q( y)\Vert ^{2} \le h^{*}( y) \le -\min h+ \langle y, Q( y)\rangle + (2/K) \Vert y\Vert _{*}^{2} \quad \text {for all } y\in {\mathcal {Y}}. \end{aligned}$$
(A.3)
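As a quick numerical sanity check of Lemma A.2 (again our own illustration, in the entropic setup of the previous sketch, where the Fenchel coupling reduces to the Kullback–Leibler divergence and \(h\) is \(1\)-strongly convex with respect to the \(\ell ^{1}\)-norm, so \(K = 1\)):

```python
import numpy as np

rng = np.random.default_rng(1)

def Q(y):
    z = np.exp(y - y.max())
    return z / z.sum()

def fenchel(p, y):
    """F(p, y) = h(p) + h*(y) - <y, p>; for the entropy this equals KL(p || Q(y))."""
    h_p = np.sum(np.where(p > 0, p * np.log(p), 0.0))
    h_star = y.max() + np.log(np.exp(y - y.max()).sum())
    return h_p + h_star - y @ p

for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    y = rng.normal(scale=3.0, size=5)
    F = fenchel(p, y)
    # (A.2a): F >= 0, vanishing only at p = Q(y)
    assert F >= -1e-12
    # (A.2b) with K = 1 and the l1-norm (Pinsker's inequality)
    assert F >= 0.5 * np.abs(Q(y) - p).sum() ** 2 - 1e-12
print("checks passed")
```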

Variants of Lemmas A.1 and A.2 already exist in the literature (see e.g., [44] and references therein), so we do not provide a proof. Instead, we proceed below to show how the above extends to the setwise Fenchel coupling

$$\begin{aligned} F_{{\mathcal {S}}}( y) {:}{=} h^{*}( y) - h^{*}_{{\mathcal {S}}}( y) = \min _{p\in {\mathcal {S}}} \{ h(p) + h^{*}( y) - \left\langle y \right. ,\left. p\right\rangle \} = \min _{p\in {\mathcal {S}}} F(p, y)\qquad \end{aligned}$$
(A.4)

where \({\mathcal {S}}\) is a nonempty compact convex subset of \({\mathcal {X}}\) and \(h^{*}_{{\mathcal {S}}}( y) = \max _{ x\in {\mathcal {S}}} \{\langle y, x\rangle - h( x)\}\) denotes the convex conjugate of \(h\) relative to \({\mathcal {S}}\). The most important properties of \(F_{{\mathcal {S}}}\) are encoded in the following lemma.

Lemma A.3

With notation as above, we have:

  1. \(F_{{\mathcal {S}}}( y) \ge 0\) with equality if and only if \(Q( y) \in {\mathcal {S}}\). Moreover, under the reciprocity condition (R), we have \(F_{{\mathcal {S}}}( y) \rightarrow 0\) if and only if \(Q( y) \rightarrow {\mathcal {S}}\).

  2. \(F_{{\mathcal {S}}}\) is differentiable and \(\nabla F_{{\mathcal {S}}}( y) = Q( y) - Q_{{\mathcal {S}}}( y)\), where \(Q_{{\mathcal {S}}}( y) = {{\,\mathrm{arg\,max}\,}}_{ x\in {\mathcal {S}}} \{ \langle y, x\rangle - h( x) \}\).

  3. For all \(y,y'\in {\mathcal {Y}}\) we have \(\Vert \nabla F_{{\mathcal {S}}}(y') - \nabla F_{{\mathcal {S}}}( y)\Vert \le (2/K) \Vert y'- y\Vert _{*}\).

Proof

Since \({\mathcal {S}}\subseteq {\mathcal {X}}\), we have \(h^{*}_{{\mathcal {S}}} \le h^{*}\) by definition, and hence \(F_{{\mathcal {S}}} \ge 0\). Moreover, since the minimum in (A.4) must be attained in \({\mathcal {S}}\), we get \(F_{{\mathcal {S}}}( y) = 0\) if and only if \(h(p) + h^{*}( y) - \langle y, p\rangle = 0\) for some \(p\in {\mathcal {S}}\); by Lemma A.2, this occurs if and only if \(Q( y) = p\in {\mathcal {S}}\), so our first claim follows.

Moving forward, to show that \(F_{{\mathcal {S}}}( y)\rightarrow 0\) if and only if \(Q( y) \rightarrow {\mathcal {S}}\), let \( y_{n}\) be a sequence in \({\mathcal {Y}}\), and let \( x_{n} = Q( y_{n})\). For the “if” part, suppose that \( x_{n} \rightarrow {\mathcal {S}}\). Since \({\mathcal {S}}\) is compact, we may assume without loss of generality (by descending to a subsequence if necessary) that \( x_{n}\) converges to some \( x\in {\mathcal {S}}\). Observe now that (a) \(0 \le F_{{\mathcal {S}}}( y_{n}) \le F( x, y_{n})\) by the definition of \(F_{{\mathcal {S}}}\) as a minimum in (A.4); and (b) \(F( x, y_{n})\rightarrow 0\) by (R). Thus, by sandwiching, we conclude that \(F_{{\mathcal {S}}}( y_{n})\rightarrow 0\). Conversely, if \(F_{{\mathcal {S}}}( y_{n}) \rightarrow 0\), we may again assume by compactness (and by descending to a subsequence if necessary) that \( x_{n} = Q( y_{n})\) converges to some \(\hat{x}\in {\mathcal {X}}\). If \(\hat{x}\not \in {\mathcal {S}}\), then, by (R), we have \(\lim _{n\rightarrow \infty } F( x, y_{n}) > 0\) for all \( x\in {\mathcal {S}}\). Since \({\mathcal {S}}\) is compact and \(F( x, y)\) is continuous in \(x\), we conclude that \(\liminf _{n\rightarrow \infty } F_{{\mathcal {S}}}( y_{n}) > 0\), a contradiction which establishes our claim.

Our last two claims follow by applying Lemma A.1 to \(h\) and \(h+\delta _{{\mathcal {S}}}\) where \(\delta _{{\mathcal {S}}}\) denotes the convex indicator of \({\mathcal {S}}\). \(\square \)
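To illustrate the setwise coupling, the following sketch (ours) takes \({\mathcal {S}}\) to be a face of the simplex in the entropic setup, where \(F_{{\mathcal {S}}}( y) = h^{*}( y) - h^{*}_{{\mathcal {S}}}( y)\) has the closed form \(\log \sum _{\alpha } e^{y_{\alpha }} - \log \sum _{\alpha \in \text {supp}} e^{y_{\alpha }}\), and checks the gradient formula of Lemma A.3 by finite differences; the function names and the choice of face are ours.

```python
import numpy as np

def softmax(y):
    z = np.exp(y - y.max())
    return z / z.sum()

def F_S(y, support):
    """Setwise Fenchel coupling of the face S = {x in simplex : x_a = 0 for a not in support}
    under the entropic regularizer: F_S(y) = h*(y) - h*_S(y) = lse(y) - lse(y[support])."""
    lse = lambda v: v.max() + np.log(np.exp(v - v.max()).sum())
    return lse(y) - lse(y[support])

def Q_S(y, support):
    """Restricted mirror image Q_S(y) = argmax_{x in S} {<y, x> - h(x)}: softmax on the support."""
    x = np.zeros_like(y)
    x[support] = softmax(y[support])
    return x

rng = np.random.default_rng(2)
y = rng.normal(size=6)
support = np.array([0, 2, 5])          # the face spanned by these pure strategies

# Lemma A.3, item 2: grad F_S(y) = Q(y) - Q_S(y), checked by central finite differences.
eps = 1e-6
grad_fd = np.array([(F_S(y + eps * e, support) - F_S(y - eps * e, support)) / (2 * eps)
                    for e in np.eye(6)])
print(np.allclose(grad_fd, softmax(y) - Q_S(y, support), atol=1e-5))   # True
```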

The next properties we discuss concern the way that different regions of \({\mathcal {Y}}\) are mapped to \({\mathcal {X}}\) under \(Q\).

Lemma A.4

[43, Prop. A.1]. Let \(h\) be a regularizer on the simplex \({{\,\mathrm{{{\,\mathrm{\Delta }\,}}}\,}}(\mathcal {A}) \subseteq \mathbb {R}^{\mathcal {A}}\). If \( y_{\alpha } - y_{\beta } \rightarrow -\infty \), then \(Q_{\alpha }( y) \rightarrow 0\).

Lemma A.5

Let \(h\) be a regularizer on \({\mathcal {X}}\), let \( y_{n}\), \(n=1,2,\dotsc \) be a sequence in \({\mathcal {Y}}\), and fix some \( x\in {\mathcal {X}}\). If \(\left\langle y_{n} \right. ,\left. z\right\rangle \rightarrow -\infty \) for every nonzero \(z\in {{\,\textrm{TC}\,}}( x)\), we have \(Q( y_{n}) \rightarrow x\).

Proof

Assume to the contrary that \( x_{n} {:}{=} Q( y_{n})\) does not converge to \( x\), i.e., \(\limsup _{n} \Vert x_{n} - x\Vert > 0\). Then, given that \( y_{n} \in \partial h( x_{n})\), we get \( h( x) \ge h( x_{n}) + \langle y_{n}, x - x_{n}\rangle = h( x_{n}) - \langle y_{n}, z_{n}\rangle \Vert x_{n} - x\Vert , \) where we set \(z_{n} = ( x_{n} - x) / \Vert x_{n} - x\Vert \). If we further assume (by descending to a subsequence if needed) that \(z_{n}\) converges in the unit sphere of \(\Vert \cdot \Vert \), there exists some \(z\in {{\,\textrm{TC}\,}}( x)\) with \(\Vert z\Vert = 1\) and some \(\varepsilon >0\) such that \(\langle y_{n}, z_{n}\rangle \le (1+\varepsilon ) \langle y_{n}, z\rangle \) for all sufficiently large \(n\). Thus, since \(\langle y_{n}, z\rangle \rightarrow -\infty \) by assumption, taking the \(\limsup \) of the above estimate gives \(h( x) \ge \infty \), a contradiction which proves our claim. \(\square \)

Lemma A.6

Let \(h\) be a regularizer on a convex polytope \({\mathcal {P}}\) of \({\mathcal {V}}\), let \({\mathcal {S}}\) be a face of \({\mathcal {P}}\), and let \({\mathcal {Z}}= \{ z_{1},\dotsc ,z_{m} \}\) be a set of unit vectors of \({\mathcal {V}}\) such that every point \( x\in {\mathcal {P}}\setminus {\mathcal {S}}\) can be written as \( x = p+ \lambda z\) for some \(p\in {\mathcal {S}}\), \(z\in {\mathcal {Z}}\) and \(\lambda >0\). If \(\max _{z\in {\mathcal {Z}}} \left\langle y \right. ,\left. z\right\rangle \rightarrow -\infty \), then \(Q( y) \rightarrow {\mathcal {S}}\).

Proof

By the compactness of \({\mathcal {P}}\) (and descending to a subsequence if necessary), we may assume that \( x_{n} = Q( y_{n})\) converges to some \( x\in {\mathcal {P}}\). If \( x\notin {\mathcal {S}}\), there exist \(p\in {\mathcal {S}}\), \(z\in {\mathcal {Z}}\) and \(\lambda >0\) such that \( x = p+ \lambda z\). In turn, this gives \(h(p) \ge h( x_{n}) + \left\langle y_{n} \right. ,\left. p- x_{n}\right\rangle = h( x_{n}) - \left\langle y_{n} \right. ,\left. z_{n}\right\rangle \Vert x_{n} - p\Vert \) where we set \(z_{n} = ( x_{n} - p)/\Vert x_{n} - p\Vert \). Since \(z_{n} \rightarrow z\), taking \(n\rightarrow \infty \) yields \(h(p) \ge \infty \), a contradiction which shows that \( x = \lim x_{n} \in {\mathcal {S}}\), as claimed. \(\square \)

We conclude this appendix with the dynamical properties of the Fenchel coupling under (MD). The heavy lifting will be provided by the following simple lemma:

Lemma A.7

Let \( x_{}(t) = Q(y_{}(t))\) be an orbit of (MD). Then, for every nonempty closed convex subset \({\mathcal {S}}\) of \({\mathcal {X}}\), we have

$$\begin{aligned} {\dot{F}}_{{\mathcal {S}}}( y) = \left\langle v( x) \right. ,\left. x - x_{{\mathcal {S}}}\right\rangle \end{aligned}$$
(A.5)

where \( x_{{\mathcal {S}}} = Q_{{\mathcal {S}}}( y)\) denotes the mirror image of y on \({\mathcal {S}}\). In particular, if \({\mathcal {S}}= \{p\}\), we have \({\dot{F}}(p, y) = \left\langle v( x) \right. ,\left. x - p\right\rangle \).

Proof

Simply note that \({\dot{ y}} = v( x)\) along (MD) and that, by Lemma A.3, \(\nabla F_{{\mathcal {S}}}( y) = Q( y) - Q_{{\mathcal {S}}}( y) = x - x_{{\mathcal {S}}}\), so \({\dot{F}}_{{\mathcal {S}}}( y) = \langle {\dot{ y}}, \nabla F_{{\mathcal {S}}}( y)\rangle = \langle v( x), x - x_{{\mathcal {S}}}\rangle \). \(\square \)
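As a closing illustration of Lemma A.7 (our own toy example, assuming the convention \({\dot{ y}} = v( x)\), \( x = Q( y)\) for (MD) and the entropic mirror map), consider a single player on the simplex with \(v( x) = p- x\) for a fixed interior point \(p\), so that \(\langle v( x), x - p\rangle = -\Vert x - p\Vert ^{2} \le 0\); a forward-Euler discretization then exhibits the monotone decrease of \(F(p, y(t))\):

```python
import numpy as np

def softmax(y):
    z = np.exp(y - y.max())
    return z / z.sum()

def fenchel(p, y):
    """F(p, y) = h(p) + h*(y) - <y, p> for the entropic regularizer."""
    h_p = np.sum(np.where(p > 0, p * np.log(p), 0.0))
    h_star = y.max() + np.log(np.exp(y - y.max()).sum())
    return h_p + h_star - y @ p

p = np.array([0.5, 0.3, 0.2])             # target point; v(x) = p - x satisfies (B.7) for S = {p}
y = np.array([2.0, -1.0, 0.5])            # initial dual state
dt, T = 1e-3, 20.0

values = []
for _ in range(int(T / dt)):              # forward-Euler discretization of (MD): dy/dt = v(Q(y))
    x = softmax(y)
    y = y + dt * (p - x)
    values.append(fenchel(p, y))

values = np.array(values)
print(np.all(np.diff(values) <= 1e-9))    # F(p, y(t)) is (numerically) nonincreasing
print(np.abs(softmax(y) - p).max())       # x(t) approaches p
```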

B Omitted proofs and calculations

B.1 Error bounds for specific algorithms

Our aim in this appendix is to prove the bounds on the bias and magnitude of \({\hat{v}}_{n}\) reported in Proposition 3 and Table 1.

Proof of Proposition 3

We proceed on a method-by-method basis, starting with the oracle-based methods of Sect. 3.2, that is, Algorithms 1–4. For this, we will make free use of the fact that we can take \(M_{n}^{q} = 3^{q-1} (G^{q} + B_{n}^{q} + \sigma _{n}^{q})\) in (8); cf. the discussion after (6).

\(\blacktriangleright \)  Algorithm 1: Stochastic gradient ascent  For (SGA), we have \(U_{n} = {{\,\textrm{err}\,}}(X_{n};\theta _{n})\) and \(b_{n} = 0\), so our claim follows immediately from the stated assumptions for (SFO).

\(\blacktriangleright \)  Algorithm 2: Extra-gradient  For (EG), we have \({\hat{v}}_{n} = V(X_{n+1/2};\theta _{n+1/2})\) so . We thus get

(B.1)

and, analogously

(B.2)

so under (13), as claimed.

\(\blacktriangleright \)  Algorithm 3: Optimistic gradient  For (OG), we have again , so the same series of arguments as above gives

(B.3)

under (13) with \(q=\infty \). The noise term \(U_{n}\) can be bounded in exactly the same way, so we omit the calculations.

\(\blacktriangleright \)  Algorithm 4: Exponential/multiplicative weights  We consider two cases, based on the information available to the players. For the full information oracle (14a), we have \({\hat{v}}_{n} = v(X_{n})\), so \(b_{n} = U_{n} = 0\) by definition (i.e., the oracle is perfect). Otherwise, under the realization-based oracle (14b), we have \(\mathbb {E}[{\hat{v}}_{n} \mid {\mathcal {F}}_{n}] = v(X_{n})\) because \(\alpha _{n}\) is sampled according to \(X_{n}\). We thus get \(b_{n} = 0\) and \(U_{n} = {{\,\mathrm{{\mathcal {O}}}\,}}(1)\), which proves our assertion.

We now proceed with the payoff-based methods of Sect. 3.2, namely Algorithms 5–7.

\(\blacktriangleright \)  Algorithm 5: Single-point stochastic approximation  Since \(u_{i}\) is assumed bounded in the context of (SPSA), the bound for \(M_{n}\) follows trivially. As for the bias of (SPSA), it will be convenient to set \(V_{i}^{\delta }( x;w) = (d_{i}/\delta ) \, u_{i}( x+\delta w) \, w_{i}\) so, in obvious notation, \({\hat{v}}_{i,n} = V_{i}^{\delta _{n}}(X_{n};W_{n})\). Thus, if we fix a pivot point \( x\in {\mathcal {X}}\) and a query point \(\hat{x}= x+ \delta w\) for some \(w\in {\mathcal {E}}= \prod _{i} {\mathcal {E}}_{i}\), a first-order Taylor expansion of \(u_{i}\) with integral remainder gives

$$\begin{aligned} V_{i}^{\delta }( x;w)&= \frac{d_{i}}{\delta } u_{i}(\hat{x}) \cdot w_{i} = \frac{d_{i}}{\delta } u_{i}( x) \cdot w_{i} + \frac{d_{i}}{\delta } \left\langle \nabla u_{i}( x) \right. ,\left. z\right\rangle \cdot w_{i} \end{aligned}$$
(B.4a)
$$\begin{aligned}&+ \int _{0}^{1} \left\langle \nabla u_{i}( x+\tau z) - \nabla u_{i}( x) \right. ,\left. z\right\rangle \,d\tau \cdot w_{i} \end{aligned}$$
(B.4b)

where we set \(z= \hat{x}- x= \delta w\). Hence, if \(w\) is drawn uniformly at random from \({\mathcal {E}}\), taking expectations yields

(B.5)

where we used the fact that for all \(i\in \mathcal {N}\) and that \(w_{i}\) and \(w_{j}\) are independent for all \(i,j\in \mathcal {N}\), \(i\ne j\). As for the second term, Assumption 1 readily yields

(B.6)

Thus, by combining (B.5) and (B.6), we conclude that , which immediately yields the desired bound \(B_{n} = {{\,\mathrm{{\mathcal {O}}}\,}}(\delta _{n}) = {{\,\mathrm{{\mathcal {O}}}\,}}(1/n^{\ell _{\delta }})\) for (SPSA).
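For intuition on the (SPSA) bound above, the following self-contained check (ours; a single unconstrained player with a concave quadratic payoff and sphere-uniform perturbations, so not the exact multi-player setting of Algorithm 5) verifies numerically that the mean of the single-point estimator \(V^{\delta }( x;w) = (d/\delta )\, u( x+\delta w)\, w\) recovers \(\nabla u( x)\) up to an \({{\,\mathrm{{\mathcal {O}}}\,}}(\delta )\) bias and Monte-Carlo noise.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
M = rng.normal(size=(d, d))
A = -(M @ M.T) - np.eye(d)                  # u(x) = 1/2 x^T A x + b^T x is concave
b = rng.normal(size=d)

def u(X):                                   # payoff, evaluated row-wise on an (n, d) array
    return 0.5 * np.einsum('ni,ij,nj->n', X, A, X) + X @ b

def spsa_mean(x, delta, n):
    """Monte-Carlo mean of the single-point estimator V^delta(x; w) = (d/delta) u(x + delta w) w,
    with w drawn uniformly from the unit sphere."""
    w = rng.normal(size=(n, d))
    w /= np.linalg.norm(w, axis=1, keepdims=True)
    return (d / delta) * np.mean(u(x + delta * w)[:, None] * w, axis=0)

x = rng.normal(size=d)
grad = A @ x + b
for n in (10**4, 10**5, 10**6):
    err = np.linalg.norm(spsa_mean(x, 0.1, n) - grad)
    print(f"n = {n:>7d}   |mean estimate - grad u(x)| = {err:.3f}")
# The gap shrinks like the Monte-Carlo error O(1/(delta*sqrt(n))); the systematic
# bias of the estimator is O(delta) (and here even vanishes because u is quadratic).
```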

\(\blacktriangleright \)  Algorithm 6: Dampened gradient approximation  Recall that \({\hat{v}}_{i,n} = n\cdot \log ( 1 + (u_{i}(X_{n+1/2}) - u_{i}(X_{n})) W_{i,n} )\). Since \(u_{i}(X_{n+1/2}) - u_{i}(X_{n}) = (1/n) v_{i}(X_{n}) W_{i,n} + {{\,\mathrm{{\mathcal {O}}}\,}}(1/n^{2})\) by the definition of \(X_{n+1/2}\), expanding the logarithm readily yields \(B_{n} = {{\,\mathrm{{\mathcal {O}}}\,}}(1/n)\) and \(M_{n} = {{\,\mathrm{{\mathcal {O}}}\,}}(1)\). Our claim then follows as above.

\(\blacktriangleright \)  Algorithm 7: Exponential weights for exploration and exploitation  Since \(\hat{\alpha }_{n}\) is sampled according to \(\hat{X}_{n}\), we readily get \(\mathbb {E}[{\hat{v}}_{n} \mid {\mathcal {F}}_{n}] = v(\hat{X}_{n})\), so \(B_{n} = {{\,\mathrm{{\mathcal {O}}}\,}}(\Vert \hat{X}_{n} - X_{n}\Vert ) = {{\,\mathrm{{\mathcal {O}}}\,}}(\delta _{n}) = {{\,\mathrm{{\mathcal {O}}}\,}}(1/n^{\ell _{\delta }})\). Moreover, since \(\hat{X}_{i\alpha _{i},n} \ge \delta _{n}/A_{i}\), it follows that \(\Vert {\hat{v}}_{n}\Vert _{*} = {{\,\mathrm{{\mathcal {O}}}\,}}(1/\delta _{n}) = {{\,\mathrm{{\mathcal {O}}}\,}}(n^{\ell _{\delta }})\), and our proof is complete. \(\square \)
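To see where the two estimates above come from, here is a sketch (ours) of the standard importance-weighted estimator with explicit exploration that is consistent with the bounds in the proof: each player samples \(\hat{\alpha }_{i}\) from \(\hat{X}_{i} = (1-\delta ) X_{i} + \delta \,\text {unif}\) and sets \({\hat{v}}_{i\alpha } = \mathbb {1}\{\hat{\alpha }_{i} = \alpha \}\, u_{i}(\hat{\alpha })/\hat{X}_{i\alpha }\); the exact bookkeeping of Algorithm 7 may differ, and all names below are ours.

```python
import numpy as np

rng = np.random.default_rng(4)
A1, A2 = 3, 4                               # players' numbers of actions
U1 = rng.uniform(-1, 1, size=(A1, A2))      # player 1's payoffs: u_1(a_1, a_2) = U1[a_1, a_2]

X1 = rng.dirichlet(np.ones(A1))             # current mixed strategies X_n
X2 = rng.dirichlet(np.ones(A2))
delta = 0.1
Xh1 = (1 - delta) * X1 + delta / A1         # explored strategies, so that Xh_{i,a} >= delta / A_i
Xh2 = (1 - delta) * X2 + delta / A2

# Importance-weighted estimate for player 1: sample a profile from X_hat and set
# v_hat[a] = 1{a_1 = a} u_1(a_1, a_2) / Xh1[a]; each entry is O(1/delta) in magnitude.
n = 500_000
a1 = rng.choice(A1, size=n, p=Xh1)
a2 = rng.choice(A2, size=n, p=Xh2)
v_hat = np.zeros((n, A1))
v_hat[np.arange(n), a1] = U1[a1, a2] / Xh1[a1]

print(np.abs(v_hat.mean(axis=0) - U1 @ Xh2).max())   # conditional mean = v_1(X_hat), up to MC noise
print(np.abs(U1 @ (Xh2 - X2)).max())                 # systematic bias relative to v_1(X_n) is O(delta)
```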

B.2 Energy function derivations

Our aim in this last appendix is to prove the energy properties of the Fenchel coupling as stated in Lemmas 1 and 2. For concision, we will prove both as a special case of the following general result:

Proposition B.12

Let \({\mathcal {S}}\) be a nonempty compact convex subset of \({\mathcal {X}}\), and assume that there exists a neighborhood \({\mathcal {U}}\) of \({\mathcal {S}}\) such that

$$\begin{aligned} \left\langle v( x) \right. ,\left. x - p\right\rangle \le 0 \quad \hbox {for all}\quad x\in {\mathcal {U}},\quad p\in {\mathcal {S}}, \end{aligned}$$
(B.7)

with equality if and only if \( x\in {\mathcal {S}}\). If (R) holds and \(\varphi \) is defined as in (31), the function \(E:{\mathcal {Y}}\rightarrow \mathbb {R}\) given by

$$\begin{aligned} E( y) = \varphi (F_{{\mathcal {S}}}( y)) \quad \hbox {for all}\quad y\in {\mathcal {Y}}\end{aligned}$$
(B.8)

is a local energy function for \({\mathcal {S}}\) under (MD). In addition, if \({\mathcal {U}}= {\mathcal {X}}\), \(E\) is a global energy function for \({\mathcal {S}}\).

Proof

We will verify the requirements of Definition 5 in order.

  1. For the first requirement (Lipschitz continuity and smoothness), note that \(\nabla E( y) = \varphi '(F_{{\mathcal {S}}}( y)) \nabla F_{{\mathcal {S}}}( y)\), so, letting \( x = Q( y)\) and \( x_{{\mathcal {S}}} = Q_{{\mathcal {S}}}( y)\), Lemma A.3 yields

    $$\begin{aligned} \nabla E( y) = ( x - x_{{\mathcal {S}}}) \cdot {\left\{ \begin{array}{ll} 1 &{}\text {if}\quad F_{{\mathcal {S}}}( y) \le 1,\\ 1/\sqrt{F_{{\mathcal {S}}}( y)} &{}\text {otherwise.} \end{array}\right. } \end{aligned}$$
    (B.9)

    Furthermore, by Lemma A.2, we also have \(F(p, y) \ge (K/2) \Vert x - p\Vert ^{2}\) for all \(p\in {\mathcal {S}}\), so, by minimizing over \(p\in {\mathcal {S}}\), we get

    $$\begin{aligned} F_{{\mathcal {S}}}( y) \ge (K/2) {{\,\textrm{dist}\,}}( x,{\mathcal {S}})^{2} = (K/2) \Vert x - {{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x)\Vert ^{2} \end{aligned}$$
    (B.10)

    where \({{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x) {:}{=} {{\,\mathrm{arg\,min}\,}}_{p\in {\mathcal {S}}} \Vert x - p\Vert \). In turn, this gives \(\Vert x - {{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x)\Vert \le \sqrt{2/K}\) whenever \(F_{{\mathcal {S}}}( y) \le 1\), so we get

    $$\begin{aligned} \Vert \nabla E( y)\Vert = \Vert x - x_{{\mathcal {S}}}\Vert \le \Vert x - {{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x)\Vert + \Vert {{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x) - x_{{\mathcal {S}}}\Vert \le \sqrt{2/K} + {{\,\textrm{diam}\,}}({\mathcal {S}}) \end{aligned}$$
    (B.11)

    whenever \(F_{{\mathcal {S}}}( y) \le 1\). On the other hand, if \(F_{{\mathcal {S}}}( y) \ge 1\), we have

    (B.12)

    By the reciprocity condition (R), it follows that the set \(\{ x = Q( y): F_{{\mathcal {S}}}( y) \ge 1 \}\) is well-separated from \({\mathcal {S}}\), so \(\Vert x - {{\,\textrm{pr}\,}}_{{\mathcal {S}}}( x)\Vert \) is bounded away from zero if \(F_{{\mathcal {S}}}( y) \ge 1\). Thus, by combining Eqs. (B.11) and (B.12), we conclude that \(\Vert \nabla E( y)\Vert \) is bounded. Finally, again by Lemma A.2, \(F( y)\) is \((1/K)\)-Lipschitz smooth, so Eq. 35—which is equivalent to the Lipschitz smoothness of \(E\)—follows immediately from the concavity of \(\varphi \).

  2. For the positive-definiteness requirement of Definition 5, note that Lemma A.3 and the reciprocity condition (R) yield \(Q( y) \rightarrow {\mathcal {S}}\) if and only if \(F_{{\mathcal {S}}}( y) \rightarrow 0\). Thus, given that \(\varphi (z) = z\) for small \(z\), the same will hold for \(E= \varphi \circ F_{{\mathcal {S}}}\), and our claim follows.

  3. Finally, for the Lyapunov properties of \(E\) under (MD), recall that Lemma A.7 gives \({\dot{F}}_{{\mathcal {S}}}( y) = \langle v( x), x - x_{{\mathcal {S}}}\rangle \), so

    $$\begin{aligned} {\dot{E}}( y) = \langle {\dot{ y}}, \nabla E( y)\rangle = \varphi '(F_{{\mathcal {S}}}( y))\, \langle v( x), x - x_{{\mathcal {S}}}\rangle < 0 \quad \text {whenever}\quad x \in {\mathcal {U}}\setminus {\mathcal {S}}\end{aligned}$$
    (B.13)

    where we used the defining property (B.7) of \({\mathcal {S}}\) (recall that \( x_{{\mathcal {S}}} \in {\mathcal {S}}\) by construction). Moving forward, by Lemma A.3, there exists some \(E_{+}>0\) such that the sublevel set \({\mathcal {D}}= \{ y\in {\mathcal {Y}}: F_{{\mathcal {S}}}( y) \le E_{+} \}\) is mapped to \({\mathcal {U}}\) under \(Q\), i.e., \(Q( y) \in {\mathcal {U}}\) whenever \(F_{{\mathcal {S}}}( y) \le E_{+}\). Thus, putting everything together, we conclude that \({\dot{E}}( y)\rightarrow 0\) if and only if \(F_{{\mathcal {S}}}( y)\rightarrow 0\), which implies that \(\sup \{ {\dot{E}}( y): E_{-}< E( y)< E_{+} \} < 0\) for all \(E_{-}\in (0,E_{+})\), and our proof is complete.

\(\square \)



Cite this article

Mertikopoulos, P., Hsieh, YP. & Cevher, V. A unified stochastic approximation framework for learning in games. Math. Program. 203, 559–609 (2024). https://doi.org/10.1007/s10107-023-02001-y

