A J-symmetric quasi-Newton method for minimax problems

  • Full Length Paper
  • Series A

Mathematical Programming

Abstract

Minimax problems have recently gained tremendous attention across the optimization and machine learning communities. In this paper, we introduce a new quasi-Newton method for minimax problems, which we call the J-symmetric quasi-Newton method. The method is obtained by exploiting the J-symmetric structure of the second-order derivative of the objective function in minimax problems. We show that the Hessian estimate (as well as its inverse) can be updated by a rank-2 operation, and it turns out that the update rule is a natural generalization of the classic Powell symmetric Broyden (PSB) method from minimization problems to minimax problems. In theory, we show that our proposed quasi-Newton algorithm enjoys local Q-superlinear convergence to a desirable solution under standard regularity conditions. Furthermore, we introduce a trust-region variant of the algorithm that enjoys global R-superlinear convergence. Finally, we present numerical experiments that verify our theory and demonstrate the effectiveness of our proposed algorithms, compared to Broyden's method and the extragradient method, on three classes of minimax problems.
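
To give a concrete feel for the rank-2 update structure described above, here is a minimal sketch, assuming J = diag(I_n, -I_m) and calling a matrix B "J-symmetric" when JB is symmetric. The function names and test data are ours, and the formulas are a plausible PSB-based reconstruction of the idea rather than the paper's exact update.

```python
import numpy as np

def psb_update(H, s, r):
    """Powell symmetric Broyden (PSB) rank-2 update: given a symmetric H,
    return a symmetric H_plus satisfying the secant-type equation
    H_plus @ s = H @ s + r."""
    ss = s @ s
    return (H + (np.outer(r, s) + np.outer(s, r)) / ss
              - (r @ s) * np.outer(s, s) / ss**2)

def j_symmetric_update(B, s, y, J):
    """Rank-2 update of a J-symmetric B so that B_plus @ s = y.
    Since J @ J = I, work with the symmetric H = J @ B, enforce the
    transformed secant H_plus @ s = J @ y via PSB, and map back via J."""
    H = J @ B
    H_plus = psb_update(H, s, J @ y - H @ s)
    return J @ H_plus

# Arbitrary test data for an n-by-m min-max split.
rng = np.random.default_rng(0)
n, m = 3, 2
J = np.diag([1.0] * n + [-1.0] * m)
S = rng.standard_normal((n + m, n + m))
B = J @ (S + S.T) / 2                 # J-symmetric start: J @ B is symmetric
s = rng.standard_normal(n + m)
y = rng.standard_normal(n + m)

B_plus = j_symmetric_update(B, s, y, J)
print(np.allclose(B_plus @ s, y))                # secant equation holds: True
print(np.allclose(J @ B_plus, (J @ B_plus).T))   # J-symmetry preserved: True
```

The point of the transformation is that, for the saddle-point gradient F(z) = (grad_x f, -grad_y f), the true Jacobian is J-symmetric rather than symmetric, so the classic symmetric PSB update would impose the wrong structure, while an update of the above form preserves the right one.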


Notes

  1. See Definition A.6 in the appendix for a formal definition of uniform linear independence.

References

  1. Abdi, F., Shakeri, F.: A globally convergent BFGS method for pseudo-monotone variational inequality problems. Optim. Methods Softw. 34(1), 25–36 (2019)

  2. Applegate, D., Díaz, M., Hinder, O., Lu, H., Lubin, M., O’Donoghue, B., Schudy, W.: Practical large-scale linear programming using primal-dual hybrid gradient. arXiv preprint arXiv:2106.04756 (2021)

  3. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223. PMLR (2017)

  4. Asl, A., Overton, M.L.: Analysis of the gradient method with an Armijo-Wolfe line search on a class of non-smooth convex functions. Optim. Methods Softw. 35(2), 223–242 (2020)

  5. Atkinson, D.S., Vaidya, P.M.: A cutting plane algorithm for convex programming that uses analytic centers. Math. Program. 69(1), 1–43 (1995)

  6. Benzi, M., Golub, G.H.: A preconditioner for generalized saddle point problems. SIAM J. Matrix Anal. Appl. 26(1), 20–41 (2004)

  7. Benzi, M., Golub, G.H., Liesen, J.: Numerical solution of saddle point problems. Acta Numer. 14, 1–137 (2005)

  8. Berthold, T., Perregaard, M., Mészáros, C.: Four good reasons to use an interior point solver within a MIP solver. In: Operations Research Proceedings 2017, pp. 159–164. Springer (2018)

  9. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Computer Science and Applied Mathematics. Academic Press (1982)

  10. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  11. Broyden, C.G.: A class of methods for solving nonlinear simultaneous equations. Math. Comput. 19(92), 577–593 (1965)

  12. Broyden, C.G.: The convergence of single-rank quasi-Newton methods. Math. Comput. 24, 365–382 (1970)

  13. Broyden, C.G., Dennis, J.E., Moré, J.J.: On the local and superlinear convergence of quasi-Newton methods. IMA J. Appl. Math. 12(3), 223–245 (1973)

  14. Burke, J.V., Qian, M.: On the superlinear convergence of the variable metric proximal point algorithm using Broyden and BFGS matrix secant updating. Math. Program. 88, 157–181 (2000)

  15. Burke, J.V., Qian, M.: A variable metric proximal point algorithm for monotone operators. SIAM J. Control. Optim. 37(2), 353–375 (1999)

  16. Byrd, R.H., Nocedal, J., Yuan, Y.X.: Global convergence of a class of quasi-Newton methods on convex problems. SIAM J. Numer. Anal. 24(5), 1171–1190 (1987)

  17. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  18. Chen, X., Fukushima, M.: Proximal quasi-Newton methods for nondifferentiable convex optimization. Math. Program. 85(2), 313–334 (1999)

  19. Dai, B., Shaw, A., Li, L., Xiao, L., He, N., Liu, Z., Chen, J., Song, L.: SBEED: Convergent reinforcement learning with nonlinear function approximation. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 80, pp. 1125–1134. PMLR (2018)

  20. Daskalakis, C., Ilyas, A., Syrgkanis, V., Zeng, H.: Training GANs with optimism. In: International Conference on Learning Representations (2018)

  21. Davidon, W.C.: Variable metric method for minimization. SIAM J. Optim. 1(1), 1–17 (1991)

  22. Dennis, J.E., Moré, J.J.: A characterization of superlinear convergence and its application to quasi-Newton methods. Math. Comput. 28(126), 549–560 (1974)

  23. Dennis, J.E., Moré, J.J.: Quasi-Newton methods, motivation and theory. SIAM Rev. 19(1), 46–89 (1977)

  24. Du, S.S., Hu, W.: Linear convergence of the primal-dual gradient method for convex-concave saddle point problems without strong convexity. In: The 22nd International Conference on Artificial Intelligence and Statistics, PMLR, pp. 196–205 (2019)

  25. Essid, M., Tabak, E., Trigila, G.: An implicit gradient-descent procedure for minimax problems (2019)

  26. Fletcher, R., Powell, M.J.D.: A rapidly convergent descent method for minimization. Comput. J. 6(2), 163–168 (1963)

  27. Fletcher, R.: A new approach to variable metric algorithms. Comput. J. 13(3), 317–322 (1970)

  28. Fletcher, R., Powell, M.J.D.: A rapidly convergent descent method for minimization. Comput. J. 6(2), 163–168 (1963)

  29. Gidel, G., Berard, H., Vignoud, G., Vincent, P., Lacoste-Julien, S.: A variational inequality perspective on generative adversarial networks. arXiv preprint arXiv:1802.10551 (2018)

  30. Gidel, G., Berard, H., Vignoud, G., Vincent, P., Lacoste-Julien, S.: A variational inequality perspective on generative adversarial networks (2020)

  31. Gidel, G., Hemmat, R.A., Pezeshki, M., Le Priol, R., Huang, G., Lacoste-Julien, S., Mitliagkas, I.: Negative momentum for improved game dynamics. In: The 22nd International Conference on Artificial Intelligence and Statistics, PMLR, pp. 1802–1811 (2019)

  32. Goffin, J.L., Sharifi-Mokhtarian, F.: Primal-dual-infeasible Newton approach for the analytic center deep-cutting plane method. J. Optim. Theory Appl. 101(1), 35–58 (1999)

  33. Goffin, J.-L., Vial, J.-P.: On the computation of weighted analytic centers and dual ellipsoids with the projective algorithm. Math. Program. 60(1), 81–92 (1993)

  34. Goldfarb, D.: A family of variable-metric methods derived by variational means. Math. Comput. 24(109), 23–26 (1970)

  35. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, vol. 27 (2014)

  36. Grimmer, B., Lu, H., Worah, P., Mirrokni, V.: The landscape of the proximal point method for nonconvex-nonconcave minimax optimization. arXiv preprint arXiv:2006.08667 (2020)

  37. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (2012)

  38. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation (2018)

  39. Korpelevich, G.M.: The extragradient method for finding saddle points and other problems. Matecon 12, 747–756 (1976)

  40. Ledig, C., Theis, L., Huszar, F., Caballero, J., Cunningham, A., Acosta, A., Shi, W.: Photo-realistic single image super-resolution using a generative adversarial network. CoRR abs/1609.04802 (2016)

  41. Liang, T., Stokes, J.: Interaction matters: A note on non-asymptotic local convergence of generative adversarial networks. In: The 22nd International Conference on Artificial Intelligence and Statistics, PMLR, pp. 907–915 (2019)

  42. Lu, H.: An \(O(s^r)\)-resolution ODE framework for understanding discrete-time algorithms and applications to the linear convergence of minimax problems (2021)

  43. Lu, H.: An \(O(s^r)\)-resolution ODE framework for understanding discrete-time algorithms and applications to the linear convergence of minimax problems. Math. Program. 194(1), 1061–1112 (2022)

  44. Mackey, D.S., Mackey, N., Tisseur, F.: Structured tools for structured matrices. Electron. J. Linear Algebra 10, 106–145 (2003)

  45. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations (2018)

  46. Mescheder, L., Nowozin, S., Geiger, A.: The numerics of GANs. Adv. Neural Inf. Process. Syst. 30 (2017)

  47. Mokhtari, A., Ozdaglar, A., Pattathil, S.: A unified analysis of extra-gradient and optimistic gradient methods for saddle point problems: Proximal point approach. In: International Conference on Artificial Intelligence and Statistics (2020)

  48. Moré, J.J., Trangenstein, J.A.: On the global convergence of Broyden’s method. Math. Comput. 30(135), 523–540 (1976)

  49. Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  50. Nesterov, Yu.: Complexity estimates of some cutting plane methods based on the analytic barrier. Math. Program. 69(1), 149–176 (1995)

  51. Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia (1994)

  52. Nicholas J. Higham on the top 10 algorithms in applied mathematics, https://press.princeton.edu/ideas/nicholas-higham-on-the-top-10-algorithms-in-applied-mathematics. Accessed 01 Oct 2022

  53. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)

  54. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (1970)

  55. Osborne, M.J., Rubinstein, A.: A Course in Game Theory. MIT Press, Cambridge (1994)

  56. Pearlmutter, B.A.: Fast exact multiplication by the Hessian. Neural Comput. 6(1), 147–160 (1994)

  57. Powell, M.J.D.: A hybrid method for nonlinear equations. In: Numerical Methods for Nonlinear Algebraic Equations (Proceedings of the Conference, University of Essex, Colchester, 1969), pp. 87–114 (1970)

  58. Powell, M.J.D.: A new algorithm for unconstrained optimization. In: Nonlinear Programming (Proceedings of a Symposium, University of Wisconsin, Madison, WI, 1970), pp. 31–65. Academic Press, New York (1970)

  59. Powell, M.J.D.: Some global convergence properties of a variable metric algorithm for minimization without exact line searches. In: Nonlinear Programming (Providence), Am. Math. Soc., SIAM-AMS Proc., Vol. IX, pp. 53–72 (1976)

  60. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control. Optim. 14(5), 877–898 (1976)

  61. Schraudolph, N.N.: Fast curvature matrix-vector products for second-order gradient descent. Neural Comput. 14(7), 1723–1738 (2002)

  62. Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24(111), 647–656 (1970)

  63. Sidi, A.: A zero-cost preconditioning for a class of indefinite linear systems. WSEAS Trans. Math. 2 (2003)

  64. Ye, Y.: Complexity analysis of the analytic center cutting plane method that uses multiple cuts. Math. Program. 78(1), 85–104 (1996)

  65. Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: IEEE International Conference on Computer Vision (ICCV), pp. 2242–2251 (2017)

Author information

Corresponding author

Correspondence to Haihao Lu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Existing definitions and results used in the proofs

Lemma A.1

(Sherman–Morrison Formula) ([37, page 19]) Suppose \(A\in {\mathbb {R}}^{n\times n}\) is an invertible matrix and \(u,v\in {\mathbb {R}}^{n}\) are vectors. Then \(A+uv^T\) is invertible if and only if \(1+v^TA^{-1}u\ne 0\). In this case,

$$\begin{aligned} (A+uv^T)^{-1}=A^{-1}-\frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u} . \end{aligned}$$
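
A quick numerical sanity check of the lemma, with arbitrary data (the assert guards the lemma's invertibility condition). This rank-1 identity is what makes it cheap to maintain the inverse of a quasi-Newton estimate under low-rank updates.

```python
import numpy as np

# Numerical check of Lemma A.1 with arbitrary data.
rng = np.random.default_rng(1)
n = 5
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # invertible at this scale
u, v = rng.standard_normal(n), rng.standard_normal(n)
Ainv = np.linalg.inv(A)
assert abs(1 + v @ Ainv @ u) > 1e-12                # lemma's condition holds
lhs = np.linalg.inv(A + np.outer(u, v))
rhs = Ainv - (Ainv @ np.outer(u, v) @ Ainv) / (1 + v @ Ainv @ u)
print(np.allclose(lhs, rhs))                        # True
```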

Lemma A.2

(Banach Perturbation Lemma) ([54, page 45]) Consider square matrices \(A,B \in {\mathbb {R}}^{d\times d}\). Suppose that A is invertible with \(\Vert A^{-1}\Vert \le a\). If \(\Vert A-B\Vert \le b\) and \(ab < 1\), then B is also invertible and

$$\begin{aligned} \Vert B^{-1}\Vert \le \frac{a}{1-ab } . \end{aligned}$$
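
A small numerical illustration with arbitrary data, constructed so that \(a = 1\) and \(b = 1/2\):

```python
import numpy as np

# Illustration of Lemma A.2: with A = I we have ||A^{-1}|| = 1 (a = 1),
# and we scale the perturbation so that b = ||A - B|| = 1/2, hence ab < 1.
rng = np.random.default_rng(2)
d = 4
A = np.eye(d)
E = rng.standard_normal((d, d))
E *= 0.5 / np.linalg.norm(E, 2)       # operator 2-norm of E becomes exactly 1/2
B = A + E
bound = 1.0 / (1.0 - 1.0 * 0.5)       # a / (1 - a*b) = 2
print(np.linalg.norm(np.linalg.inv(B), 2), "<=", bound)   # the bound holds
```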

Lemma A.3

([23, Eq. (1.2)]) Consider square matrices \(A,B \in {\mathbb {R}}^{d\times d}\). Then

$$\begin{aligned} \Vert AB \Vert _F \le \min \{ ~\Vert A \Vert _F\Vert B \Vert ~,~ \Vert A \Vert \Vert B \Vert _F~ \}. \end{aligned}$$
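
A one-line numerical check on arbitrary matrices:

```python
import numpy as np

# Check of Lemma A.3: the Frobenius norm of a product is bounded by mixing
# Frobenius and operator 2-norms of the factors.
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
lhs = np.linalg.norm(A @ B, "fro")
rhs = min(np.linalg.norm(A, "fro") * np.linalg.norm(B, 2),
          np.linalg.norm(A, 2) * np.linalg.norm(B, "fro"))
print(lhs <= rhs)                      # True
```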

Definition A.4

(R-superlinear and Q-superlinear convergence rates [48]) We say the sequence \(\{z_k\}\) converges to \(z^*\) R-superlinearly if

$$\begin{aligned} \lim _{k \rightarrow \infty } \Vert z_k -z^*\Vert ^{1/k} =0, \end{aligned}$$

and \(\{z_k\}\) converges to \(z^*\) Q-superlinearly if there exists a sequence \(\{q_k\}\) of nonnegative numbers converging to zero such that

$$\begin{aligned} \Vert z_{k+1} -z^*\Vert \le q_k \Vert z_k -z^*\Vert ~~\textrm{for~all~} k . \end{aligned}$$
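
For example, if \(\Vert z_k -z^*\Vert = 2^{-k^2}\), then \(\Vert z_{k+1} -z^*\Vert / \Vert z_k -z^*\Vert = 2^{-(2k+1)} \rightarrow 0\), so the convergence is Q-superlinear, and \(\Vert z_k -z^*\Vert ^{1/k} = 2^{-k} \rightarrow 0\), so it is also R-superlinear. In general, Q-superlinear convergence implies R-superlinear convergence, but not conversely.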

Theorem A.5

(Dennis–Moré Q-superlinear Characterization Identity) ([22, Theorem 2.2]) Let the mapping F be differentiable in the open convex set \(\mathbb {D}\) and assume that for some \(z^* \in \mathbb {D}\), \(\nabla F\) is continuous at \(z^*\) and \(\nabla F(z^*)\) is invertible. Let \(\{B_k \}\) be a sequence of invertible matrices and suppose \(\{z_k\}\), with \(z_{k+1} = z_k -B_k^{-1}F(z_k)\), remains in \(\mathbb {D}\) and converges to \(z^*\). Then \(\{z_k\}\) converges Q-superlinearly to \(z^*\) and \(F(z^*) = 0\) if and only if

$$\begin{aligned} \lim _{k\rightarrow \infty } \dfrac{\Big \Vert \big (B_k - \nabla F(z^*)\big )(z_{k+1}-z_k) \Big \Vert }{\Vert z_{k+1}-z_k\Vert } = 0 . \end{aligned}$$

Definition A.6

(Uniform linear independence) ([48, Definition 5.1]) A sequence of unit vectors \(\{u_j\}\) in \({\mathbb {R}}^{n+m}\) is uniformly linearly independent if there exist \(\beta >0\), \(k_0\ge 0\) and \(t\ge n+m\) such that, for every \(k\ge k_0\) and every \(x\) with \(\Vert x\Vert =1\),

$$\begin{aligned} \max _{k < j \le k+t} |x^T u_j | \ge \beta . \end{aligned}$$

Theorem A.7

([48, Theorem 5.3]) Let \(\{ u_k\}\) be a sequence of unit vectors in \({\mathbb {R}}^{n+m}\). Then the following statements are equivalent.

  • The sequence \(\{ u_k\}\) is uniformly linearly independent.

  • For any \({\hat{\beta }} \in [0,1) \) there is a constant \(\theta \in (0,1)\) such that if \(|\beta _j-1| \le {\hat{\beta }}\) then:

    $$\begin{aligned} \left\| \prod _{j=k+1}^{k+t} \big ( I -\beta _j u_ju_j^T \big )\right\| \le \theta , ~~\textrm{for}~~k\ge k_0 \mathrm {~~and~~} t\ge n+m. \end{aligned}$$
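
As a small sanity check of the second statement, the following toy computation (with our own illustrative data) cycles through the standard basis of \({\mathbb {R}}^d\), which is uniformly linearly independent; with \(\beta _j = 1\), the product over one full cycle of \(t = d\) factors is exactly the zero matrix.

```python
import numpy as np

# Product of (I - u_j u_j^T) over one cycle of the standard basis of R^d:
# each factor zeroes out one coordinate, so the full product is 0.
d = 4
P = np.eye(d)
for j in range(d):
    u = np.zeros(d)
    u[j] = 1.0
    P = (np.eye(d) - np.outer(u, u)) @ P
print(np.linalg.norm(P, 2))            # 0.0, certainly below any theta in (0,1)
```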

Lemma A.8

([48, Lemma 5.5]) Let \(\{ \phi _k\}\) and \(\{ \delta _k\}\) be sequences of nonnegative numbers such that \(\phi _{k+t} \le \theta \phi _k + \delta _k\) for some fixed integer \(t \ge 1\) and \(\theta \in (0,1)\). If \(\{ \delta _k\}\) is bounded, then \(\{ \phi _k\}\) is also bounded; if, in addition, \(\{ \delta _k\}\) converges to zero, then \(\{ \phi _k\}\) converges to zero.
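
A minimal numerical sketch of the lemma with arbitrary parameters (\(\theta = 1/2\), \(t = 2\), \(\delta _k = 1/(k+1)\)), taking the recursion with equality as the worst case:

```python
# Lemma A.8 demo: phi_{k+t} = theta * phi_k + delta_k with theta in (0,1)
# and delta_k -> 0 forces phi_k -> 0, regardless of the starting values.
theta, t = 0.5, 2
phi = [10.0, 8.0]                            # arbitrary phi_0, phi_1
delta = [1.0 / (k + 1) for k in range(200)]  # nonnegative, converging to zero
for k in range(len(delta) - t):
    phi.append(theta * phi[k] + delta[k])
print(phi[-1])                               # small, still shrinking toward 0
```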

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Asl, A., Lu, H. & Yang, J. A J-symmetric quasi-Newton method for minimax problems. Math. Program. 204, 207–254 (2024). https://doi.org/10.1007/s10107-023-01957-1

