Abstract
We study the ridge method for min-max problems and investigate its convergence without any convexity, differentiability, or qualification assumptions. The central issue is to determine whether the “parametric optimality formula” provides a conservative gradient, a notion of generalized derivative well suited for optimization. The answer to this question is positive in a semi-algebraic, and more generally definable, context. As a consequence, the ridge method applied to definable objectives is proved to have a minimizing behavior and to converge to a set of equilibria satisfying an optimality condition. Definability is key to our proof: we show that for a more general class of nonsmooth functions, conservativity of the parametric optimality formula may fail, resulting in absurd behavior of the ridge method.
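To fix ideas, the ridge method for a min-max problem min_x max_y f(x, y) runs descent on the value function φ(x) = max_y f(x, y), using the parametric optimality (Danskin-type) direction ∇_x f(x, y*(x)) evaluated at an inner maximizer y*(x). The following is a minimal numerical sketch under simplifying smoothness assumptions, not the paper's nonsmooth setting; the function names and the test objective are illustrative.

```python
import numpy as np

def ridge_method(grad_x, argmax_y, x0, step=0.1, iters=200):
    """Hypothetical sketch: descend phi(x) = max_y f(x, y) via the
    parametric optimality direction grad_x f(x, y*(x))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        y_star = argmax_y(x)               # inner maximizer y*(x)
        x = x - step * grad_x(x, y_star)   # Danskin-type descent step on phi
    return x

# Illustrative objective: f(x, y) = x**2 + 2*x*y - y**2 (concave in y),
# so y*(x) = x and phi(x) = 2*x**2, minimized at x = 0.
grad_x = lambda x, y: 2 * x + 2 * y
argmax_y = lambda x: x                     # solves 2x - 2y = 0
x_min = ridge_method(grad_x, argmax_y, x0=1.0)
```

In this smooth, strongly concave-in-y toy example the iteration contracts to the minimizer of φ; the paper's contribution is precisely to analyze what happens when such regularity is absent and the formula must be interpreted as a conservative gradient.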
Acknowledgements
The author would like to thank Jérôme Bolte and Rodolfo Rios-Zertuche for interesting discussions which helped put this work together. The author acknowledges the support of ANR-3IA Artificial and Natural Intelligence Toulouse Institute under the grant agreement ANR-19-PI3A-0004; of the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant numbers FA9550-19-1-7026 and FA8655-22-1-7012; and of ANR MaSDOL 19-CE23-0017-01.
Cite this article
Pauwels, E. Conservative Parametric Optimality and the Ridge Method for Tame Min-Max Problems. Set-Valued Var. Anal 31, 19 (2023). https://doi.org/10.1007/s11228-023-00682-3