No dimension-free deterministic algorithm computes approximate stationarities of Lipschitzians

  • Full Length Paper
  • Series A
  • Published in Mathematical Programming (2024)

Abstract

We consider the oracle complexity of computing an approximate stationary point of a Lipschitz function. When the function is smooth, it is well known that the simple deterministic gradient method has finite dimension-free oracle complexity. When the function may be nonsmooth, however, a randomized algorithm with finite dimension-free oracle complexity has been developed only recently. In this paper, we show that no deterministic algorithm can do the same. Moreover, even without the dimension-free requirement, we show that no finite-time deterministic method can be general zero-respecting. In particular, this implies that a natural derandomization of the aforementioned randomized algorithm cannot have finite-time complexity. Our results reveal a fundamental hurdle in modern large-scale nonconvex nonsmooth optimization.
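To make the first claim above concrete, the following minimal sketch runs fixed-step gradient descent on an L-smooth function until an \(\epsilon\)-stationary point is reached. It is purely illustrative and not taken from the paper; the routine name, step-size rule, and quadratic test problem are our own assumptions. The standard descent-lemma argument bounds the number of gradient-oracle calls by \(2L(f(\mathbf{x}^{(1)}) - \inf f)/\epsilon^2\), a quantity that does not depend on the dimension.

import numpy as np

def gradient_descent_stationary(grad, x0, L, eps):
    """Fixed-step gradient descent until an eps-stationary point is found.

    For an L-smooth f, the descent lemma gives
        f(x - g/L) <= f(x) - ||g||^2 / (2L),   g = grad f(x),
    so at most 2*L*(f(x0) - inf f)/eps**2 oracle calls are needed,
    a bound independent of the dimension of x.
    """
    x = np.asarray(x0, dtype=float)
    calls = 0
    while True:
        g = grad(x)                    # one first-order oracle call
        calls += 1
        if np.linalg.norm(g) <= eps:   # eps-stationarity: ||grad f(x)|| <= eps
            return x, calls
        x = x - g / L                  # step size 1/L

# Illustrative smooth test problem: f(x) = 0.5 * x^T A x in dimension 1000,
# whose gradient x -> A x is 1-Lipschitz, so L = 1.
A = np.diag(np.linspace(0.1, 1.0, 1000))
x_hat, calls = gradient_descent_stationary(lambda x: A @ x, np.ones(1000), L=1.0, eps=1e-3)
print(calls, np.linalg.norm(A @ x_hat))

Once f is merely Lipschitz, no deterministic guarantee of this dimension-free kind is possible; this is precisely the hurdle the paper establishes.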

Notes

  1. Here \(\partial f(\mathbf{x})\) is the Clarke subdifferential of f at \(\mathbf{x}\); see Definition 1 for details, and the display after these notes for one standard formulation.

  2. Recall that f is \(\rho \)-weakly convex if \(\mathbf{x} \mapsto f(\mathbf{x}) + \frac{\rho }{2} \Vert \mathbf{x}\Vert ^2\) is convex; an equivalent subgradient characterization is given after these notes.

  3. In this paper, we assume that an algorithm always starts from \(\mathbf{x}^{(1)}=\mathbf{0}\). This is without loss of generality, since no information about f is available before the query \(\mathbf{x}^{(1)}\) and all function classes considered are translation invariant.

  4. We became aware of these concurrent, independent developments when a preliminary version of our manuscript was being reviewed for possible publication in the proceedings of a conference.
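For convenience, one standard formulation of the Clarke subdifferential referenced in Note 1 is recorded here; this is a textbook equivalence obtained via Rademacher's theorem, not a quotation of the paper's Definition 1, which remains the authoritative statement. For a locally Lipschitz f,

\[
  \partial f(\mathbf{x}) \;=\; \operatorname{conv}\Bigl\{ \lim_{k \to \infty} \nabla f(\mathbf{x}_k) \;:\; \mathbf{x}_k \to \mathbf{x},\ f \text{ is differentiable at } \mathbf{x}_k \Bigr\},
\]

where, by Rademacher's theorem, the Lipschitz function f is differentiable almost everywhere, so the limits above range over a nonempty family of convergent gradient sequences.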
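Similarly, as a complement to Note 2 (a standard fact about weakly convex functions, not quoted from the paper), \(\rho \)-weak convexity is equivalent to the subgradient inequality

\[
  f(\mathbf{y}) \;\ge\; f(\mathbf{x}) + \langle \mathbf{g}, \mathbf{y}-\mathbf{x}\rangle - \frac{\rho}{2}\, \Vert \mathbf{y}-\mathbf{x}\Vert ^2 \qquad \text{for all } \mathbf{x},\mathbf{y} \text{ and all } \mathbf{g} \in \partial f(\mathbf{x}).
\]

For example, every \(C^1\) function whose gradient is \(\rho \)-Lipschitz is \(\rho \)-weakly convex.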

Author information

Corresponding author

Correspondence to Anthony Man-Cho So.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported in part by the Hong Kong Research Grants Council (RGC) General Research Fund (GRF) project CUHK 14216122.

About this article

Cite this article

Tian, L., So, A.M.-C. No dimension-free deterministic algorithm computes approximate stationarities of Lipschitzians. Math. Program. (2024). https://doi.org/10.1007/s10107-023-02031-6
