No dimension-free deterministic algorithm computes approximate stationarities of Lipschitzians

  • Full Length Paper
  • Series A
  • Published in Mathematical Programming (2024)

Abstract

We consider the oracle complexity of computing an approximate stationary point of a Lipschitz function. When the function is smooth, it is well known that the simple deterministic gradient method has finite dimension-free oracle complexity. When the function may be nonsmooth, however, a randomized algorithm with finite dimension-free oracle complexity has been developed only recently. In this paper, we show that no deterministic algorithm can do the same. Moreover, even without the dimension-free requirement, we show that no finite-time deterministic method can be general zero-respecting. In particular, this implies that a natural derandomization of the aforementioned randomized algorithm cannot have finite-time complexity. Our results reveal a fundamental hurdle in modern large-scale nonconvex nonsmooth optimization.
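To make the first claim above concrete, the following minimal sketch runs fixed-step gradient descent on an L-smooth function until an \(\epsilon\)-stationary point is reached. It is purely illustrative and not taken from the paper; the routine name, step-size rule, and quadratic test problem are our own assumptions. The standard descent-lemma argument bounds the number of gradient-oracle calls by \(2L(f(\mathbf{x}^{(1)}) - \inf f)/\epsilon^2\), a quantity that does not depend on the dimension.

import numpy as np

def gradient_descent_stationary(grad, x0, L, eps):
    """Fixed-step gradient descent until an eps-stationary point is found.

    For an L-smooth f, the descent lemma gives
        f(x - g/L) <= f(x) - ||g||^2 / (2L),   g = grad f(x),
    so at most 2*L*(f(x0) - inf f)/eps**2 oracle calls are needed,
    a bound independent of the dimension of x.
    """
    x = np.asarray(x0, dtype=float)
    calls = 0
    while True:
        g = grad(x)                    # one first-order oracle call
        calls += 1
        if np.linalg.norm(g) <= eps:   # eps-stationarity: ||grad f(x)|| <= eps
            return x, calls
        x = x - g / L                  # step size 1/L

# Illustrative smooth test problem: f(x) = 0.5 * x^T A x in dimension 1000,
# whose gradient x -> A x is 1-Lipschitz, so L = 1.
A = np.diag(np.linspace(0.1, 1.0, 1000))
x_hat, calls = gradient_descent_stationary(lambda x: A @ x, np.ones(1000), L=1.0, eps=1e-3)
print(calls, np.linalg.norm(A @ x_hat))

Once f is merely Lipschitz, no deterministic guarantee of this dimension-free kind is possible; this is precisely the hurdle the paper establishes.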

Notes

  1. Here \(\partial f(\mathbf{x})\) is the Clarke subdifferential of f at \(\mathbf{x}\); see Definition 1 for details, and the display after these notes for one standard formulation.

  2. Recall that f is \(\rho \)-weakly convex if \(\mathbf{x} \mapsto f(\mathbf{x}) + \frac{\rho }{2} \Vert \mathbf{x}\Vert ^2\) is convex; an equivalent subgradient characterization is given after these notes.

  3. In this paper, we assume that an algorithm always starts from \(\mathbf{x}^{(1)}=\mathbf{0}\). This is without loss of generality, since no information about f is available before the query \(\mathbf{x}^{(1)}\) and all function classes considered are translation invariant.

  4. We became aware of these concurrent, independent developments when a preliminary version of our manuscript was being reviewed for possible publication in the proceedings of a conference.
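For convenience, one standard formulation of the Clarke subdifferential referenced in Note 1 is recorded here; this is a textbook equivalence obtained via Rademacher's theorem, not a quotation of the paper's Definition 1, which remains the authoritative statement. For a locally Lipschitz f,

\[
  \partial f(\mathbf{x}) \;=\; \operatorname{conv}\Bigl\{ \lim_{k \to \infty} \nabla f(\mathbf{x}_k) \;:\; \mathbf{x}_k \to \mathbf{x},\ f \text{ is differentiable at } \mathbf{x}_k \Bigr\},
\]

where, by Rademacher's theorem, the Lipschitz function f is differentiable almost everywhere, so the limits above range over a nonempty family of convergent gradient sequences.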
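Similarly, as a complement to Note 2 (a standard fact about weakly convex functions, not quoted from the paper), \(\rho \)-weak convexity is equivalent to the subgradient inequality

\[
  f(\mathbf{y}) \;\ge\; f(\mathbf{x}) + \langle \mathbf{g}, \mathbf{y}-\mathbf{x}\rangle - \frac{\rho}{2}\, \Vert \mathbf{y}-\mathbf{x}\Vert ^2 \qquad \text{for all } \mathbf{x},\mathbf{y} \text{ and all } \mathbf{g} \in \partial f(\mathbf{x}).
\]

For example, every \(C^1\) function whose gradient is \(\rho \)-Lipschitz is \(\rho \)-weakly convex.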

Author information

Corresponding author

Correspondence to Anthony Man-Cho So.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported in part by the Hong Kong Research Grants Council (RGC) General Research Fund (GRF) project CUHK 14216122.

About this article

Cite this article

Tian, L., So, A.M.-C. No dimension-free deterministic algorithm computes approximate stationarities of Lipschitzians. Math. Program. (2024). https://doi.org/10.1007/s10107-023-02031-6
