Zeroth-Order Nonconvex Stochastic Optimization: Handling Constraints, High Dimensionality, and Saddle Points

Balasubramanian, Krishnakumar; Ghadimi, Saeed

doi:10.1007/s10208-021-09499-8

Published: 19 March 2021

Foundations of Computational Mathematics, Volume 22, pages 35–76 (2022)

Krishnakumar Balasubramanian¹ & Saeed Ghadimi²

Abstract

In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on constrained optimization, the high-dimensional setting, and saddle-point avoidance. To handle constrained optimization, we first propose generalizations of the conditional gradient algorithm that achieve rates similar to those of the standard stochastic gradient algorithm using only zeroth-order information. To facilitate zeroth-order optimization in high dimensions, we explore the advantages of structural sparsity assumptions. Specifically, (i) we highlight an implicit regularization phenomenon in which the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand by just varying the step size, and (ii) we propose a truncated stochastic gradient algorithm with zeroth-order information whose rate of convergence depends only poly-logarithmically on the dimensionality. We next focus on avoiding saddle points in the nonconvex setting. To that end, we interpret the Gaussian smoothing technique for estimating the gradient from zeroth-order information as an instantiation of the first-order Stein’s identity. Based on this, we provide a novel linear-time (in the dimension) estimator of the Hessian matrix of a function using only zeroth-order information, which is based on the second-order Stein’s identity. We then provide a zeroth-order variant of the cubic regularized Newton method for avoiding saddle points and discuss its rate of convergence to local minima.
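To make the Stein's-identity view of Gaussian smoothing more concrete, here is a minimal sketch in Python with NumPy of a zeroth-order gradient estimator and a second-order Stein-type Hessian estimator. The function names `zo_gradient` and `zo_hessian`, the smoothing parameter `nu`, the sample counts, and the use of symmetric second differences are assumptions made for illustration, since the abstract does not state the exact form of the estimators. The sketch also builds the dense d-by-d matrix and does not attempt to reproduce the linear-time property claimed in the abstract.

```python
import numpy as np


def zo_gradient(f, x, nu=1e-3, n_samples=1000, rng=None):
    """Monte Carlo Gaussian-smoothing gradient estimate from function values only.

    Averages (f(x + nu*u) - f(x)) / nu * u over u ~ N(0, I), which is an
    unbiased estimate of the gradient of the smoothed function
    f_nu(x) = E[f(x + nu*u)].
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal((n_samples, x.size))
    fx = f(x)
    # Forward differences along each random direction.
    diffs = np.array([f(x + nu * ui) - fx for ui in u]) / nu
    return (diffs[:, None] * u).mean(axis=0)


def zo_hessian(f, x, nu=1e-2, n_samples=1000, rng=None):
    """Monte Carlo Hessian estimate based on the second-order Stein identity.

    Averages (f(x + nu*u) + f(x - nu*u) - 2 f(x)) / (2 nu^2) * (u u^T - I)
    over u ~ N(0, I). Assumed form, chosen for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    u = rng.standard_normal((n_samples, d))
    fx = f(x)
    # Symmetric second differences along each random direction.
    second = np.array(
        [f(x + nu * ui) + f(x - nu * ui) - 2.0 * fx for ui in u]
    ) / (2.0 * nu**2)
    # mean(second * u u^T) - mean(second) * I
    weighted_outer = np.einsum("n,ni,nj->ij", second, u, u) / n_samples
    return weighted_outer - second.mean() * np.eye(d)
```

For a quadratic f(x) = 0.5 xᵀAx + bᵀx with symmetric A, both estimators are unbiased for the true gradient Ax + b and Hessian A, so averaging many samples should recover them up to Monte Carlo error. The Hessian estimate has noticeably higher variance than the gradient estimate, so it typically needs more samples.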

Notes

We remark that our step size choice requires knowledge of a rough upper bound on the true sparsity parameter.
For a definition of almost-differentiable function, we refer the reader to Definition 1 in [75].

References

Agarwal, A., Dekel, O., Xiao, L.: Optimal algorithms for online convex optimization with multi-point bandit feedback. In: Proceedings of The 23rd Conference on Learning Theory, pp. 28–40 (2010)
Akhavan, A., Pontil, M., Tsybakov, A.: Exploiting higher order smoothness in derivative-free optimization and continuous bandits. In: Advances in Neural Information Processing Systems, vol. 33 (2020)
Allen-Zhu, Z.: Natasha 2: Faster non-convex optimization than SGD. In: Advances in Neural Information Processing Systems, pp. 2680–2691 (2018)
Bach, F., Perchet, V.: Highly-smooth zero-th order online optimization. In: V. Feldman, A. Rakhlin, O. Shamir (eds.) 29th Annual Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 49, pp. 257–283. PMLR (2016)
Beck, A.: First-Order Methods in Optimization, vol. 25. Society for Industrial and Applied Mathematics (SIAM) (2017)
Belloni, A., Liang, T., Narayanan, H., Rakhlin, A.: Escaping the local minima via simulated annealing: Optimization of approximately convex functions. In: P. Grunwald, E. Hazan, S. Kale (eds.) Proceedings of The 28th Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 40, pp. 240–265. PMLR (2015)
Ben-Tal, A., Nemirovski, A.: Lectures on modern convex optimization: analysis, algorithms, and engineering applications, vol. 2. Society for Industrial and Applied Mathematics (SIAM) (2001)
Bertsekas, D.P.: Nonlinear programming. Athena Scientific, Belmont (2016)
Bertsekas, D.P.: Convex optimization algorithms. Athena Scientific, Belmont (2015)
Bhojanapalli, S., Neyshabur, B., Srebro, N.: Global optimality of local search for low rank matrix recovery. In: Advances in Neural Information Processing Systems, pp. 3873–3881 (2016)
Boyd, S., Vandenberghe, L.: Convex optimization. Cambridge University Press (2004)
Bubeck, S., Cesa-Bianchi, N.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends® in Machine Learning 5(1), 1–122 (2012)
Bubeck, S., Lee, Y.T., Eldan, R.: Kernel-based methods for bandit convex optimization. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 72–85 (2017)
Cai, H., Mckenzie, D., Yin, W., Zhang, Z.: Zeroth-order regularized optimization (ZORO): Approximately sparse gradients and adaptive sampling (2020)
Carmon, Y., Duchi, J.C., Hinder, O., Sidford, A.: Accelerated methods for nonconvex optimization. SIAM Journal on Optimization 28(2), 1751–1772 (2018)
Cartis, C., Gould, N.I., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization, Part I: Motivation, convergence and numerical results. Mathematical Programming 127(2), 245–295 (2011)
Cartis, C., Gould, N.I., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization, Part II: Worst-case function- and derivative-evaluation complexity. Mathematical Programming 130(2), 295–319 (2011)
Cartis, C., Gould, N.I., Toint, P.L.: Second-order optimality and beyond: Characterization and evaluation complexity in convexly constrained nonlinear optimization. Foundations of Computational Mathematics 18(5), 1073–1107 (2018)
Chen, L., Zhang, M., Hassani, H., Karbasi, A.: Black box submodular maximization: Discrete and continuous settings. In: S. Chiappa, R. Calandra (eds.) Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 108, pp. 1058–1070 (2020)
Chen, P.Y., Zhang, H., Sharma, Y., Yi, J., Hsieh, C.J.: ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26. ACM (2017)
Choromanski, K., Rowland, M., Sindhwani, V., Turner, R., Weller, A.: Structured evolution with compact architectures for scalable policy optimization. In: Proceedings of the 35th International Conference on Machine Learning. PMLR (2018)
Conn, A., Scheinberg, K., Vicente, L.: Introduction to derivative-free optimization, vol. 8. Society for Industrial and Applied Mathematics (SIAM) (2009)
Dani, V., Kakade, S.M., Hayes, T.P.: The price of bandit information for online optimization. In: Advances in Neural Information Processing Systems, pp. 345–352 (2008)
Demyanov, V., Rubinov, A.: Approximate methods in optimization problems. American Elsevier Publishing (1970)
DeVore, R., Petrova, G., Wojtaszczyk, P.: Approximation of functions of few variables in high dimensions. Constructive Approximation 33(1), 125–143 (2011)
Donoho, D.L.: Compressed sensing. IEEE Transactions on Information Theory 52(4), 1289–1306 (2006)
Duchi, J., Jordan, M., Wainwright, M., Wibisono, A.: Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Transactions on Information Theory 61(5), 2788–2806 (2015)
Elibol, M., Lei, L., Jordan, M.I.: Variance reduction with sparse gradients. In: Proceedings of the 8th International Conference on Learning Representations (ICLR), pp. 1058–1070 (2020)
Erdogdu, M.A.: Newton-Stein method: an optimization method for GLMs via Stein’s lemma. The Journal of Machine Learning Research 17(1), 7565–7616 (2016)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Research Logistics Quarterly 3, 95–110 (1956)
Gasnikov, A.V., Krymova, E.A., Lagunovskaya, A.A., Usmanova, I.N., Fedorenko, F.A.: Stochastic online optimization. Single-point and multi-point non-linear multi-armed bandits. Convex and strongly-convex case. Automation and Remote Control 78(2), 224–234 (2017)
Ge, R., Huang, F., Jin, C., Yuan, Y.: Escaping from saddle points: Online stochastic gradient for tensor decomposition. In: Conference on Learning Theory, pp. 797–842 (2015)
Ge, R., Lee, J.D., Ma, T.: Matrix completion has no spurious local minimum. In: Advances in Neural Information Processing Systems, pp. 2973–2981 (2016)
Ghadimi, S.: Conditional gradient type methods for composite nonlinear and stochastic optimization. Mathematical Programming (2018). https://doi.org/10.1007/s10107-017-1225-5
Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization 23(4), 2341–2368 (2013)
Han, C., Yuan, M.: Information based complexity for high dimensional sparse functions. Journal of Complexity 57, 101443 (2020)
Hazan, E., Kale, S.: Projection-free online learning. In: Proceedings of the 29th International Conference on Machine Learning, pp. 1843–1850 (2012)
Hazan, E., Levy, K.: Bandit convex optimization: Towards tight bounds. In: Advances in Neural Information Processing Systems, pp. 784–792 (2014)
Hazan, E., Luo, H.: Variance-reduced and projection-free stochastic optimization. In: International Conference on Machine Learning, pp. 1263–1271 (2016)
Hearn, D.: The gap function of a convex program. Operations Research Letters 1(2), 67–71 (1982)
Hu, X., Prashanth, L.A., György, A., Szepesvari, C.: (Bandit) Convex Optimization with Biased Noisy Gradient Oracles. In: The 19th International Conference on Artificial Intelligence and Statistics, pp. 3420–3428 (2016)
Jaggi, M.: Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. In: Proceedings of the 30th International Conference on Machine Learning, pp. 427–435 (2013)
Jain, P., Kar, P.: Non-convex optimization for machine learning. Foundations and Trends® in Machine Learning 10(3-4), 142–336 (2017)
Jain, P., Tewari, A., Kar, P.: On iterative hard thresholding methods for high-dimensional m-estimation. In: Advances in Neural Information Processing Systems, pp. 685–693 (2014)
Jamieson, K., Nowak, R., Recht, B.: Query complexity of derivative-free optimization. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2012)
Jin, C., Ge, R., Netrapalli, P., Kakade, S.M., Jordan, M.I.: How to escape saddle points efficiently. In: International Conference on Machine Learning, pp. 1724–1732 (2017)
Kawaguchi, K., Kaelbling, L.P.: Elimination of all bad local minima in deep learning. arXiv:1901.00279
Lan, G., Zhou, Y.: Conditional gradient sliding for convex optimization. SIAM Journal on Optimization 26(2), 1379–1409 (2016)
Lattimore, T.: Improved regret for zeroth-order adversarial bandit convex optimisation. arXiv:2006.00475
Li, J., Balasubramanian, K., Ma, S.: Stochastic zeroth-order Riemannian derivative estimation and optimization. arXiv:2003.11238 (2020)
Mania, H., Guy, A., Recht, B.: Simple random search provides a competitive approach to reinforcement learning. In: Advances in Neural Information Processing Systems (2018)
Minsker, S.: Sub-gaussian estimators of the mean of a random matrix with heavy-tailed entries. The Annals of Statistics 46(6A), 2871–2903 (2018)
Mockus, J.: Bayesian approach to global optimization: theory and applications, vol. 37. Springer Science & Business Media (2012)
Mokhtari, A., Hassani, H., Karbasi, A.: Conditional gradient method for stochastic submodular maximization: Closing the gap. In: International Conference on Artificial Intelligence and Statistics, pp. 1886–1895 (2018)
Mokhtari, A., Hassani, H., Karbasi, A.: Stochastic conditional gradient methods: From convex minimization to submodular maximization. Journal of Machine Learning Research 21, 1–49 (2020)
Murty, K.G., Kabadi, S.N.: Some NP-complete problems in quadratic and nonlinear programming. Mathematical Programming 39(2), 117–129 (1987)
Nemirovski, A.S., Yudin, D.: Problem complexity and method efficiency in optimization. Wiley-Interscience Series in Discrete Mathematics. John Wiley, XV (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: a basic course. Kluwer Academic Publishers, Massachusetts (2004)
Nesterov, Y.: Introductory lectures on convex optimization: A basic course, vol. 87. Springer Science & Business Media (2013)
Nesterov, Y., Polyak, B.: Cubic regularization of Newton method and its global performance. Mathematical Programming 108(1), 177–205 (2006)
Nesterov, Y., Spokoiny, V.: Random gradient-free minimization of convex functions. Foundations of Computational Mathematics 17, 527–566 (2017)
Nesterov, Y.: Implementable tensor methods in unconstrained convex optimization. Mathematical Programming 186, 157–183 (2021)
Nocedal, J., Wright, S.J.: Numerical optimization. Springer Science & Business Media (2006)
Raskutti, G., Wainwright, M.J., Yu, B.: Minimax-optimal rates for sparse additive models over kernel classes via convex programming. The Journal of Machine Learning Research 13(1), 389–427 (2012)
Reddi, S., Sra, S., Póczos, B., Smola, A.: Stochastic Frank-Wolfe Methods for Nonconvex Optimization. In: Proceedings of the 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1244–1251 (2016)
Reddi, S., Zaheer, M., Sra, S., Poczos, B., Bach, F., Salakhutdinov, R., Smola, A.: A generic approach for escaping saddle points. In: International Conference on Artificial Intelligence and Statistics, pp. 1233–1242 (2018)
Rio, E.: Moment inequalities for sums of dependent random variables under projective conditions. Journal of Theoretical Probability 22(1), 146–163 (2009)
Rubinstein, R., Kroese, D.: Simulation and the Monte Carlo method, vol. 10. John Wiley & Sons, New Jersey (2016)
Saha, A., Tewari, A.: Improved regret guarantees for online smooth convex optimization with bandit feedback. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 636–642 (2011)
Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. arXiv:1703.03864
Shamir, O.: On the complexity of bandit and derivative-free stochastic convex optimization. In: Conference on Learning Theory, pp. 3–24 (2013)
Snoek, J., Larochelle, H., Adams, R.: Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, pp. 2951–2959 (2012)
Spall, J.: Introduction to stochastic search and optimization: estimation, simulation, and control, vol. 65. John Wiley & Sons, New Jersey (2005)
Stein, C.: A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory. The Regents of the University of California (1972)
Stein, C.M.: Estimation of the mean of a multivariate normal distribution. The Annals of Statistics 9(6), 1135–1151 (1981)
Sun, J., Qu, Q., Wright, J.: When are nonconvex problems not scary? arXiv:1510.06096
Sun, J., Qu, Q., Wright, J.: A geometric analysis of phase retrieval. Foundations of Computational Mathematics 18(5), 1131–1198 (2018)
Tripuraneni, N., Stern, M., Jin, C., Regier, J., Jordan, M.: Stochastic cubic regularization for fast nonconvex optimization. In: Advances in Neural Information Processing Systems, pp. 2899–2908 (2018)
Tropp, J.A.: The expected norm of a sum of independent random matrices: An elementary approach. In: High Dimensional Probability VII, pp. 173–202. Springer (2016)
Tyagi, H., Kyrillidis, A., Gärtner, B., Krause, A.: Algorithms for learning sparse additive models with interactions in high dimensions. Information and Inference: A Journal of the IMA 7(2), 183–249 (2018)
Wang, Y., Du, S., Balakrishnan, S., Singh, A.: Stochastic zeroth-order optimization in high dimensions. In: A. Storkey, F. Perez-Cruz (eds.) Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 84, pp. 1356–1365 (2018)
Wojtaszczyk, P.: Complexity of approximation of functions of few variables in high dimensions. Journal of Complexity 27(2), 141–150 (2011)
Xu, P., Roosta-Khorasani, F., Mahoney, M.W.: Newton-type methods for non-convex optimization under inexact Hessian information. Mathematical Programming 184, 35–70 (2020)

Author information

Krishnakumar Balasubramanian and Saeed Ghadimi have contributed equally to this work.

Authors and Affiliations

Department of Statistics, University of California, Davis, CA, USA
Krishnakumar Balasubramanian
Department of Management Sciences, University of Waterloo, Waterloo, ON, Canada
Saeed Ghadimi

Corresponding author

Correspondence to Saeed Ghadimi.

Additional information

Communicated by Francis Bach.

About this article

Cite this article

Balasubramanian, K., Ghadimi, S. Zeroth-Order Nonconvex Stochastic Optimization: Handling Constraints, High Dimensionality, and Saddle Points. Found Comput Math 22, 35–76 (2022). https://doi.org/10.1007/s10208-021-09499-8

Received: 16 January 2019
Revised: 13 January 2021
Accepted: 11 February 2021
Published: 19 March 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s10208-021-09499-8
