Abstract
In this work, we present probabilistic local convergence results for a stochastic semismooth Newton method for a class of stochastic composite optimization problems whose objective is the sum of a smooth nonconvex term and a nonsmooth convex term. We assume that gradient and Hessian information of the smooth part of the objective function can only be approximated via calls to stochastic first- and second-order oracles. The approach combines stochastic semismooth Newton steps, stochastic proximal gradient steps, and a globalization strategy based on growth conditions. We present tail bounds and matrix concentration inequalities for the stochastic oracles that can be used to control the approximation errors by appropriately adjusting or increasing the sampling rates. Under standard local assumptions, we prove that the proposed algorithm locally turns into a pure stochastic semismooth Newton method and converges r-linearly or r-superlinearly with high probability.
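To make the hybrid step described above concrete, here is a minimal Python sketch for the special case of an ℓ1 regularizer, φ(x) = λ‖x‖₁ (a hypothetical choice for illustration; the paper treats general nonsmooth convex terms). The oracle names `grad_batch` and `hess_batch` stand in for subsampled gradient and Hessian estimates and are assumptions, not the authors' API. The iteration attempts a semismooth Newton step on the natural residual F(x) = x − prox(x − t∇f(x)) and falls back to a proximal gradient step when the Newton system cannot be solved.

```python
import numpy as np

def prox_l1(z, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def hybrid_step(x, grad_batch, hess_batch, lam=0.1, t=1.0):
    """One hybrid iteration: try a semismooth Newton step on the
    natural residual F(x) = x - prox(x - t*g(x)); fall back to a
    proximal gradient step if the Newton system is singular."""
    g = grad_batch(x)             # subsampled gradient estimate
    H = hess_batch(x)             # subsampled Hessian estimate
    u = x - t * g
    r = x - prox_l1(u, t * lam)   # natural residual F(x)
    n = len(x)
    # An element of the generalized Jacobian of F at x:
    # M = I - D (I - t*H), where D is the diagonal Clarke Jacobian
    # of the soft-thresholding operator evaluated at u.
    d = (np.abs(u) > t * lam).astype(float)
    M = np.eye(n) - np.diag(d) @ (np.eye(n) - t * H)
    try:
        x_new = x - np.linalg.solve(M, r)   # semismooth Newton step
    except np.linalg.LinAlgError:
        x_new = prox_l1(u, t * lam)         # proximal gradient fallback
    return x_new, np.linalg.norm(r)
```

The sketch omits the growth-condition test that governs acceptance of the Newton step and the schedule for increasing the oracle sample sizes, both of which are central to the probabilistic convergence analysis in the paper.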
Acknowledgements
This work was supported by the Fundamental Research Fund—Shenzhen Research Institute for Big Data Startup Fund (Grant No. JCYJ-AM20190601), the Shenzhen Institute of Artificial Intelligence and Robotics for Society, the National Natural Science Foundation of China (Grant Nos. 11831002 and 11871135), the Key-Area Research and Development Program of Guangdong Province (Grant No. 2019B121204008) and the Beijing Academy of Artificial Intelligence. The authors are grateful to the anonymous referees for their helpful comments and suggestions.
Cite this article
Milzarek, A., Xiao, X., Wen, Z. et al. On the local convergence of a stochastic semismooth Newton method for nonsmooth nonconvex optimization. Sci. China Math. 65, 2151–2170 (2022). https://doi.org/10.1007/s11425-020-1865-1
Keywords
- nonsmooth stochastic optimization
- stochastic approximation
- semismooth Newton method
- stochastic second-order information
- local convergence