
On the local convergence of a stochastic semismooth Newton method for nonsmooth nonconvex optimization

Published in Science China Mathematics

Abstract

In this work, we present probabilistic local convergence results for a stochastic semismooth Newton method applied to a class of stochastic composite optimization problems whose objective is the sum of a smooth nonconvex term and a nonsmooth convex term. We assume that the gradient and Hessian information of the smooth part of the objective function can only be approximated and accessed via calls to stochastic first- and second-order oracles. The approach combines stochastic semismooth Newton steps, stochastic proximal gradient steps, and a globalization strategy based on growth conditions. We present tail bounds and matrix concentration inequalities for the stochastic oracles that can be used to control the approximation errors by appropriately adjusting or increasing the sampling rates. Under standard local assumptions, we prove that the proposed algorithm locally turns into a pure stochastic semismooth Newton method and converges r-linearly or r-superlinearly with high probability.
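
To make the iteration described in the abstract concrete, the following is a minimal Python sketch of such a hybrid scheme for an l1-regularized finite-sum problem. It is an illustration under simplifying assumptions, not the authors' implementation: the oracle callbacks grad_batch and hess_batch, the soft-thresholding prox prox_l1, the residual-decrease test with factor theta (a simplified stand-in for the growth-condition globalization), and all parameter values are hypothetical.

import numpy as np


def prox_l1(u, t):
    # Proximal operator of t * ||.||_1 (componentwise soft-thresholding).
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)


def stochastic_ssn(grad_batch, hess_batch, n, x0, lam=0.1, step=1.0,
                   batch0=32, growth=2.0, theta=0.9, iters=50, seed=0):
    # Hybrid iteration for min_x (1/n) sum_i f_i(x) + lam * ||x||_1.
    # grad_batch(x, idx) and hess_batch(x, idx) are user-supplied subsampled
    # (stochastic) first- and second-order oracles over the index set idx.
    rng = np.random.default_rng(seed)
    x, batch = x0.astype(float), float(batch0)
    for _ in range(iters):
        idx = rng.choice(n, size=min(int(batch), n), replace=False)
        g = grad_batch(x, idx)                     # stochastic gradient
        u = x - step * g
        F = x - prox_l1(u, step * lam)             # natural residual F(x)
        # One element of the generalized Jacobian of F:
        #   M = I - D (I - step * H), with D the 0/1 derivative of prox_l1.
        H = hess_batch(x, idx)                     # stochastic Hessian
        D = (np.abs(u) > step * lam).astype(float)
        M = np.eye(x.size) - D[:, None] * (np.eye(x.size) - step * H)
        try:
            d = np.linalg.solve(M, -F)             # semismooth Newton direction
        except np.linalg.LinAlgError:
            d = -F                                 # fallback if M is singular
        x_newton = x + d
        g_new = grad_batch(x_newton, idx)
        F_newton = x_newton - prox_l1(x_newton - step * g_new, step * lam)
        if np.linalg.norm(F_newton) <= theta * np.linalg.norm(F):
            x = x_newton                           # accept the Newton step
        else:
            x = prox_l1(u, step * lam)             # stochastic proximal gradient step
        batch *= growth                            # increase the sampling rate
    return x

For a subsampled least-squares loss, for instance, grad_batch(x, idx) could return A[idx].T @ (A[idx] @ x - b[idx]) / len(idx) and hess_batch(x, idx) the matching subsampled matrix A[idx].T @ A[idx] / len(idx); these choices are again only illustrative.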



Acknowledgements

This work was supported by the Fundamental Research Fund—Shenzhen Research Institute for Big Data Startup Fund (Grant No. JCYJ-AM20190601), the Shenzhen Institute of Artificial Intelligence and Robotics for Society, the National Natural Science Foundation of China (Grant Nos. 11831002 and 11871135), the Key-Area Research and Development Program of Guangdong Province (Grant No. 2019B121204008) and the Beijing Academy of Artificial Intelligence. The authors are grateful to the anonymous referees for their helpful comments and suggestions.

Author information

Corresponding author

Correspondence to Zaiwen Wen.


About this article


Cite this article

Milzarek, A., Xiao, X., Wen, Z. et al. On the local convergence of a stochastic semismooth Newton method for nonsmooth nonconvex optimization. Sci. China Math. 65, 2151–2170 (2022). https://doi.org/10.1007/s11425-020-1865-1

