Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization

Abstract

We consider the problem of minimizing composite functions of the form \(f(g(x))+h(x)\), where f and h are convex functions (which can be nonsmooth) and g is a smooth vector mapping. In addition, we assume that g is the average of a finite number of component mappings or the expectation over a family of random component mappings. We propose a class of stochastic variance-reduced prox-linear algorithms for solving such problems and bound their sample complexities for finding an \(\epsilon \)-stationary point, measured in terms of the total number of evaluations of the component mappings and their Jacobians. When g is a finite average of N components, we obtain a sample complexity of \({\mathcal {O}}(N+N^{4/5}\epsilon ^{-1})\) for both mapping and Jacobian evaluations. When g is a general expectation, we obtain sample complexities of \({\mathcal {O}}(\epsilon ^{-5/2})\) and \({\mathcal {O}}(\epsilon ^{-3/2})\) for component mappings and their Jacobians, respectively. If, in addition, f is smooth, then improved sample complexities of \({\mathcal {O}}(N+N^{1/2}\epsilon ^{-1})\) and \({\mathcal {O}}(\epsilon ^{-3/2})\) are derived for g being a finite average and a general expectation, respectively, for both component mapping and Jacobian evaluations.
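For orientation, the displays below sketch the problem class and the generic prox-linear step on which stochastic variance-reduced variants are built. This is only the standard, deterministic formulation given for illustration; the proximal parameter \(\eta >0\) and the iterate \(x_k\) are notation introduced here, not taken from the paper.

\[ \min _{x\in {\mathbb {R}}^d}\; f(g(x))+h(x), \qquad g(x)=\frac{1}{N}\sum _{i=1}^{N} g_i(x) \quad \text {or}\quad g(x)={\mathbb {E}}_{\xi }\bigl [g_{\xi }(x)\bigr ]. \]

\[ x_{k+1}=\mathop {\mathrm {arg\,min}}_{x}\; f\bigl (g(x_k)+g'(x_k)(x-x_k)\bigr )+h(x)+\frac{1}{2\eta }\Vert x-x_k\Vert ^2, \]

where \(g'(x_k)\) denotes the Jacobian of g at \(x_k\). Because f and h are convex and g enters only through its linearization, each subproblem is convex even though the overall objective is not. Stochastic variants replace \(g(x_k)\) and \(g'(x_k)\) by variance-reduced estimates assembled from sampled component mappings and their Jacobians, which is what the sample complexities above count.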

Notes

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html.

  2. http://yann.lecun.com/exdb/mnist/.

  3. http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.

Acknowledgements

The authors thank Dmitriy Drusvyatskiy for contributing the example of the truncated stochastic gradient method in Sect. 1.1. We are also grateful to the two anonymous referees for their helpful comments and suggestions.

Author information

Corresponding author

Correspondence to Lin Xiao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, J., Xiao, L. Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization. Math. Program. (2021). https://doi.org/10.1007/s10107-021-01709-z

Keywords

  • Stochastic composite optimization
  • Nonsmooth optimization
  • Variance reduction
  • Prox-linear algorithm
  • Sample complexity

Mathematics Subject Classification

  • 68Q25
  • 68W20
  • 90C26