Mathematical Programming, Volume 174, Issue 1–2, pp. 327–358

A family of inexact SQA methods for non-smooth convex minimization with provable convergence guarantees based on the Luo–Tseng error bound property

  • Man-Chung Yue
  • Zirui Zhou
  • Anthony Man-Cho So
Full Length Paper, Series B

Abstract

We propose a new family of inexact sequential quadratic approximation (SQA) methods, which we call the inexact regularized proximal Newton (IRPN) method, for minimizing the sum of two closed proper convex functions, one of which is smooth and the other possibly non-smooth. The proposed method enjoys strong convergence guarantees even when applied to problems with degenerate solutions, while allowing the inner minimization to be solved inexactly. Specifically, we prove that when the problem possesses the so-called Luo–Tseng error bound (EB) property, IRPN converges globally to an optimal solution, and the local convergence rate of the sequence of iterates generated by IRPN is linear, superlinear, or even quadratic, depending on the choice of parameters of the algorithm. Prior to this work, this EB property had been used extensively to establish the linear convergence of various first-order methods. However, to the best of our knowledge, this work is the first to use the Luo–Tseng EB property to establish the superlinear convergence of SQA-type methods for non-smooth convex minimization. As a consequence of our result, IRPN is capable of solving regularized regression or classification problems in the high-dimensional setting with provable convergence guarantees. We compare IRPN with several empirically efficient algorithms by applying them to the \(\ell _1\)-regularized logistic regression problem. Experimental results show the competitiveness of our proposed method.
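
The abstract only outlines the algorithmic ingredients, so the following is a minimal, hypothetical sketch (in Python/NumPy) of an inexact regularized proximal Newton iteration of the kind described, applied to the \(\ell _1\)-regularized logistic regression problem. All names and parameter choices here, such as tying the Hessian regularization to a power of the proximal-gradient residual and solving each subproblem by a fixed number of proximal gradient steps, are illustrative assumptions and not the authors' implementation.

    import numpy as np

    def soft_threshold(v, t):
        # proximal operator of t * ||.||_1
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def logistic_loss_grad(x, A, b):
        # smooth part f(x) = (1/m) * sum_i log(1 + exp(-b_i * a_i^T x)) and its gradient
        m = A.shape[0]
        z = -b * (A @ x)
        loss = np.mean(np.log1p(np.exp(z)))
        grad = A.T @ (-b / (1.0 + np.exp(-z))) / m
        return loss, grad

    def inner_prox_grad(x, grad, H, lam, mu, n_inner=50):
        # inexactly minimize the model  grad^T d + 0.5 d^T (H + mu I) d + lam * ||x + d||_1
        # by a fixed number of proximal gradient steps on d (the inexact inner solve)
        d = np.zeros_like(x)
        L = np.linalg.norm(H, 2) + mu + 1e-12   # Lipschitz constant of the model's smooth part
        for _ in range(n_inner):
            g_model = grad + H @ d + mu * d
            d = soft_threshold(x + d - g_model / L, lam / L) - x
        return d

    def irpn(A, b, lam, n_iter=30, c_mu=1.0, theta=0.5, beta=0.5, sigma=1e-4):
        # outer loop: regularize the Hessian by a power of the prox-gradient residual,
        # solve the subproblem inexactly, then backtrack on F = f + lam * ||.||_1
        m, n = A.shape
        x = np.zeros(n)
        for _ in range(n_iter):
            f_val, grad = logistic_loss_grad(x, A, b)
            r = x - soft_threshold(x - grad, lam)      # prox-gradient residual
            if np.linalg.norm(r) <= 1e-10:
                break
            mu = c_mu * np.linalg.norm(r) ** theta     # Hessian regularization weight
            p = 1.0 / (1.0 + np.exp(b * (A @ x)))
            H = (A.T * (p * (1.0 - p))) @ A / m        # Hessian of the logistic loss
            d = inner_prox_grad(x, grad, H, lam, mu)
            F_old = f_val + lam * np.linalg.norm(x, 1)
            decrease = grad @ d + lam * (np.linalg.norm(x + d, 1) - np.linalg.norm(x, 1))
            t = 1.0
            while True:                                # backtracking line search
                f_new, _ = logistic_loss_grad(x + t * d, A, b)
                if f_new + lam * np.linalg.norm(x + t * d, 1) <= F_old + sigma * t * decrease or t < 1e-10:
                    break
                t *= beta
            x = x + t * d
        return x

With a data matrix A of size m-by-n and labels b in {-1, +1}, a call such as irpn(A, b, lam=0.01) returns an approximate minimizer; in the paper's framework, the accuracy of the inner solve and the regularization exponent are the parameters that govern whether the local rate is linear, superlinear, or quadratic.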

Keywords

Convex composite minimization · Sequential quadratic approximation · Proximal Newton method · Error bound · Superlinear convergence

Mathematics Subject Classification

49M15 · 65K10 · 90C55

Acknowledgements

We thank the anonymous reviewers for their detailed and helpful comments. Most of the work of the first and second authors was done when they were Ph.D. students at the Department of Systems Engineering and Engineering Management of The Chinese University of Hong Kong.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2018

Authors and Affiliations

  1. Imperial College Business School, Imperial College London, London, UK
  2. Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
  3. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong