
A family of inexact SQA methods for non-smooth convex minimization with provable convergence guarantees based on the Luo–Tseng error bound property

  • Full Length Paper
  • Series B
  • Mathematical Programming

Abstract

We propose a new family of inexact sequential quadratic approximation (SQA) methods, which we call the inexact regularized proximal Newton (IRPN) method, for minimizing the sum of two closed proper convex functions, one of which is smooth and the other is possibly non-smooth. Our proposed method features strong convergence guarantees even when applied to problems with degenerate solutions, while allowing the inner minimization to be solved inexactly. Specifically, we prove that when the problem possesses the so-called Luo–Tseng error bound (EB) property, IRPN converges globally to an optimal solution, and the local convergence rate of the sequence of iterates generated by IRPN is linear, superlinear, or even quadratic, depending on the choice of parameters of the algorithm. Prior to this work, this EB property had been used extensively to establish the linear convergence of various first-order methods. However, to the best of our knowledge, this work is the first to use the Luo–Tseng EB property to establish the superlinear convergence of SQA-type methods for non-smooth convex minimization. As a consequence of our result, IRPN is capable of solving regularized regression or classification problems in the high-dimensional setting with provable convergence guarantees. We compare our proposed IRPN with several empirically efficient algorithms by applying them to the \(\ell _1\)-regularized logistic regression problem. Experimental results show the competitiveness of our proposed method.
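To fix ideas, a generic SQA iteration of the kind studied here (a sketch only; the precise model, regularization, and inexactness criteria of IRPN are given in the body of the paper) computes the next iterate by approximately minimizing a quadratic model of the smooth part plus the untouched non-smooth part:

\(x^{k+1} \approx \mathop {\arg \min }_{x}\; f(x^k) + \nabla f(x^k)^\top (x-x^k) + \tfrac{1}{2}(x-x^k)^\top H_k (x-x^k) + g(x),\)

where \(f\) is the smooth summand, \(g\) the possibly non-smooth one, and \(H_k\) approximates the curvature of \(f\) at \(x^k\). The Luo–Tseng EB property bounds the distance from a point to the optimal solution set by the norm of its proximal-gradient residual, and it is this bound that drives the local rate analysis.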


Notes

  1. Some authors refer to this as a convex composite minimization problem.

  2. For instance, the exact Hessian \(H_k=\nabla ^2 f(x^k)\) is used in [17]. If f is not strongly convex, then neither is the quadratic model (2). As such, the inner problem can have multiple minimizers and the next iterate \(x^{k+1}\) is not well defined; a sketch of the standard regularization remedy is given after these notes.

  3. In [16] the authors considered global versions of the Luo–Tseng EB and KL properties and showed that they are equivalent. However, none of the scenarios listed in Fact 3 except (S1) are known to possess the global Luo–Tseng EB property stated in [16].

  4. Note that Assumption 1(a) is not required for Corollary 2 to hold; cf. Proposition 1.

  5. A similar EB property has been studied by Pang [30] for linearly constrained variational inequalities.

  6. The code can be downloaded from https://github.com/ZiruiZhou/IRPN.

  7. \(\Vert A\Vert ^2\) is computed via the MATLAB command lambda = eigs(A*A',1,'LM'), which returns the largest eigenvalue of \(AA^\top \), i.e., the squared spectral norm of \(A\).
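Regarding Note 2, a standard remedy for a possibly non-strongly-convex quadratic model (a sketch of the general idea; the exact regularization and inexactness rules of IRPN are specified in the paper) is to add a proximal term with weight \(\mu _k>0\), so that the subproblem becomes strongly convex and its minimizer unique:

\(x^{k+1} = \mathop {\arg \min }_{x}\; \nabla f(x^k)^\top (x-x^k) + \tfrac{1}{2}(x-x^k)^\top \big (H_k+\mu _k I\big )(x-x^k) + g(x).\)

Letting \(\mu _k\) vanish at a suitable rate, as in Levenberg–Marquardt-type schemes (cf. [25, 42]), is what typically enables superlinear or quadratic local convergence while keeping every subproblem well posed.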

References

  1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)


  2. Becker, S., Fadili, J.: A quasi-Newton proximal splitting method. In Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K. Q. (eds), Advances in Neural Information Processing Systems 25: Proceedings of the 2012 Conference, pp. 2618–2626 (2012)

  3. Bhatia, R.: Matrix Analysis, Volume 169 of Graduate Texts in Mathematics. Springer, New York (1997)


  4. Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for L-1 regularized optimization. Math. Program. Ser. B 157(2), 375–396 (2016)


  5. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)


  6. Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings. Springer Monographs in Mathematics. Springer, New York (2009)

  7. Facchinei, F., Fischer, A., Herrich, M.: A family of Newton methods for nonsmooth constrained systems with nonisolated solutions. Math. Methods Oper. Res. 77(3), 433–443 (2013)


  8. Facchinei, F., Fischer, A., Herrich, M.: An LP-Newton method: nonsmooth equations, KKT systems, and nonisolated solutions. Math. Program. Ser. A 146(1–2), 1–36 (2014)


  9. Facchinei, F., Pang, J.-S.: Finite–Dimensional Variational Inequalities and Complementarity Problems, vol. 1. Springer, New York (2003)


  10. Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., Lin, C.-J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9(Aug), 1871–1874 (2008)


  11. Fischer, A.: Local behavior of an iterative framework for generalized equations with nonisolated solutions. Math. Program. Ser. A 94(1), 91–124 (2002)


  12. Fischer, A., Herrich, M., Izmailov, A.F., Solodov, M.V.: A globally convergent LP-Newton method. SIAM J. Optim. 26(4), 2012–2033 (2016)


  13. Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1(2), 302–332 (2007)


  14. Hou, K., Zhou, Z., So, A.M.-C., Luo, Z.-Q.: On the linear convergence of the proximal gradient method for trace norm regularization. In Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K. Q., (eds), Advances in Neural Information Processing Systems 26: Proceedings of the 2013 Conference, pp. 710–718 (2013)

  15. Hsieh, C.-J., Dhillon, I. S., Ravikumar, P. K., Sustik, M. A.: Sparse inverse covariance matrix estimation using quadratic approximation. In: Shawe-Taylor, J., Zemel, R. S., Bartlett, P., Pereira, F.C.N., Weinberger, K.Q. (eds), Advances in Neural Information Processing Systems 24: Proceedings of the 2011 Conference, pp. 2330–2338 (2011)

  16. Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds) Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016), Part I, Vol. 9851 of Lecture Notes in Artificial Intelligence, pp. 795–811. Springer International Publishing AG, Cham, Switzerland (2016)

  17. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014)


  18. Li, D.-H., Fukushima, M., Qi, L., Yamashita, N.: Regularized Newton methods for convex minimization problems with singular solutions. Comput. Optim. Appl. 28(2), 131–147 (2004)


  19. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. (2017). https://doi.org/10.1007/s10208-017-9366-8

  20. LIBSVM Data: Classification, Regression, and Multi-label. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

  21. Liu, H., So, A. M.-C., Wu, W.: Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods. Preprint (2017)

  22. Luo, Z.-Q., Tseng, P.: Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 2(1), 43–54 (1992)


  23. Luo, Z.-Q., Tseng, P.: On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30(2), 408–425 (1992)


  24. Luo, Z.-Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46(1), 157–178 (1993)


  25. Moré, J.J.: The Levenberg–Marquardt algorithm: implementation and theory. In: Watson, G.A. (ed.) Numerical Analysis, Volume 630 of Lecture Notes in Mathematics, pp. 105–116. Springer, Berlin (1978)

  26. Nesterov, Yu.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston (2004)


  27. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, second edn. Springer, New York (2006)

  28. O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)


  29. Olsen, P.A., Oztoprak, F., Nocedal, J., Rennie, S.: Newton-like methods for sparse inverse covariance estimation. In: Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds), Advances in Neural Information Processing Systems 25: Proceedings of the 2012 Conference, pp. 755–763 (2012)

  30. Pang, J.-S.: A posteriori error bounds for the linearly-constrained variational inequality problem. Math. Oper. Res. 12(3), 474–484 (1987)


  31. Pang, J.-S.: Error bounds in mathematical programming. Math. Program. 79(1–3), 299–332 (1997)


  32. Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1(3), 127–239 (2014)

  33. Qi, H., Sun, D.: A quadratically convergent Newton method for computing the nearest correlation matrix. SIAM J. Matrix Anal. Appl. 28(2), 360–385 (2006)


  34. Sardy, S., Antoniadis, A., Tseng, P.: Automatic smoothing with wavelets for a wide class of distributions. J. Comput. Gr. Stat. 13(2), 399–421 (2004)


  35. Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. Ser. A 160(1–2), 495–529 (2016)


  36. Schmidt, M., van den Berg, E., Friedlander, M.P., Murphy, K.: Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm. In: Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS 2009), pp. 456–463 (2009)

  37. Tseng, P.: Error bounds and superlinear convergence analysis of some Newton-type methods in optimization. In: Nonlinear Optimization and Related Topics, vol. 36 of Applied Optimization, pp. 445–462. Springer, Dordrecht (2000)

  38. Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. Ser. B 125(2), 263–295 (2010)


  39. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117(1–2), 387–423 (2009)


  40. Wen, B., Chen, X., Pong, T.K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27(1), 124–145 (2017)


  41. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)


  42. Yamashita, N., Fukushima, M.: On the rate of convergence of the Levenberg–Marquardt method. In: Alefeld, G., Chen, X. (eds.) Topics in Numerical Analysis, Volume 15 of Computing Supplement, pp. 239–249. Springer, Wien (2001)


  43. Yen, I. E.-H., Hsieh, C.-J., Ravikumar, P. K., Dhillon, I. S.: Constant nullspace strong convexity and fast convergence of proximal methods under high-dimensional settings. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds), Advances in Neural Information Processing Systems 27: Proceedings of the 2014 Conference, pp. 1008–1016 (2014)

  44. Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: A comparison of optimization methods and software for large-scale L1-regularized linear classification. J. Mach. Learn. Res. 11(Nov), 3183–3234 (2010)


  45. Yuan, G.-X., Ho, C.-H., Lin, C.-J.: An improved GLMNET for L1-regularized logistic regression. J. Mach. Learn. Res. 13(1), 1999–2030 (2012)


  46. Yun, S., Toh, K.-C.: A coordinate gradient descent method for \(\ell _1\)-regularized convex minimization. Comput. Optim. Appl. 48(2), 273–307 (2011)


  47. Zhang, H., Jiang, J., Luo, Z.-Q.: On the linear convergence of a proximal gradient method for a class of nonsmooth convex minimization problems. J. Oper. Res. Soc. China 1(2), 163–186 (2013)


  48. Zhong, K., Yen, I.E.-H., Dhillon, I.S., Ravikumar, P.: Proximal quasi-Newton for computationally intensive \(\ell _1\)-regularized \(M\)-estimators. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds) Advances in Neural Information Processing Systems 27: Proceedings of the 2014 Conference, pp. 2375–2383 (2014)

  49. Zhou, Z., So, A.M.-C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. Ser. A 165(2), 689–728 (2017)


  50. Zhou, Z., Zhang, Q., So, A.M.-C.: \(\ell _{1,p}\)-norm regularization: error bounds and convergence rate analysis of first-order methods. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pp. 1501–1510 (2015)


Acknowledgements

We thank the anonymous reviewers for their detailed and helpful comments. Most of the work of the first and second authors was done when they were Ph.D. students at the Department of Systems Engineering and Engineering Management of The Chinese University of Hong Kong.

Author information


Correspondence to Zirui Zhou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research is supported in part by the Hong Kong Research Grants Council (RGC) General Research Fund (GRF) Projects CUHK 14206814 and CUHK 14208117 and in part by a gift grant from Microsoft Research Asia.

About this article

Cite this article

Yue, M.-C., Zhou, Z., So, A.M.-C.: A family of inexact SQA methods for non-smooth convex minimization with provable convergence guarantees based on the Luo–Tseng error bound property. Math. Program. 174, 327–358 (2019). https://doi.org/10.1007/s10107-018-1280-6

