Abstract
We propose a new family of inexact sequential quadratic approximation (SQA) methods, which we call the inexact regularized proximal Newton (IRPN) method, for minimizing the sum of two closed proper convex functions, one of which is smooth and the other possibly non-smooth. Our proposed method features strong convergence guarantees even when applied to problems with degenerate solutions, while allowing the inner minimization to be solved inexactly. Specifically, we prove that when the problem possesses the so-called Luo–Tseng error bound (EB) property, IRPN converges globally to an optimal solution, and the local convergence rate of the sequence of iterates generated by IRPN is linear, superlinear, or even quadratic, depending on the choice of the algorithm's parameters. Prior to this work, such an EB property had been used extensively to establish the linear convergence of various first-order methods. However, to the best of our knowledge, this work is the first to use the Luo–Tseng EB property to establish the superlinear convergence of SQA-type methods for non-smooth convex minimization. As a consequence of our result, IRPN is capable of solving regularized regression or classification problems in the high-dimensional setting with provable convergence guarantees. We compare our proposed IRPN with several empirically efficient algorithms by applying them to the \(\ell _1\)-regularized logistic regression problem. Experimental results show that our proposed method is competitive.
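To make the description above more concrete, here is a minimal Python/NumPy sketch of a regularized proximal Newton iteration for \(\ell _1\)-regularized logistic regression, \(\min _x \sum _i \log (1+\exp (-b_i a_i^T x)) + \lambda \Vert x\Vert _1\). It is not the authors' released implementation. It works with the unit-step residual \(R(x) = x - \mathrm{prox}_g(x - \nabla f(x))\), the quantity in which the Luo–Tseng EB property is commonly stated: for every \(\zeta \ge \min F\) there exist \(\kappa , \epsilon > 0\) such that \(\mathrm{dist}(x, \mathcal {X}) \le \kappa \Vert R(x)\Vert\) whenever \(F(x) \le \zeta\) and \(\Vert R(x)\Vert \le \epsilon\), where \(\mathcal {X}\) is the optimal solution set. The concrete rules in the code are assumptions for illustration only. The Hessian is regularized by \(\mu _k = c\Vert R(x_k)\Vert ^{\rho }\). The inner quadratic model is solved inexactly by proximal gradient until its residual drops below \(\eta \min \{\Vert R(x_k)\Vert , \Vert R(x_k)\Vert ^{1+\rho }\}\). An Armijo backtracking line search is then applied. All function names and parameter defaults (c, rho, eta, theta, beta) are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Prox of tau * ||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def logistic_loss(A, b, x):
    """f(x) = sum_i log(1 + exp(-b_i a_i^T x)), computed stably."""
    return float(np.sum(np.logaddexp(0.0, -b * (A @ x))))


def logistic_parts(A, b, x):
    """Value, gradient and Hessian of the logistic loss (labels b_i in {-1, +1})."""
    z = b * (A @ x)
    s = np.exp(-np.logaddexp(0.0, z))             # s_i = 1 / (1 + exp(z_i))
    grad = -A.T @ (b * s)
    hess = A.T @ (A * (s * (1.0 - s))[:, None])  # A^T diag(s(1 - s)) A
    return float(np.sum(np.logaddexp(0.0, -z))), grad, hess


def residual(x, grad, lam):
    """Unit-step proximal-gradient residual R(x) = x - prox_g(x - grad)."""
    return x - soft_threshold(x - grad, lam)


def irpn_sketch(A, b, lam, x0, c=1.0, rho=0.5, eta=0.5, theta=0.25,
                beta=0.5, tol=1e-8, max_outer=100, max_inner=500):
    """Hypothetical regularized proximal Newton loop; NOT the authors' code."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_outer):
        f_x, grad, hess = logistic_parts(A, b, x)
        r = np.linalg.norm(residual(x, grad, lam))
        if r <= tol:
            break
        # Assumed regularization rule: mu_k = c * ||R(x_k)||^rho.
        H = hess + c * r**rho * np.eye(x.size)
        L = np.linalg.eigvalsh(H)[-1]            # Lipschitz constant of the model gradient
        # Assumed inexactness test for the inner solve.
        inner_tol = eta * min(r, r**(1.0 + rho))
        y = x.copy()
        for _ in range(max_inner):               # proximal gradient on the quadratic model
            model_grad = grad + H @ (y - x)
            if np.linalg.norm(residual(y, model_grad, lam)) <= inner_tol:
                break
            y = soft_threshold(y - model_grad / L, lam / L)
        d = y - x
        F_x = f_x + lam * np.abs(x).sum()
        # Predicted decrease; nonpositive because the inner steps never increase the model.
        delta = grad @ d + lam * (np.abs(y).sum() - np.abs(x).sum())
        alpha = 1.0
        for _ in range(60):                      # Armijo backtracking on F = f + g
            x_new = x + alpha * d
            F_new = logistic_loss(A, b, x_new) + lam * np.abs(x_new).sum()
            if F_new <= F_x + theta * alpha * delta:
                break
            alpha *= beta
        x = x_new
    return x
```

For example, with rows \(a_i^T\) stacked in A and labels b in \(\{-1,+1\}\), one would call irpn_sketch(A, b, 1.0, np.zeros(A.shape[1])). Proximal gradient is used as the inner solver here only for brevity; coordinate descent is a common alternative for \(\ell _1\)-regularized quadratic subproblems.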
Notes
Some authors refer to this as a convex composite minimization problem.
A similar EB property has been studied by Pang [30] for linearly constrained variational inequalities.
The code can be downloaded from https://github.com/ZiruiZhou/IRPN.
\(\Vert A\Vert ^2\) is computed via the MATLAB code lambda = eigs(A*A',1,'LM').
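The last note computes \(\Vert A\Vert ^2\), the squared spectral norm of the data matrix, as the largest eigenvalue of \(AA^T\). A common use of this quantity (an assumption here, since the note does not say what it is used for) is as a Lipschitz constant: with labels \(b_i \in \{-1,+1\}\), the gradient of the unscaled logistic loss \(\sum _i \log (1+\exp (-b_i a_i^T x))\) is Lipschitz with constant \(\Vert A\Vert ^2/4\). For readers without MATLAB, the hypothetical helper below estimates the same quantity in Python. It swaps in plain power iteration, a simpler method than the eigensolver behind eigs, and touches A only through products with A and \(A^T\), so it should also accept a scipy.sparse matrix.

```python
import numpy as np


def spectral_norm_sq(A, iters=1000, tol=1e-12, seed=0):
    """Estimate ||A||_2^2 = lambda_max(A A^T) by power iteration on A A^T."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = A @ (A.T @ v)                # one multiplication by A A^T
        lam_new = float(v @ w)           # Rayleigh quotient (v has unit norm)
        norm_w = np.linalg.norm(w)
        if norm_w == 0.0:                # A A^T v = 0, e.g. when A is the zero matrix
            return 0.0
        v = w / norm_w
        if abs(lam_new - lam) <= tol * max(lam_new, 1.0):
            return lam_new
        lam = lam_new
    return lam
```

On a dense array the estimate can be cross-checked against np.linalg.norm(A, 2) ** 2.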
References
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Becker, S., Fadili, J.: A quasi-Newton proximal splitting method. In: Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25: Proceedings of the 2012 Conference, pp. 2618–2626 (2012)
Bhatia, R.: Matrix Analysis, Volume 169 of Graduate Texts in Mathematics. Springer, New York (1997)
Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for L-1 regularized optimization. Math. Program. Ser. B 157(2), 375–396 (2016)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings. Springer Monographs in Mathematics. Springer, New York (2009)
Facchinei, F., Fischer, A., Herrich, M.: A family of Newton methods for nonsmooth constrained systems with nonisolated solutions. Math. Methods Oper. Res. 77(3), 433–443 (2013)
Facchinei, F., Fischer, A., Herrich, M.: An LP-Newton method: nonsmooth equations, KKT systems, and nonisolated solutions. Math. Program. Ser. A 146(1–2), 1–36 (2014)
Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. 1. Springer, New York (2003)
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., Lin, C.-J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9(Aug), 1871–1874 (2008)
Fischer, A.: Local behavior of an iterative framework for generalized equations with nonisolated solutions. Math. Program. Ser. A 94(1), 91–124 (2002)
Fischer, A., Herrich, M., Izmailov, A.F., Solodov, M.V.: A globally convergent LP-Newton method. SIAM J. Optim. 26(4), 2012–2033 (2016)
Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1(2), 302–332 (2007)
Hou, K., Zhou, Z., So, A.M.-C., Luo, Z.-Q.: On the linear convergence of the proximal gradient method for trace norm regularization. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26: Proceedings of the 2013 Conference, pp. 710–718 (2013)
Hsieh, C.-J., Dhillon, I.S., Ravikumar, P.K., Sustik, M.A.: Sparse inverse covariance matrix estimation using quadratic approximation. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: Proceedings of the 2011 Conference, pp. 2330–2338 (2011)
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2016), Part I, Volume 9851 of Lecture Notes in Artificial Intelligence, pp. 795–811. Springer International Publishing AG, Cham, Switzerland (2016)
Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014)
Li, D.-H., Fukushima, M., Qi, L., Yamashita, N.: Regularized Newton methods for convex minimization problems with singular solutions. Comput. Optim. Appl. 28(2), 131–147 (2004)
Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. (2017). https://doi.org/10.1007/s10208-017-9366-8
LIBSVM Data: Classification, Regression, and Multi-label. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
Liu, H., So, A.M.-C., Wu, W.: Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods. Preprint (2017)
Luo, Z.-Q., Tseng, P.: Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 2(1), 43–54 (1992)
Luo, Z.-Q., Tseng, P.: On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30(2), 408–425 (1992)
Luo, Z.-Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46(1), 157–178 (1993)
Moré, J.J.: The Levenberg–Marquardt algorithm: implementation and theory. In: Watson, G.A. (ed.) Numerical Analysis, Volume 630 of Lecture Notes in Mathematics, pp. 105–116. Springer, Berlin (1978)
Nesterov, Yu.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston (2004)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, second edn. Springer, New York (2006)
O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)
Olsen, P.A., Oztoprak, F., Nocedal, J., Rennie, S.: Newton-like methods for sparse inverse covariance estimation. In: Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25: Proceedings of the 2012 Conference, pp. 755–763 (2012)
Pang, J.-S.: A posteriori error bounds for the linearly-constrained variational inequality problem. Math. Oper. Res. 12(3), 474–484 (1987)
Pang, J.-S.: Error bounds in mathematical programming. Math. Program. 79(1–3), 299–332 (1997)
Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends® in Optimization 1(3), 127–239 (2014)
Qi, H., Sun, D.: A quadratically convergent Newton method for computing the nearest correlation matrix. SIAM J. Matrix Anal. Appl. 28(2), 360–385 (2006)
Sardy, S., Antoniadis, A., Tseng, P.: Automatic smoothing with wavelets for a wide class of distributions. J. Comput. Gr. Stat. 13(2), 399–421 (2004)
Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. Ser. A 160(1–2), 495–529 (2016)
Schmidt, M., van den Berg, E., Friedlander, M.P., Murphy, K.: Optimizing costly functions with simple constraints: a limited-memory projected quasi-Newton algorithm. In: Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS 2009), pp. 456–463 (2009)
Tseng, P.: Error bounds and superlinear convergence analysis of some Newton-type methods in optimization. In: Nonlinear Optimization and Related Topics, Volume 36 of Applied Optimization, pp. 445–462. Springer, Dordrecht (2000)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. Ser. B 125(2), 263–295 (2010)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117(1–2), 387–423 (2009)
Wen, B., Chen, X., Pong, T.K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27(1), 124–145 (2017)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57(7), 2479–2493 (2009)
Yamashita, N., Fukushima, M.: On the rate of convergence of the Levenberg–Marquardt method. In: Alefeld, G., Chen, X. (eds.) Topics in Numerical Analysis, Volume 15 of Computing Supplement, pp. 239–249. Springer, Wien (2001)
Yen, I.E.-H., Hsieh, C.-J., Ravikumar, P.K., Dhillon, I.S.: Constant nullspace strong convexity and fast convergence of proximal methods under high-dimensional settings. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27: Proceedings of the 2014 Conference, pp. 1008–1016 (2014)
Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: A comparison of optimization methods and software for large-scale L1-regularized linear classification. J. Mach. Learn. Res. 11(Nov), 3183–3234 (2010)
Yuan, G.-X., Ho, C.-H., Lin, C.-J.: An improved GLMNET for L1-regularized logistic regression. J. Mach. Learn. Res. 13(1), 1999–2030 (2012)
Yun, S., Toh, K.-C.: A coordinate gradient descent method for \(\ell _1\)-regularized convex minimization. Comput. Optim. Appl. 48(2), 273–307 (2011)
Zhang, H., Jiang, J., Luo, Z.-Q.: On the linear convergence of a proximal gradient method for a class of nonsmooth convex minimization problems. J. Oper. Res. Soc. China 1(2), 163–186 (2013)
Zhong, K., Yen, I.E.-H., Dhillon, I.S., Ravikumar, P.: Proximal quasi-Newton for computationally intensive \(\ell _1\)-regularized \(M\)-estimators. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27: Proceedings of the 2014 Conference, pp. 2375–2383 (2014)
Zhou, Z., So, A.M.-C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. Ser. A 165(2), 689–728 (2017)
Zhou, Z., Zhang, Q., So, A.M.-C.: \(\ell _{1,p}\)-norm regularization: error bounds and convergence rate analysis of first-order methods. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pp. 1501–1510 (2015)
Acknowledgements
We thank the anonymous reviewers for their detailed and helpful comments. Most of the work of the first and second authors was done when they were Ph.D. students at the Department of Systems Engineering and Engineering Management of The Chinese University of Hong Kong.
Additional information
This research is supported in part by the Hong Kong Research Grants Council (RGC) General Research Fund (GRF) Projects CUHK 14206814 and CUHK 14208117 and in part by a gift grant from Microsoft Research Asia.
Cite this article
Yue, MC., Zhou, Z. & So, A.MC. A family of inexact SQA methods for non-smooth convex minimization with provable convergence guarantees based on the Luo–Tseng error bound property. Math. Program. 174, 327–358 (2019). https://doi.org/10.1007/s10107-018-1280-6
DOI: https://doi.org/10.1007/s10107-018-1280-6
Keywords
- Convex composite minimization
- Sequential quadratic approximation
- Proximal Newton method
- Error bound
- Superlinear convergence