Abstract
In this paper, a robust second-order method is developed for the solution of strongly convex \(\ell _1\)-regularized problems. The main aim is to make the proposed method as inexpensive as possible while remaining able to solve even difficult problems efficiently. The proposed approach is a primal-dual Newton conjugate gradients (pdNCG) method. Convergence properties of pdNCG are studied and its worst-case iteration complexity is established. Numerical results are presented on synthetic sparse least-squares problems and real-world machine learning problems.
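The core idea behind methods of this kind can be illustrated with a generic sketch (this is not the authors' pdNCG implementation, and all problem sizes and parameter values below are illustrative assumptions): the nonsmooth \(\ell _1\) term is replaced by a smooth pseudo-Huber approximation, after which a Newton conjugate-gradients solver needs only gradients and Hessian-vector products, never an explicit Hessian.

```python
import numpy as np
from scipy.optimize import minimize

# Toy strongly convex l1-regularized least-squares instance (illustrative only).
rng = np.random.default_rng(0)
m, n = 100, 50
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:5] = 1.0                       # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(m)
tau, mu = 0.1, 1e-3                    # regularization weight, smoothing parameter

# Pseudo-Huber smoothing of tau*||x||_1: tau * sum(sqrt(mu^2 + x_i^2) - mu).
def f(x):
    r = A @ x - b
    return 0.5 * r @ r + tau * np.sum(np.sqrt(mu**2 + x**2) - mu)

def grad(x):
    return A.T @ (A @ x - b) + tau * x / np.sqrt(mu**2 + x**2)

def hessp(x, v):
    # Hessian-vector product: A^T A v + tau * diag(mu^2 / (mu^2 + x^2)^{3/2}) v,
    # so the Hessian is applied implicitly inside the CG iterations.
    d = mu**2 / (mu**2 + x**2) ** 1.5
    return A.T @ (A @ v) + tau * d * v

res = minimize(f, np.zeros(n), jac=grad, hessp=hessp, method="Newton-CG")
```

Because only matrix-vector products with `A` and `A.T` appear, each inner CG step stays cheap even for large problems, which is the cost profile the abstract alludes to.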
Additional information
J. Gondzio is supported by EPSRC Grant EP/I017127/1.
Cite this article
Fountoulakis, K., Gondzio, J. A second-order method for strongly convex \(\ell _1\)-regularization problems. Math. Program. 156, 189–219 (2016). https://doi.org/10.1007/s10107-015-0875-4
Keywords
- \(\ell _1\)-Regularization
- Strongly convex optimization
- Second-order methods
- Iteration complexity
- Newton conjugate-gradients method