Mathematical Programming

, Volume 156, Issue 1–2, pp 189–219 | Cite as

A second-order method for strongly convex \(\ell _1\)-regularization problems

  • Kimon FountoulakisEmail author
  • Jacek Gondzio
Full Length Paper Series A


In this paper a robust second-order method is developed for the solution of strongly convex \(\ell _1\)-regularized problems. The main aim is to make the proposed method as inexpensive as possible, while even difficult problems can be efficiently solved. The proposed approach is a primal-dual Newton conjugate gradients (pdNCG) method. Convergence properties of pdNCG are studied and worst-case iteration complexity is established. Numerical results are presented on synthetic sparse least-squares problems and real world machine learning problems.


\(\ell _1\)-Regularization Strongly convex optimization  Second-order methods Iteration complexity Newton conjugate-gradients method 

Mathematics Subject Classification

68W40 65K05 90C06 90C25 90C30 90C51 


  1. 1.
    Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems. Inverse Problems 10, 1217–1229 (1994)CrossRefMathSciNetzbMATHGoogle Scholar
  2. 2.
    Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)CrossRefMathSciNetzbMATHGoogle Scholar
  3. 3.
  4. 4.
    Becker, S.R., Bobin, J., Candès, E.J.: Nesta: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4(1), 1–39 (2011)CrossRefMathSciNetzbMATHGoogle Scholar
  5. 5.
    Becker, S.R., Candés, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165–218, (2011).
  6. 6.
    Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, New York (2004)CrossRefzbMATHGoogle Scholar
  7. 7.
    Chan, R.H., Chan, T.F., Zhou, H.M.: Advanced signal processing algorithms. In: Luk F.T. (ed) Proceedings of the International Society of Photo-Optical Instrumentation Engineers, SPIE, pp. 314–325 (1995)Google Scholar
  8. 8.
    Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20(6), 1964–1977 (1999)CrossRefMathSciNetzbMATHGoogle Scholar
  9. 9.
    Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27 (2011).
  10. 10.
    Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: Coordinate descent method for large-scale \(\ell _2\)-loss linear support vector machines. J. Mach. Learn. Res. 9, 1369–1398 (2008)MathSciNetzbMATHGoogle Scholar
  11. 11.
    Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004). ISBN: 0521540518CrossRefzbMATHGoogle Scholar
  12. 12.
    Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S.S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 408–415 (2008)Google Scholar
  13. 13.
    Hsu, C.-W., Chang, C.-C., Lin, C.-J.: A practical guide to support vector classification. In: Technical report, Department of Computer Science, National Taiwan University (2010)Google Scholar
  14. 14.
    Keerthi, S.S., DeCoste, D.: A modified finite newton method for fast solution of large scale linear svms. J. Mach. Learn. Res. 6, 341–361 (2005)MathSciNetzbMATHGoogle Scholar
  15. 15.
    Kelly, C.T.: Iterative Methods for Linear and Nonlinear Equations. SIAM, Philadelphia (1995)CrossRefGoogle Scholar
  16. 16.
    Lewis, D.D., Yiming, Yang, Rose, T.G., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)Google Scholar
  17. 17.
    McCallum, A.: Real-sim: real vs. simulated data for binary classification problem.
  18. 18.
    Needell, D., Tropp, J.A.: Cosamp: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmonic Anal. 26(3), 301–321 (2009)CrossRefMathSciNetzbMATHGoogle Scholar
  19. 19.
    Renegar, J.: A Mathematical View of Interior-Point Methods in Convex Optimization. In: MOS-SIAM Series on Optimization, Cornell University, Ithaca, New York (2001)Google Scholar
  20. 20.
    Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program 144(1–2), 1–38 (2014)CrossRefMathSciNetzbMATHGoogle Scholar
  21. 21.
    Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. In: Technical report, School of Mathematics, Edinburgh University, 2012.
  22. 22.
    Shalev-Shwartz, S., Tewari, A.: Stochastic methods for \(\ell _1\)-regularized loss minimization. J. Mach. Learn. Res. 12(4), 1865–1892 (2011)MathSciNetzbMATHGoogle Scholar
  23. 23.
    Shewchuk, J.R.: An introduction to the conjugate gradient method without the agonizing pain. In: Technical report, Carnegie Mellon University Pittsburgh, PA, USA, (1994)Google Scholar
  24. 24.
    Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press, Cambridge (2011)Google Scholar
  25. 25.
    Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. 58(1), 267–288 (1996)MathSciNetzbMATHGoogle Scholar
  26. 26.
    Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theor. Appl. 109(3), 475–494 (2001)CrossRefzbMATHGoogle Scholar
  27. 27.
    Tseng, P.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22, 341–362 (2012)CrossRefMathSciNetGoogle Scholar
  28. 28.
    Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117, 387–423 (2009)CrossRefMathSciNetzbMATHGoogle Scholar
  29. 29.
    Webb, S., Caverlee, J., Pu, C.: Introducing the webb spam corpus: using email spam to identify web spam automatically. In: Proceedings of the Third Conference on Email and Anti-Spam (CEAS), (2006)Google Scholar
  30. 30.
    Wright, S.J.: Accelerated block-coordinate relaxation for regularized optimization. SIAM J. Optim. 22(1), 159–186 (2012)CrossRefMathSciNetzbMATHGoogle Scholar
  31. 31.
    Wu, T.T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2(1), 224–244 (2008)CrossRefMathSciNetzbMATHGoogle Scholar
  32. 32.
    Yu, H.-F., Lo, H.-Y., Hsieh, H.-P., Lou, J.-K., McKenzie, T.G., Chou, J.-W., Chung, P.-H., Ho, C.-H., Chang, C.-F., Wei, Y.-H., Weng, J.-Y., Yan, E.-S., Chang, C.-W., Kuo, T.-T., Lo, Y.-C., Chang, P.-T., Po, C., Wang, C.-Y., Huang, Y.-H., Hung, C.-W., Ruan, Y.-X., Lin, Y.-S., Lin, S.-D., Lin, H.-T., Lin, C.-J.: Feature engineering and classifier ensemble for kdd cup 2010. In: JMLR Workshop and Conference Proceedings, (2011)Google Scholar
  33. 33.
    Yuan, G.X., Ho, C.H., Lin, C.J.: Recent advances of large-scale linear classification. Proc. IEEE 100(9), 2584–2603 (2012)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society 2015

Authors and Affiliations

  1. 1.School of Mathematics and Maxwell InstituteThe University of EdinburghEdinburghUK

Personalised recommendations