Skip to main content

A second-order method for strongly convex \(\ell _1\)-regularization problems

Abstract

In this paper a robust second-order method is developed for the solution of strongly convex \(\ell _1\)-regularized problems. The main aim is to make the proposed method as inexpensive as possible, while even difficult problems can be efficiently solved. The proposed approach is a primal-dual Newton conjugate gradients (pdNCG) method. Convergence properties of pdNCG are studied and worst-case iteration complexity is established. Numerical results are presented on synthetic sparse least-squares problems and real world machine learning problems.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

References

  1. 1.

    Acar, R., Vogel, C.R.: Analysis of bounded variation penalty methods for ill-posed problems. Inverse Problems 10, 1217–1229 (1994)

    Article  MathSciNet  MATH  Google Scholar 

  2. 2.

    Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  3. 3.

    Becker, S.: CoSaMP and OMP for Sparse Recovery. http://www.mathworks.co.uk/matlabcentral/fileexchange/32402-cosamp-and-omp-for-sparse-recovery (2012)

  4. 4.

    Becker, S.R., Bobin, J., Candès, E.J.: Nesta: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4(1), 1–39 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  5. 5.

    Becker, S.R., Candés, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165–218, (2011). http://tfocs.stanford.edu

  6. 6.

    Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, New York (2004)

    Book  MATH  Google Scholar 

  7. 7.

    Chan, R.H., Chan, T.F., Zhou, H.M.: Advanced signal processing algorithms. In: Luk F.T. (ed) Proceedings of the International Society of Photo-Optical Instrumentation Engineers, SPIE, pp. 314–325 (1995)

  8. 8.

    Chan, T.F., Golub, G.H., Mulet, P.: A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput. 20(6), 1964–1977 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  9. 9.

    Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27 (2011). http://www.csie.ntu.edu.tw/~cjlin/libsvm

  10. 10.

    Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: Coordinate descent method for large-scale \(\ell _2\)-loss linear support vector machines. J. Mach. Learn. Res. 9, 1369–1398 (2008)

    MathSciNet  MATH  Google Scholar 

  11. 11.

    Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004). ISBN: 0521540518

    Book  MATH  Google Scholar 

  12. 12.

    Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S.S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 408–415 (2008)

  13. 13.

    Hsu, C.-W., Chang, C.-C., Lin, C.-J.: A practical guide to support vector classification. In: Technical report, Department of Computer Science, National Taiwan University (2010)

  14. 14.

    Keerthi, S.S., DeCoste, D.: A modified finite newton method for fast solution of large scale linear svms. J. Mach. Learn. Res. 6, 341–361 (2005)

    MathSciNet  MATH  Google Scholar 

  15. 15.

    Kelly, C.T.: Iterative Methods for Linear and Nonlinear Equations. SIAM, Philadelphia (1995)

    Book  Google Scholar 

  16. 16.

    Lewis, D.D., Yiming, Yang, Rose, T.G., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)

    Google Scholar 

  17. 17.

    McCallum, A.: Real-sim: real vs. simulated data for binary classification problem. http://www.cs.umass.edu/~mccallum/code-data.html

  18. 18.

    Needell, D., Tropp, J.A.: Cosamp: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmonic Anal. 26(3), 301–321 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  19. 19.

    Renegar, J.: A Mathematical View of Interior-Point Methods in Convex Optimization. In: MOS-SIAM Series on Optimization, Cornell University, Ithaca, New York (2001)

  20. 20.

    Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program 144(1–2), 1–38 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  21. 21.

    Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. In: Technical report, School of Mathematics, Edinburgh University, 2012. https://code.google.com/p/ac-dc/

  22. 22.

    Shalev-Shwartz, S., Tewari, A.: Stochastic methods for \(\ell _1\)-regularized loss minimization. J. Mach. Learn. Res. 12(4), 1865–1892 (2011)

    MathSciNet  MATH  Google Scholar 

  23. 23.

    Shewchuk, J.R.: An introduction to the conjugate gradient method without the agonizing pain. In: Technical report, Carnegie Mellon University Pittsburgh, PA, USA, (1994)

  24. 24.

    Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press, Cambridge (2011)

    Google Scholar 

  25. 25.

    Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. 58(1), 267–288 (1996)

    MathSciNet  MATH  Google Scholar 

  26. 26.

    Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theor. Appl. 109(3), 475–494 (2001)

    Article  MATH  Google Scholar 

  27. 27.

    Tseng, P.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22, 341–362 (2012)

    Article  MathSciNet  Google Scholar 

  28. 28.

    Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. Ser. B 117, 387–423 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  29. 29.

    Webb, S., Caverlee, J., Pu, C.: Introducing the webb spam corpus: using email spam to identify web spam automatically. In: Proceedings of the Third Conference on Email and Anti-Spam (CEAS), (2006)

  30. 30.

    Wright, S.J.: Accelerated block-coordinate relaxation for regularized optimization. SIAM J. Optim. 22(1), 159–186 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  31. 31.

    Wu, T.T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2(1), 224–244 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  32. 32.

    Yu, H.-F., Lo, H.-Y., Hsieh, H.-P., Lou, J.-K., McKenzie, T.G., Chou, J.-W., Chung, P.-H., Ho, C.-H., Chang, C.-F., Wei, Y.-H., Weng, J.-Y., Yan, E.-S., Chang, C.-W., Kuo, T.-T., Lo, Y.-C., Chang, P.-T., Po, C., Wang, C.-Y., Huang, Y.-H., Hung, C.-W., Ruan, Y.-X., Lin, Y.-S., Lin, S.-D., Lin, H.-T., Lin, C.-J.: Feature engineering and classifier ensemble for kdd cup 2010. In: JMLR Workshop and Conference Proceedings, (2011)

  33. 33.

    Yuan, G.X., Ho, C.H., Lin, C.J.: Recent advances of large-scale linear classification. Proc. IEEE 100(9), 2584–2603 (2012)

    Article  Google Scholar 

Download references

Author information

Affiliations

Authors

Corresponding author

Correspondence to Kimon Fountoulakis.

Additional information

J. Gondzio is supported by EPSRC Grant EP/I017127/1.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Fountoulakis, K., Gondzio, J. A second-order method for strongly convex \(\ell _1\)-regularization problems. Math. Program. 156, 189–219 (2016). https://doi.org/10.1007/s10107-015-0875-4

Download citation

Keywords

  • \(\ell _1\)-Regularization
  • Strongly convex optimization
  • Second-order methods
  • Iteration complexity
  • Newton conjugate-gradients method

Mathematics Subject Classification

  • 68W40
  • 65K05
  • 90C06
  • 90C25
  • 90C30
  • 90C51