Skip to main content

An inexact successive quadratic approximation method for L-1 regularized optimization

Abstract

We study a Newton-like method for the minimization of an objective function \(\phi \) that is the sum of a smooth function and an \(\ell _1\) regularization term. This method, which is sometimes referred to in the literature as a proximal Newton method, computes a step by minimizing a piecewise quadratic model \(q_k\) of the objective function \(\phi \). In order to make this approach efficient in practice, it is imperative to perform this inner minimization inexactly. In this paper, we give inexactness conditions that guarantee global convergence and that can be used to control the local rate of convergence of the iteration. Our inexactness conditions are based on a semi-smooth function that represents a (continuous) measure of the optimality conditions of the problem, and that embodies the soft-thresholding iteration. We give careful consideration to the algorithm employed for the inner minimization, and report numerical results on two test sets originating in machine learning.

This is a preview of subscription content, access via your institution.

References

  1. Andrew, G., Gao, J.: Scalable training of \({L}_1\)-regularized log-linear models. In: Proceedings of the 24th International Conference on Machine Learning. ACM, pp. 33–40. (2007)

  2. Banerjee, O., El Ghaoui, L., d’Aspremont, A.: Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9, 485–516 (2008)

    MathSciNet  MATH  Google Scholar 

  3. Banerjee, O., El Ghaoui, L., d’Aspremont, A., Natsoulis, G.: Convex optimization techniques for fitting sparse Gaussian graphical models. In: Proceedings of the 23rd International Conference on Machine learning. ACM, pp. 89–96 (2006)

  4. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

    MathSciNet  Article  MATH  Google Scholar 

  5. Becker, S.R., Candés, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165–218 (2011)

    MathSciNet  Article  MATH  Google Scholar 

  6. Byrd, R.H., Chin, G.M., Nocedal, J., Oztoprak, F.: A family of second-order methods for convex L1 regularized optimization. Technical report, Optimization Center Report 2012/2, Northwestern University (2012)

  7. Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization methods for machine learning. Math. Program. 134(1), 127–155 (2012)

    MathSciNet  Article  MATH  Google Scholar 

  8. Byrd, R.H., Nocedal, J., Schnabel, R.: Representations of quasi-Newton matrices and their use in limited memory methods. Math. Program. 63(4), 129–156 (1994)

    MathSciNet  Article  MATH  Google Scholar 

  9. Dembo, R.S., Eisenstat, S.C., Steihaug, T.: Inexact-Newton methods. SIAM J. Numer. Anal. 19(2), 400–408 (1982)

    MathSciNet  Article  MATH  Google Scholar 

  10. Dontchev, A.L., Rockafellar, R.T.: Convergence of inexact Newton methods for generalized equations. Math. Program. 139, 115–137 (2013)

    MathSciNet  Article  MATH  Google Scholar 

  11. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. 2. Springer, Berlin (2003)

    MATH  Google Scholar 

  12. Hsieh, C.J., Sustik, M.A., Ravikumar, P., Dhillon, I.S.: Sparse inverse covariance matrix estimation using quadratic approximation. Adv. Neural Inf. Process. Syst. 24, 2330–2338 (2011)

    Google Scholar 

  13. Lee, J., Sun, Y., Saunders, M.: Proximal Newton-type methods for convex optimization. In: Advances in Neural Information Processing Systems, pp. 836–844 (2012)

  14. Li, L., Toh, K.C.: An inexact interior point method for L1-regularized sparse covariance selection. Math. Program. Comput. 2(3), 291–315 (2010)

    MathSciNet  Article  MATH  Google Scholar 

  15. Le Roux, N., Schmidt, M.W., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: NIPS, pp. 1458–1466 (2011)

  16. Milzarek, A., Ulbrich, M.: A semismooth Newton method with multi-dimensional filter globalization for L1-optimization. SIAM J. Optim. 24(1), 298–333 (2014)

    MathSciNet  Article  MATH  Google Scholar 

  17. Nocedal, Jorge, Wright, Stephen: Numerical Optimization, 2nd edn. Springer, New York (1999)

    Book  MATH  Google Scholar 

  18. Olsen, P., Oztoprak, F., Nocedal, J., Rennie, S.: Newton-like methods for sparse inverse covariance estimation. In: Bartlett, P., Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 764–772 (2012)

  19. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, London (1970)

    MATH  Google Scholar 

  20. Patriksson, M.: Cost approximation: a unified framework of descent algorithms for nonlinear programs. SIAM J. Optim. 8(2), 561–582 (1998)

    MathSciNet  Article  MATH  Google Scholar 

  21. Patriksson, M.: Nonlinear Programming and Variational Inequality Problems, a Unified Approach. Kluwer, Dordrecht (1998)

    MATH  Google Scholar 

  22. Picka, J.D.: Gaussian Markov random fields: theory and applications. Technometrics 48(1), 146–147 (2006)

    MathSciNet  Article  Google Scholar 

  23. Salzo, S., Villa, S.: Inexact and accelerated proximal point algorithms. J. Convex Anal. 19(4), 1167–1192 (2012)

    MathSciNet  MATH  Google Scholar 

  24. Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. Mit Press, Cambridge (2011)

    Google Scholar 

  25. Tan, X., Scheinberg, K.: Complexity of Inexact Proximal Newton Method. Technical report, Dept. of ISE, Lehigh University (2013)

  26. Tappenden, R., Richtárik, P., Gondzio, J.: Inexact coordinate descent: complexity and preconditioning. arXiv preprint arXiv:1304.5530 (2013)

  27. Yuan, G.-X., Chang, K., Hsie, C., Lin, C.-J.: A comparison of optimization methods and software for large-scale l1-regularized linear classification. J. Mach. Learn. Res. 11(1), 3183–3234 (2010)

    MathSciNet  MATH  Google Scholar 

  28. Yuan, G.-X., Ho, C.-H., Lin, C.-J.: An improved glmnet for l1-regularized logistic regression. J. Mach. Learn. Res. 13(1), 1999–2030 (2012)

    MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jorge Nocedal.

Additional information

To Jong-Shi Pang for his important contributions to optimization and his constant support.

Richard H. Byrd was supported by National Science Foundation Grant DMS-1216554 and Department of Energy Grant DE-SC0001774.

Jorge Nocedal was supported by National Science Foundation Grant DMS-0810213, and by ONR Grant N00014-14-1-0313 P00002.

Figen Oztoprak was supported by US Department of Energy Grant DE-FG02-87ER25047 and by Scientific and Technological Research Council of Turkey Grant Number 113M500. Part of this work was completed while the author was at Istanbul Technical University.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Byrd, R.H., Nocedal, J. & Oztoprak, F. An inexact successive quadratic approximation method for L-1 regularized optimization. Math. Program. 157, 375–396 (2016). https://doi.org/10.1007/s10107-015-0941-y

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10107-015-0941-y

Keywords

  • Sparse optimization
  • Inexact proximal Newton
  • Orthant-based quasi-Newton

Mathematics Subject Classification

  • 49M15
  • 90C55