Practical inexact proximal quasi-Newton method with global complexity analysis

Abstract

Several methods have recently been proposed for sparse optimization that make careful use of second-order information (Hsieh et al. in Sparse inverse covariance matrix estimation using quadratic approximation. In: NIPS, 2011; Yuan et al. in An improved GLMNET for l1-regularized logistic regression and support vector machines. National Taiwan University, Taipei City, 2011; Olsen et al. in Newton-like methods for sparse inverse covariance estimation. In: NIPS, 2012; Byrd et al. in A family of second-order methods for convex l1-regularized optimization. Technical report, 2012) to improve local convergence rates. These methods construct a composite quadratic approximation using Hessian information, optimize this approximation with a first-order method such as coordinate descent, and employ a line search to ensure sufficient descent. Here we propose a general framework that includes slightly modified versions of the existing algorithms as well as a new algorithm based on limited-memory BFGS Hessian approximations, and we provide a novel global convergence rate analysis that covers methods solving the subproblems via coordinate descent.
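For intuition, the following NumPy sketch illustrates the class of methods described above on the l1-regularized least-squares (lasso) problem: a quadratic model built from Hessian information is minimized inexactly by coordinate descent, where each coordinate subproblem is solved in closed form by soft-thresholding, and a backtracking line search enforces sufficient descent. This is an illustration only, not the authors' implementation: the exact Hessian \(A^TA\) stands in for the limited-memory BFGS approximation used in the paper, and all function names and parameter defaults are invented for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding: the proximal operator of t*|.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def prox_newton_lasso(A, b, lam, n_iters=30, cd_passes=10, sigma=1e-4):
    """Sketch of an inexact proximal (quasi-)Newton method for
        min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    At each iteration a quadratic model with Hessian H is minimized
    *inexactly* by a few coordinate-descent passes, then a backtracking
    line search ensures sufficient descent.  H = A^T A (the exact
    Hessian) stands in for an L-BFGS approximation; names and defaults
    here are illustrative choices, not the paper's."""
    n = A.shape[1]
    H = A.T @ A
    obj = lambda z: 0.5 * np.sum((A @ z - b) ** 2) + lam * np.sum(np.abs(z))
    x = np.zeros(n)
    for _ in range(n_iters):
        g = A.T @ (A @ x - b)              # gradient of the smooth part at x
        d = np.zeros(n)                    # trial step from the model
        # Coordinate descent on Q(d) = <g,d> + 0.5<d,Hd> + lam*||x+d||_1:
        # each one-dimensional subproblem has a closed-form
        # soft-thresholding solution.
        for _ in range(cd_passes):
            for i in range(n):
                c = g[i] + H[i] @ d - H[i, i] * d[i]   # partial grad, d_i frozen
                d[i] = soft_threshold(x[i] - c / H[i, i], lam / H[i, i]) - x[i]
        # Backtracking line search for sufficient decrease.
        delta = g @ d + lam * (np.sum(np.abs(x + d)) - np.sum(np.abs(x)))
        alpha, fx = 1.0, obj(x)
        while obj(x + alpha * d) > fx + sigma * alpha * delta and alpha > 1e-10:
            alpha *= 0.5
        x = x + alpha * d
    return x
```

The design point the sketch captures is that the subproblem solver never needs the full prox of the composite model: coordinate descent only needs the diagonal entries of H and closed-form scalar updates, which is what makes the framework practical for large sparse problems.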

References

  1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  2. Becker, S., Fadili, J.: A quasi-Newton proximal splitting method. In: Pereira, F., Burges, C., Bottou, L., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 2618–2626. Curran Associates, Inc., Red Hook (2012)

  3. Byrd, R., Chin, G., Nocedal, J., Oztoprak, F.: A family of second-order methods for convex l1-regularized optimization. Technical report (2012)

  4. Byrd, R., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for convex l1-regularized optimization. Technical report (2013)

  5. Byrd, R.H., Nocedal, J., Schnabel, R.B.: Representations of quasi-Newton matrices and their use in limited memory methods. Math. Program. 63, 129–156 (1994)

  6. Cartis, C., Gould, N.I.M., Toint, P.L.: Evaluation complexity of adaptive cubic regularization methods for convex unconstrained optimization. Optim. Methods Softw. 27, 197–219 (2012)

  7. Donoho, D.: De-noising by soft-thresholding. IEEE Trans. Inf. Theory 41, 613–627 (1995)

  8. Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441 (2008)

  9. Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)

  10. Hsieh, C.-J., Sustik, M., Dhillon, I., Ravikumar, P.: Sparse inverse covariance matrix estimation using quadratic approximation. In: NIPS (2011)

  11. Jiang, K.F., Sun, D.F., Toh, K.C.: An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP. SIAM J. Optim. 22(3), 1042–1064 (2012)

  12. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for convex optimization. In: NIPS (2012)

  13. Lewis, A.S., Wright, S.J.: Identifying activity. SIAM J. Optim. 21, 597–614 (2011)

  14. Li, L., Toh, K.-C.: An inexact interior point method for L1-regularized sparse covariance selection. Math. Program. Comput. 2, 291–315 (2010)

  15. Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml

  16. Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Paper (2007)

  17. Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston (2004)

  18. Nesterov, Y.E., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108, 177–205 (2006)

  19. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research, 2nd edn. Springer, New York (2006)

  20. Olsen, P.A., Oztoprak, F., Nocedal, J., Rennie, S.J.: Newton-like methods for sparse inverse covariance estimation. In: NIPS (2012)

  21. Qin, Z., Scheinberg, K., Goldfarb, D.: Efficient block-coordinate descent algorithms for the group lasso. Math. Program. Comput. 5, 143–169 (2013)

  22. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)

  23. Scheinberg, K., Ma, S., Goldfarb, D.: Sparse inverse covariance selection via alternating linearization methods. In: NIPS (2010)

  24. Scheinberg, K., Rish, I.: SINCO: a greedy coordinate ascent method for the sparse inverse covariance selection problem. Technical report (2009)

  25. Schmidt, M., Kim, D., Sra, S.: Projected Newton-type methods in machine learning. In: Optimization for Machine Learning, p. 305. MIT Press, Cambridge (2012)

  26. Schmidt, M., Le Roux, N., Bach, F.: Supplementary material for the paper "Convergence rates of inexact proximal-gradient methods for convex optimization". In: Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS) (2011)

  27. Shalev-Shwartz, S., Tewari, A.: Stochastic methods for l1 regularized loss minimization. In: ICML, pp. 929–936 (2009)

  28. Tang, X.: Optimization in machine learning. Ph.D. thesis, Lehigh University (2015)

  29. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58, 267–288 (1996)

  30. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)

  31. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)

  32. Wytock, M., Kolter, Z.: Sparse Gaussian conditional random fields: algorithms, theory, and application to energy forecasting. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning (ICML-13), JMLR Workshop and Conference Proceedings, vol. 28, pp. 1265–1273 (2013)

  33. Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: A comparison of optimization methods and software for large-scale l1-regularized linear classification. JMLR 11, 3183–3234 (2010)

  34. Yuan, G.-X., Ho, C.-H., Lin, C.-J.: An improved GLMNET for l1-regularized logistic regression and support vector machines. Technical report, National Taiwan University, Taipei City (2011)

Author information

Corresponding author

Correspondence to Katya Scheinberg.

Additional information

The work of Katya Scheinberg is partially supported by NSF Grants DMS 10-16571, DMS 13-19356, AFOSR Grant FA9550-11-1-0239, and DARPA Grant FA 9550-12-1-0406 negotiated by AFOSR. The work of Xiaocheng Tang is partially supported by DARPA Grant FA 9550-12-1-0406 negotiated by AFOSR.

Appendix

Proof of Lemma 2.

Proof

Let \(p_{\phi }(v)\) denote \(p_{H,\phi }(v)\) for brevity. From (3.6) and (2.1), we have

$$\begin{aligned} F(u) - F(p_{\phi }(v)) &\ge F(u) - Q(H, p_{\phi }(v),v) -\epsilon \\ &= F(u) - \Big (f(v)+ g(p_{\phi }(v)) + \langle \nabla f(v),p_{\phi }(v)-v\rangle + \frac{1}{2}\Vert p_{\phi }(v)-v\Vert _H^2\Big )-\epsilon . \end{aligned}$$
(9.1)

Also

$$\begin{aligned} g(u) \ge g(p_{\phi }(v)) + \langle u-p_{\phi }(v), \gamma _g(p_{\phi }(v)) \rangle - \phi \end{aligned}$$
(9.2)

by the definition of \(\phi \)-subgradient, and

$$\begin{aligned} f(u) \ge f(v) + \langle u- v, \nabla f(v) \rangle , \end{aligned}$$
(9.3)

due to the convexity of f. Here \(\gamma _g(\cdot )\) denotes a subgradient of \(g(\cdot )\), and \(\gamma _g(p_{\phi }(v))\) is a \(\phi \)-subgradient satisfying the first-order optimality conditions for the \(\phi \)-approximate minimizer from Lemma 1 with \(z=v-H^{-1}\nabla f(v)\), i.e.,

$$\begin{aligned} \gamma _g(p_{\phi }(v)) = H(v-p_{\phi }(v)) - \nabla f(v) - \eta , \text{ with } \frac{1}{2} \Vert \eta \Vert ^2_{H^{-1}} \le \phi . \end{aligned}$$
(9.4)

Summing (9.2) and (9.3) yields

$$\begin{aligned} F(u) \ge g(p_{\phi }(v)) + \langle u-p_{\phi }(v), \gamma _g(p_{\phi }(v)) \rangle - \phi + f(v) + \langle u- v, \nabla f(v)\rangle .\qquad \end{aligned}$$
(9.5)

Therefore, from (9.1), (9.4) and (9.5) it follows that

$$\begin{aligned} F(u) - F(p_{\phi }(v))&\ge \langle \nabla f(v)+\gamma _g(p_{\phi }(v)), u-p_{\phi }(v) \rangle - \frac{1}{2}\Vert p_{\phi }(v)-v\Vert _H^2 - \epsilon - \phi \\&= \langle -H(p_{\phi }(v)-v) - \eta , u-p_{\phi }(v)\rangle - \frac{1}{2}\Vert p_{\phi }(v)-v\Vert _H^2-\epsilon - \phi \\&= \frac{1}{2}\Vert p_{\phi }( v)-u\Vert _H^2 - \frac{1}{2} \Vert v-u\Vert _H^2-\epsilon -\phi - \langle \eta , u - p_{\phi }(v) \rangle . \end{aligned}$$
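The final equality in this chain is the completion-of-squares (three-point) identity in the \(H\)-norm: writing \(p = p_{\phi }(v)\) for brevity, expanding each squared norm gives

$$\begin{aligned} \langle H(v-p),\, u-p \rangle = \frac{1}{2}\Vert p-u\Vert _H^2 + \frac{1}{2}\Vert p-v\Vert _H^2 - \frac{1}{2}\Vert v-u\Vert _H^2, \end{aligned}$$

and subtracting the model term \(\frac{1}{2}\Vert p-v\Vert _H^2\) leaves exactly the two squared norms in the last line.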

\(\square \)

About this article

Cite this article

Scheinberg, K., Tang, X. Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. 160, 495–529 (2016). https://doi.org/10.1007/s10107-016-0997-3

Keywords

  • Convex optimization
  • Proximal Newton methods
  • Convergence rates
  • Coordinate descent
  • Quasi-Newton methods

Mathematics Subject Classification

  • 90C25
  • 90C53