Journal of Scientific Computing, Volume 61, Issue 1, pp 17–41

Nonmonotone Barzilai–Borwein Gradient Algorithm for \(\ell _{1}\)-Regularized Nonsmooth Minimization in Compressive Sensing

  • Yunhai Xiao
  • Soon-Yi Wu
  • Liqun Qi


Abstract

This study considers minimizing the sum of a smooth function and a nonsmooth \(\ell _{1}\)-regularization term. This problem includes as a special case the \(\ell _{1}\)-regularized convex minimization problems arising in signal processing, compressive sensing, machine learning, data mining, and related fields. However, the non-differentiability of the \(\ell _{1}\)-norm poses additional challenges, especially for the large-scale problems encountered in many practical applications. This study proposes, analyzes, and tests a Barzilai–Borwein gradient algorithm. At each iteration, the generated search direction possesses the descent property and can be derived easily by minimizing a local approximate quadratic model that exploits the favorable structure of the \(\ell _{1}\)-norm. A nonmonotone line search technique is incorporated to find a suitable stepsize along this direction. The algorithm is easy to implement, with each iteration requiring only the value of the objective function and the gradient of the smooth term. Under mild conditions, the proposed algorithm is shown to be globally convergent. Limited experiments on nonconvex unconstrained problems from the CUTEr library with an additive \(\ell _{1}\)-regularization term illustrate that the proposed algorithm performs quite satisfactorily. Extensive experiments on \(\ell _{1}\)-regularized least squares problems in compressive sensing verify that our algorithm compares favorably with several state-of-the-art algorithms designed specifically for this problem in recent years.
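The scheme described in the abstract can be illustrated with a minimal sketch, not the authors' reference implementation: a proximal-gradient step with a Barzilai–Borwein stepsize and a nonmonotone (Grippo–Lampariello–Lucidi-type) backtracking line search, applied to the \(\ell _{1}\)-regularized least squares model \(\min_x \tfrac{1}{2}\|Ax-b\|^2 + \lambda\|x\|_1\) from compressive sensing. All function names, tolerances, and parameter values below are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Closed-form minimizer of 0.5*||x - v||^2 + t*||x||_1 (the ell_1 prox)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def bb_nonmonotone_l1(A, b, lam, max_iter=500, memory=5, tol=1e-8):
    """Sketch of a BB proximal-gradient method with a nonmonotone line search
    for min_x 0.5*||A x - b||^2 + lam*||x||_1."""
    n = A.shape[1]
    x = np.zeros(n)
    g = A.T @ (A @ x - b)                 # gradient of the smooth term
    alpha = 1.0                           # initial stepsize
    f = lambda z: 0.5 * np.linalg.norm(A @ z - b) ** 2 + lam * np.abs(z).sum()
    hist = [f(x)]                         # recent objective values (nonmonotone reference)
    for _ in range(max_iter):
        # Descent direction: minimize the local quadratic model of the smooth
        # term plus the exact ell_1 term, which reduces to soft-thresholding.
        d = soft_threshold(x - alpha * g, alpha * lam) - x
        if np.linalg.norm(d) < tol:
            break
        # Nonmonotone backtracking: sufficient decrease relative to the
        # maximum of the last few objective values, not the latest one.
        f_ref, step = max(hist[-memory:]), 1.0
        for _ in range(30):               # bounded backtracking
            if f(x + step * d) <= f_ref - 1e-4 * step * np.dot(d, d) / alpha:
                break
            step *= 0.5
        x_new = x + step * d
        g_new = A.T @ (A @ x_new - b)
        # Barzilai-Borwein stepsize from successive iterate/gradient differences.
        s, y = x_new - x, g_new - g
        sy = np.dot(s, y)
        alpha = np.dot(s, s) / sy if sy > 1e-12 else 1.0
        x, g = x_new, g_new
        hist.append(f(x))
    return x
```

A typical use would draw a random Gaussian matrix `A`, form `b = A @ x_true` for a sparse `x_true`, and call `bb_nonmonotone_l1(A, b, lam)`; the nonmonotone test lets the objective rise occasionally, which is what allows the aggressive BB stepsizes to be accepted.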


Keywords

Nonsmooth optimization · Nonconvex optimization · Barzilai–Borwein gradient algorithm · Nonmonotone line search · \(\ell _{1}\) regularization · Compressive sensing

Mathematics Subject Classification

65L09 · 65K05 · 90C30 · 90C25



Acknowledgments

We would like to thank the two anonymous referees for their useful comments and suggestions, which greatly improved this paper. The first version of the paper was completed during Y. Xiao's stay as a postdoctoral research fellow at the NCTS, National Cheng Kung University, Taiwan. The work of Y. Xiao was supported by Chinese Natural Science Foundation Grant 11001075 and Natural Science Foundation of Henan Province Grant 2011GGJS030.


References

  1. Andrew, G., Gao, J.: Scalable training of \(\ell _{1}\)-regularized log-linear models. In: Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML) (2007)
  2. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8, 141–148 (1988)
  3. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
  4. Becker, S., Bobin, J., Candès, E.: NESTA: a fast and accurate first-order method for sparse recovery. SIAM J. Imaging Sci. 4, 1–39 (2011)
  5. Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10, 1196–1211 (2000)
  6. Bioucas-Dias, J.M., Figueiredo, M.: A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration. IEEE Trans. Image Process. 16, 2992–3004 (2007)
  7. Cai, J.F., Candès, E., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982 (2010)
  8. Candès, E., Romberg, J.: Quantitative robust uncertainty principles and optimally sparse decompositions. Found. Comput. Math. 6, 227–254 (2006)
  9. Candès, E., Romberg, J., Tao, T.: Stable signal recovery from incomplete and inaccurate information. Commun. Pure Appl. Math. 59, 1207–1233 (2005)
  10. Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52, 489–509 (2006)
  11. Candès, E., Tao, T.: Near optimal signal recovery from random projections: universal encoding strategies. IEEE Trans. Inf. Theory 52, 5406–5425 (2006)
  12. Chang, K.W., Hsieh, C.J., Lin, C.J.: Coordinate descent method for large-scale L2-loss linear SVM. J. Mach. Learn. Res. 9, 1369–1398 (2008)
  13. Cheng, W., Li, D.H.: A derivative-free nonmonotone line search and its application to the spectral residual method. IMA J. Numer. Anal. 29, 814–825 (2009)
  14. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: CUTE: constrained and unconstrained testing environment. ACM Trans. Math. Softw. 21, 123–160 (1995)
  15. Dai, Y.H., Fletcher, R.: On the asymptotic behaviour of some new gradient methods. Math. Program. 103, 541–559 (2005)
  16. Dai, Y.H., Hager, W.W., Schittkowski, K., Zhang, H.C.: The cyclic Barzilai–Borwein method for unconstrained optimization. IMA J. Numer. Anal. 26, 604–627 (2006)
  17. Dai, Y.H., Liao, L.Z.: R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22, 1–10 (2002)
  18. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)
  19. Duchi, J., Singer, Y.: Efficient online and batch learning using forward backward splitting. J. Mach. Learn. Res. 10, 2899–2934 (2009)
  20. Figueiredo, M., Nowak, R.D., Wright, S.J.: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 1, 586–597 (2007)
  21. Genkin, A., Lewis, D.D., Madigan, D.: Large-scale Bayesian logistic regression for text categorization. Technometrics 49, 291–304 (2007)
  22. Grippo, L., Lampariello, F., Lucidi, S.: A nonmonotone line search technique for Newton's method. SIAM J. Numer. Anal. 23, 707–716 (1986)
  23. Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for \(\ell _{1}\)-minimization: methodology and convergence. SIAM J. Optim. 19, 1107–1130 (2008)
  24. Kim, S., Koh, K., Lustig, M., Boyd, S., Gorinevsky, D.: An interior-point method for large-scale \(\ell _{1}\)-regularized least squares. IEEE J. Sel. Top. Signal Process. 1, 606–617 (2007)
  25. Koh, K., Kim, S., Boyd, S.: An interior-point method for large-scale \(\ell _{1}\)-regularized logistic regression. J. Mach. Learn. Res. 8, 1519–1555 (2007)
  26. Lin, C.J., Moré, J.J.: Newton's method for large-scale bound constrained problems. SIAM J. Optim. 9, 1100–1127 (1999)
  27. Lu, Z., Zhang, Y.: An augmented Lagrangian approach for sparse principal component analysis. Math. Program. 135, 149–193 (2012)
  28. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)
  29. Nesterov, Y.: Gradient methods for minimizing composite objective function. ECORE Discussion Paper 2007/76 (2007)
  30. Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comput. 35, 773–782 (1980)
  31. Raydan, M.: On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 13, 321–326 (1993)
  32. Raydan, M.: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 7, 26–33 (1997)
  33. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)
  34. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D Nonlinear Phenom. 60, 259–268 (1992)
  35. Shalev-Shwartz, S., Tewari, A.: Stochastic methods for \(\ell _{1}\) regularized loss minimization. In: Proceedings of the Twenty-Sixth International Conference on Machine Learning (ICML) (2009)
  36. Shi, J., Yin, W., Osher, S., Sajda, P.: A fast hybrid algorithm for large-scale \(\ell _{1}\)-regularized logistic regression. J. Mach. Learn. Res. 11, 713–741 (2010)
  37. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)
  38. van den Berg, E., Friedlander, M.P.: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 31, 890–912 (2008)
  39. Wen, Z., Yin, W., Goldfarb, D., Zhang, Y.: A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization, and continuation. SIAM J. Sci. Comput. 32, 1832–1857 (2010)
  40. Wen, Z., Yin, W., Zhang, H., Goldfarb, D.: On the convergence of an active-set method for \(\ell _{1}\) minimization. Optim. Methods Softw. 27, 1127–1146 (2012)
  41. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, pp. 3373–3376 (2008)
  42. Wright, J., Ma, Y., Ganesh, A., Rao, S.: Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. J. ACM 5, 1–44 (2009)
  43. Yang, J., Zhang, Y.: Alternating direction algorithms for \(\ell _{1}\)-problems in compressive sensing. SIAM J. Sci. Comput. 33, 250–278 (2011)
  44. Yu, J., Vishwanathan, S.V.N., Günter, S., Schraudolph, N.N.: A quasi-Newton approach to nonsmooth convex optimization problems in machine learning. J. Mach. Learn. Res. 11, 1145–1200 (2010)
  45. Yuan, G.X., Chang, K.W., Hsieh, C.J., Lin, C.J.: A comparison of optimization methods and software for large-scale \(\ell _{1}\)-regularized linear classification. J. Mach. Learn. Res. 11, 3183–3234 (2010)
  46. Yuan, X.: Alternating direction method for covariance selection models. J. Sci. Comput. 51, 261–273 (2012)
  47. Yun, S., Toh, K.C.: A coordinate gradient descent method for \(\ell _{1}\)-regularized convex minimization. Comput. Optim. Appl. 48, 273–307 (2011)
  48. Zhang, H., Hager, W.W.: A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 14, 1043–1056 (2004)
  49. Zhang, Y., Sun, W., Qi, L.: A nonmonotone filter Barzilai–Borwein method for optimization. Asia Pac. J. Oper. Res. 27, 55–69 (2010)

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Institute of Applied Mathematics, College of Mathematics and Information Science, Henan University, Kaifeng, China
  2. National Center for Theoretical Sciences (South), National Cheng Kung University, Tainan, Taiwan
  3. Department of Applied Mathematics, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
