
A coordinate gradient descent method for nonsmooth separable minimization

Abstract

We consider the problem of minimizing the sum of a smooth function and a separable convex function. This problem includes as special cases bound-constrained optimization and smooth optimization with ℓ1-regularization. We propose a (block) coordinate gradient descent method for solving this class of nonsmooth separable problems. We establish global convergence and, under a local Lipschitzian error bound assumption, linear convergence for this method. The local Lipschitzian error bound holds under assumptions analogous to those for constrained smooth optimization, e.g., the convex function is polyhedral and the smooth function is (nonconvex) quadratic or is the composition of a strongly convex function with a linear mapping. We report numerical experience with solving ℓ1-regularized versions of unconstrained optimization problems from Moré et al. (ACM Trans. Math. Softw. 7, 17–41, 1981) and from the CUTEr set (Gould and Orban, ACM Trans. Math. Softw. 29, 373–394, 2003). Comparison with L-BFGS-B and MINOS, applied to a reformulation of the ℓ1-regularized problem as a bound-constrained optimization problem, is also reported.
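The problem class above, minimizing F(x) = f(x) + cP(x) with f smooth and P separable convex, specializes to ℓ1-regularized least squares when f(x) = ½‖Ax − b‖² and P(x) = ‖x‖₁. As an illustrative sketch only (not the authors' exact method, which minimizes a coordinatewise quadratic model with a Hessian approximation and an Armijo-type line search), a cyclic coordinate gradient step with the one-dimensional ℓ1 subproblem solved in closed form by soft-thresholding might look like this; the names `soft_threshold` and `coord_gradient_descent` are placeholders introduced here:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal map of t*|.|: closed-form minimizer of the 1-D subproblem
    # 0.5*(x - z)**2 + t*|x|.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def coord_gradient_descent(A, b, lam, n_iter=200):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 by cyclic coordinate
    gradient descent with exact coordinatewise stepsizes (a sketch)."""
    m, n = A.shape
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)   # per-coordinate curvature/Lipschitz constants
    r = A @ x - b                   # residual, maintained incrementally
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue
            g = A[:, j] @ r         # partial gradient of f w.r.t. x_j
            x_new = soft_threshold(x[j] - g / col_sq[j], lam / col_sq[j])
            if x_new != x[j]:
                r += A[:, j] * (x_new - x[j])
                x[j] = x_new
    return x
```

For A equal to the identity, each update reduces to componentwise soft-thresholding of b, which gives a quick sanity check on an implementation.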


References

  1. Auslender A. (1978). Minimisation de fonctions localement lipschitziennes: applications à la programmation mi-convexe, mi-différentiable. In: Mangasarian, O.L., Meyer, R.R. and Robinson, S.M. (eds) Nonlinear Programming, vol 3, pp 429–460. Academic, New York

  2. Balakrishnan, S.: Private communication (2006)

  3. Bertsekas D.P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic, New York

  4. Bertsekas D.P. (1999). Nonlinear Programming, 2nd edn. Athena Scientific, Belmont

  5. Bradley P.S., Fayyad U.M. and Mangasarian O.L. (1999). Mathematical programming for data mining: formulations and challenges. INFORMS J. Comput. 11: 217–238

  6. Burke J.V. (1985). Descent methods for composite nondifferentiable optimization problems. Math. Program. 33: 260–279

  7. Chen S., Donoho D. and Saunders M. (1999). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20: 33–61

  8. Censor Y. and Zenios S.A. (1997). Parallel Optimization: Theory, Algorithms and Applications. Oxford University Press, New York

  9. Conn A.R., Gould N.I.M. and Toint Ph.L. (2000). Trust-Region Methods. SIAM, Philadelphia

  10. Donoho D.L. and Johnstone I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81: 425–455

  11. Donoho D.L. and Johnstone I.M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc. 90: 1200–1224

  12. Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A., Vapnik, V.: Support vector regression machines. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems 9. MIT Press, Cambridge (1997)

  13. Facchinei F., Fischer A. and Kanzow C. (1998). On the accurate identification of active constraints. SIAM J. Optim. 9: 14–32

  14. Facchinei F. and Pang J.-S. (2003). Finite-Dimensional Variational Inequalities and Complementarity Problems, Vols. I and II. Springer, New York

  15. Ferris M.C. and Mangasarian O.L. (1994). Parallel variable distribution. SIAM J. Optim. 4: 815–832

  16. Fletcher R. (1982). A model algorithm for composite nondifferentiable optimization problems. Math. Program. Study 17: 67–76

  17. Fletcher R. (1987). Practical Methods of Optimization, 2nd edn. Wiley, Chichester

  18. Fletcher R. (1994). An overview of unconstrained optimization. In: Spedicato, E. (ed.) Algorithms for Continuous Optimization, pp 109–143. Kluwer, Dordrecht

  19. Fukushima M. (1990). A successive quadratic programming method for a class of constrained nonsmooth optimization problems. Math. Program. 49: 231–251

  20. Fukushima M. (1998). Parallel variable transformation in unconstrained optimization. SIAM J. Optim. 8: 658–672

  21. Fukushima M. and Mine H. (1981). A generalized proximal point algorithm for certain non-convex minimization problems. Int. J. Syst. Sci. 12: 989–1000

  22. Gould N.I.M., Orban D. and Toint Ph.L. (2003). CUTEr, a constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw. 29: 373–394

  23. Grippo L. and Sciandrone M. (2000). On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Oper. Res. Lett. 26: 127–136

  24. Kelley C.T. (1999). Iterative Methods for Optimization. SIAM, Philadelphia

  25. Kiwiel K.C. (1986). A method for minimizing the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 48: 437–449

  26. Luo Z.-Q. and Tseng P. (1992). Error bounds and the convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 2: 43–54

  27. Luo Z.-Q. and Tseng P. (1992). On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30: 408–425

  28. Luo Z.-Q. and Tseng P. (1993). On the convergence rate of dual ascent methods for linearly constrained convex minimization. Math. Oper. Res. 18: 846–867

  29. Luo Z.-Q. and Tseng P. (1993). Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46: 157–178

  30. Mangasarian O.L. (1984). Sparsity-preserving SOR algorithms for separable quadratic and linear programming. Comput. Oper. Res. 11: 105–112

  31. Mangasarian O.L. (1995). Parallel gradient distribution in unconstrained optimization. SIAM J. Control Optim. 33: 1916–1925

  32. Mangasarian O.L. and De Leone R. (1988). Parallel gradient projection successive overrelaxation for symmetric linear complementarity problems and linear programs. Ann. Oper. Res. 14: 41–59

  33. Mangasarian O.L. and Musicant D.R. (1999). Successive overrelaxation for support vector machines. IEEE Trans. Neural Netw. 10: 1032–1037

  34. Mangasarian O.L. and Musicant D.R. (2002). Large scale kernel regression via linear programming. Mach. Learn. 46: 255–269

  35. Meier, L., van de Geer, S., Bühlmann, P.: The group Lasso for logistic regression. Report, Seminar für Statistik, ETH Zürich, Zürich

  36. Mine H. and Fukushima M. (1981). A minimization method for the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 33: 9–23

  37. Moré J.J., Garbow B.S. and Hillstrom K.E. (1981). Testing unconstrained optimization software. ACM Trans. Math. Softw. 7: 17–41

  38. Moré J.J. and Toraldo G. (1991). On the solution of large quadratic programming problems with bound constraints. SIAM J. Optim. 1: 93–113

  39. Murtagh, B.A., Saunders, M.A.: MINOS 5.5 user’s guide. Report SOL 83-20R, Department of Operations Research, Stanford University, Stanford (1998)

  40. Nocedal J. (1980). Updating quasi-Newton matrices with limited storage. Math. Comp. 35: 773–782

  41. Nocedal J. and Wright S.J. (1999). Numerical Optimization. Springer, New York

  42. Ortega J.M. and Rheinboldt W.C. (2000). Iterative Solution of Nonlinear Equations in Several Variables. Reprinted by SIAM, Philadelphia

  43. Powell M.J.D. (1973). On search directions for minimization algorithms. Math. Program. 4: 193–201

  44. Robinson S.M. (1981). Some continuity properties of polyhedral multifunctions. Math. Program. Study 14: 206–214

  45. Robinson S.M. (1999). Linear convergence of ε-subgradient descent methods for a class of convex functions. Math. Program. 86: 41–50

  46. Robinson, S.M.: Calmness and Lipschitz continuity for multifunctions. Report, Department of Industrial Engineering, University of Wisconsin, Madison (2006)

  47. Rockafellar R.T. (1970). Convex Analysis. Princeton University Press, Princeton

  48. Rockafellar R.T. and Wets R.J.-B. (1998). Variational Analysis. Springer, New York

  49. Sardy S., Bruce A. and Tseng P. (2000). Block coordinate relaxation methods for nonparametric wavelet denoising. J. Comput. Graph. Stat. 9: 361–379

  50. Sardy S., Bruce A. and Tseng P. (2001). Robust wavelet denoising. IEEE Trans. Signal Proc. 49: 1146–1152

  51. Sardy S. and Tseng P. (2004). AMlet, RAMlet and GAMlet: automatic nonlinear fitting of additive models, robust and generalized, with wavelets. J. Comput. Graph. Stat. 13: 283–309

  52. Sardy S. and Tseng P. (2004). On the statistical analysis of smoothing by maximizing dirty Markov random field posterior distributions. J. Am. Stat. Assoc. 99: 191–204

  53. Tseng P. (1991). On the rate of convergence of a partially asynchronous gradient projection algorithm. SIAM J. Optim. 1: 603–619

  54. Tseng P. (1993). Dual coordinate ascent methods for non-strictly convex minimization. Math. Program. 59: 231–247

  55. Tseng P. (2001). Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109: 473–492

  56. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Report, Department of Mathematics, University of Washington, Seattle, June 2006; revised February 2007. http://www.math.washington.edu/~tseng/papers.html

  57. Vapnik, V., Golowich, S.E., Smola, A.: Support vector method for function approximation, regression estimation, and signal processing. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems, vol. 9. MIT Press, Cambridge (1997)

  58. Yuan M. and Lin Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68: 49–67

  59. Zhu C., Byrd R.H., Lu P. and Nocedal J. (1997). Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization. ACM Trans. Math. Softw. 23: 550–560


Author information

Correspondence to Paul Tseng.

Additional information

In honor of Steve Robinson, for his many contributions to mathematical programming, including the subject of this paper: nonsmooth optimization and error bounds.


Cite this article

Tseng, P., Yun, S. A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009). https://doi.org/10.1007/s10107-007-0170-0


Keywords

  • Error bound
  • Global convergence
  • Linear convergence rate
  • Nonsmooth optimization
  • Coordinate descent

Mathematics Subject Classification (2000)

  • 49M27
  • 49M37
  • 65K05
  • 90C06
  • 90C25
  • 90C26
  • 90C30
  • 90C55