Mathematical Programming

, Volume 117, Issue 1–2, pp 387–423 | Cite as

A coordinate gradient descent method for nonsmooth separable minimization

FULL LENGTH PAPER

Abstract

We consider the problem of minimizing the sum of a smooth function and a separable convex function. This problem includes as special cases bound-constrained optimization and smooth optimization with ℓ1-regularization. We propose a (block) coordinate gradient descent method for solving this class of nonsmooth separable problems. We establish global convergence and, under a local Lipschitzian error bound assumption, linear convergence for this method. The local Lipschitzian error bound holds under assumptions analogous to those for constrained smooth optimization, e.g., the convex function is polyhedral and the smooth function is (nonconvex) quadratic or is the composition of a strongly convex function with a linear mapping. We report numerical experience with solving the ℓ1-regularization of unconstrained optimization problems from Moré et al. in ACM Trans. Math. Softw. 7, 17–41, 1981 and from the CUTEr set (Gould and Orban in ACM Trans. Math. Softw. 29, 373–394, 2003). Comparison with L-BFGS-B and MINOS, applied to a reformulation of the ℓ1-regularized problem as a bound-constrained optimization problem, is also reported.

Keywords

Error bound Global convergence Linear convergence rate Nonsmooth optimization Coordinate descent 

Mathematics Subject Classification (2000)

49M27 49M37 65K05 90C06 90C25 90C26 90C30 90C55 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Auslender A. (1978). Minimisation de fonctions localement lipschitziennes: applications à la programmation mi-convexe, mi-différentiable. In: Mangasarian, O.L., Meyer, R.R. and Robinson, S.M. (eds) Nonlinear Programming, vol 3., pp 429–460. Academic, New York Google Scholar
  2. 2.
    Balakrishnan, S.: Private communication (2006)Google Scholar
  3. 3.
    Bertsekas D.P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic, New York MATHGoogle Scholar
  4. 4.
    Bertsekas D.P. (1999). Nonlinear Programming, 2nd edn. Athena Scientific, Belmont Google Scholar
  5. 5.
    Bradley P.S., Fayyad U.M. and Mangasarian O.L. (1999). Mathematical programming for data mining: formulations and challenges. INFORMS J. Comput. 11: 217–238 MATHMathSciNetCrossRefGoogle Scholar
  6. 6.
    Burke J.V. (1985). Descent methods for composite nondifferentiable optimization problems. Math. Program. 33: 260–279 CrossRefMATHMathSciNetGoogle Scholar
  7. 7.
    Chen S., Donoho D. and Saunders M. (1999). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20: 33–61 CrossRefMATHMathSciNetGoogle Scholar
  8. 8.
    Censor Y. and Zenios S.A. (1997). Parallel Optimization: Theory, Algorithms and Applications. Oxford University Press, New York MATHGoogle Scholar
  9. 9.
    Conn A.R., Gould N.I.M. and Toint Ph.L. (2000). Trust-Region Methods. SIAM, Philadelphia MATHGoogle Scholar
  10. 10.
    Donoho D.L. and Johnstone I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81: 425–455 CrossRefMATHMathSciNetGoogle Scholar
  11. 11.
    Donoho D.L. and Johnstone I.M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc. 90: 1200–1224 CrossRefMATHMathSciNetGoogle Scholar
  12. 12.
    Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A., Vapnik, V.: Support vector regression machines. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems 9. MIT Press, Cambridge (1997)Google Scholar
  13. 13.
    Facchinei F., Fischer A. and Kanzow C. (1998). On the accurate identification of active constraints. SIAM J. Optim. 9: 14–32 CrossRefMATHMathSciNetGoogle Scholar
  14. 14.
    Facchinei F. and Pang J.-S. (2003). Finite-Dimensional Variational Inequalities and Complementarity Problems, Vols. I and II. Springer, New York Google Scholar
  15. 15.
    Ferris M.C. and Mangasarian O.L. (1994). Parallel variable distribution. SIAM J. Optim. 4: 815–832 CrossRefMATHMathSciNetGoogle Scholar
  16. 16.
    Fletcher R. (1982). A model algorithm for composite nondifferentiable optimization problems. Math. Program. Study 17: 67–76 MATHMathSciNetGoogle Scholar
  17. 17.
    Fletcher R. (1987). Practical Methods of Optimization, 2nd edn. Wiley, Chichester Google Scholar
  18. 18.
    Fletcher R. (1994). An overview of unconstrained optimization. In: Spedicato, E. (eds) Algorithms for Continuous Optimization., pp 109–143. Kluwer, Dordrecht Google Scholar
  19. 19.
    Fukushima M. (1990). A successive quadratic programming method for a class of constrained nonsmooth optimization problems. Math. Program. 49: 231–251 CrossRefMATHMathSciNetGoogle Scholar
  20. 20.
    Fukushima M. (1998). Parallel variable transformation in unconstrained optimization. SIAM J. Optim. 8: 658–672 CrossRefMATHMathSciNetGoogle Scholar
  21. 21.
    Fukushima M. and Mine H. (1981). A generalized proximal point algorithm for certain non-convex minimization problems. Int. J. Syst. Sci. 12: 989–1000 CrossRefMATHMathSciNetGoogle Scholar
  22. 22.
    Gould N.I.M., Orban D. and Toint Ph.L. (2003). CUTEr, a constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw. 29: 373–394 CrossRefMATHMathSciNetGoogle Scholar
  23. 23.
    Grippo L. and Sciandrone M. (2000). On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Oper. Res. Lett. 26: 127–136 CrossRefMATHMathSciNetGoogle Scholar
  24. 24.
    Kelley C.T. (1999). Iterative Methods for Optimization. SIAM, Philadelphia MATHGoogle Scholar
  25. 25.
    Kiwiel K.C. (1986). A method for minimizing the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 48: 437–449 CrossRefMATHMathSciNetGoogle Scholar
  26. 26.
    Luo Z.-Q. and Tseng P. (1992). Error bounds and the convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 2: 43–54 CrossRefMATHMathSciNetGoogle Scholar
  27. 27.
    Luo Z.-Q. and Tseng P. (1992). On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30: 408–425 CrossRefMATHMathSciNetGoogle Scholar
  28. 28.
    Luo Z.-Q. and Tseng P. (1993). On the convergence rate of dual ascent methods for linearly constrained convex minimization. Math. Oper. Res. 18: 846–867 MATHMathSciNetGoogle Scholar
  29. 29.
    Luo Z.-Q. and Tseng P. (1993). Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46: 157–178 CrossRefMathSciNetGoogle Scholar
  30. 30.
    Mangasarian O.L. (1984). Sparsity-preserving SOR algorithms for separable quadratic and linear programming. Comput. Oper. Res. 11: 105–112 CrossRefMATHMathSciNetGoogle Scholar
  31. 31.
    Mangasarian O.L. (1995). Parallel gradient distribution in unconstrained optimization. SIAM J. Control Optim. 33: 1916–1925 CrossRefMATHMathSciNetGoogle Scholar
  32. 32.
    Mangasarian O.L. and De Leone R. (1988). Parallel gradient projection successive overrelaxation for symmetric linear complementarity problems and linear programs. Ann. Oper. Res. 14: 41–59 CrossRefMATHMathSciNetGoogle Scholar
  33. 33.
    Mangasarian O.L. and Musicant D.R. (1999). Successive overrelaxation for support vector machines. IEEE Trans. Neural Netw. 10: 1032–1037 CrossRefGoogle Scholar
  34. 34.
    Mangasarian O.L. and Musicant D.R. (2002). Large scale kernel regression via linear programming. Mach. Learn. 46: 255–269 CrossRefMATHGoogle Scholar
  35. 35.
    Meier, L., van de Geer, S., Bühlmann, P.: The group Lasso for logistic regression. Report Seminar für Statistik, ETH Zürich, ZürichGoogle Scholar
  36. 36.
    Mine H. and Fukushima M. (1981). A minimization method for the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 33: 9–23 CrossRefMATHMathSciNetGoogle Scholar
  37. 37.
    Moré J.J., Garbow B.S. and Hillstrom K.E. (1981). Testing unconstrained optimization software. ACM Trans. Math. Softw. 7: 17–41 CrossRefMATHGoogle Scholar
  38. 38.
    Moré J.J. and Toraldo G. (1991). On the solution of large quadratic programming problems with bound constraints. SIAM J. Optim. 1: 93–113 CrossRefMATHMathSciNetGoogle Scholar
  39. 39.
    Murtagh, B.A., Saunders, M.A.: MINOS 5.5 user’s guide. Report SOL 83-20R. Department of Operations Research, Stanford University, Stanford (1998)Google Scholar
  40. 40.
    Nocedal J. (1980). Updating quasi-Newton matrices with limited storage. Math. Comp. 35: 773–782 CrossRefMATHMathSciNetGoogle Scholar
  41. 41.
    Nocedal J. and Wright S.J. (1999). Numerical Optimization. Springer, New York MATHGoogle Scholar
  42. 42.
    Ortega J.M. and Rheinboldt W.C. (2000). Iterative Solution of Nonlinear Equations in Several Variables. Reprinted by SIAM, Philadelphia MATHGoogle Scholar
  43. 43.
    Powell M.J.D. (1973). On search directions for minimization algorithms. Math. Program. 4: 193–201 CrossRefMATHMathSciNetGoogle Scholar
  44. 44.
    Robinson S.M. (1981). Some continuity properties of polyhedral multifunctions. Math. Program. Study 14: 206–214 MATHGoogle Scholar
  45. 45.
    Robinson S.M. (1999). Linear convergence of ε-subgradient descent methods for a class of convex functions. Math. Program. 86: 41–50 CrossRefMATHMathSciNetGoogle Scholar
  46. 46.
    Robinson, S.M.: Calmness and Lipschitz continuity for multifunctions. Report, Department of Industrial Engineering, University of Wisconsin, Madison (2006)Google Scholar
  47. 47.
    Rockafellar R.T. (1970). Convex Analysis. Princeton University Press, Princeton MATHGoogle Scholar
  48. 48.
    Rockafellar R.T. and Wets R.J.-B. (1998). Variational Analysis. Springer, New York MATHGoogle Scholar
  49. 49.
    Sardy S., Bruce A. and Tseng P. (2000). Block coordinate relaxation methods for nonparametric wavelet denoising. J. Comput. Graph. Stat. 9: 361–379 CrossRefMathSciNetGoogle Scholar
  50. 50.
    Sardy S., Bruce A. and Tseng P. (2001). Robust wavelet denoising. IEEE Trans. Signal Proc. 49: 1146–1152 CrossRefGoogle Scholar
  51. 51.
    Sardy S. and Tseng P. (2004). AMlet, RAMlet and GAMlet: automatic nonlinear fitting of additive models, robust and generalized, with wavelets. J. Comput. Graph. Stat. 13: 283–309 CrossRefMathSciNetGoogle Scholar
  52. 52.
    Sardy S. and Tseng P. (2004). On the statistical analysis of smoothing by maximizing dirty Markov random field posterior distributions. J. Am. Stat. Assoc. 99: 191–204 CrossRefMATHMathSciNetGoogle Scholar
  53. 53.
    Tseng P. (1991). On the rate of convergence of a partially asynchronous gradient projection algorithm. SIAM J. Optim. 1: 603–619 CrossRefMATHMathSciNetGoogle Scholar
  54. 54.
    Tseng P. (1993). Dual coordinate ascent methods for non-strictly convex minimization. Math. Program. 59: 231–247 CrossRefMathSciNetGoogle Scholar
  55. 55.
    Tseng P. (2001). Convergence of block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109: 473–492 CrossRefMathSciNetGoogle Scholar
  56. 56.
    Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Report, Department of Mathematics, University of Washington, Seattle; June 2006; revised February 2007. http://www.math.washington.edu/~tseng/papers.htmlGoogle Scholar
  57. 57.
    Vapnik, V., Golowich, S.E., Smola, A.: Support vector method for function approximation, regression estimation, and signal processing. In: Mozer, M.C., Jordan, M.I., and Petsche, T., (eds.) Advances in Neural Information Processing Systems, vol. 9. MIT Press, Cambridge (1997)Google Scholar
  58. 58.
    Yuan L. and Lin Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. 68: 49–67 CrossRefMATHMathSciNetGoogle Scholar
  59. 59.
    Zhu C., Byrd R.H. and Nocedal J. (1997). L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization. ACM Trans. Math. Softw. 23: 550–560CrossRefMATHMathSciNetGoogle Scholar

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  1. 1.Department of MathematicsUniversity of WashingtonSeattleUSA

Personalised recommendations