
Mathematical Programming, Volume 117, Issue 1–2, pp 387–423

A coordinate gradient descent method for nonsmooth separable minimization

  • Paul Tseng
  • Sangwoon Yun
FULL LENGTH PAPER

Abstract

We consider the problem of minimizing the sum of a smooth function and a separable convex function. This problem includes as special cases bound-constrained optimization and smooth optimization with ℓ1-regularization. We propose a (block) coordinate gradient descent method for solving this class of nonsmooth separable problems. We establish global convergence and, under a local Lipschitzian error bound assumption, linear convergence for this method. The local Lipschitzian error bound holds under assumptions analogous to those for constrained smooth optimization, e.g., the convex function is polyhedral and the smooth function is (nonconvex) quadratic or is the composition of a strongly convex function with a linear mapping. We report numerical experience with solving the ℓ1-regularization of unconstrained optimization problems from Moré et al. (ACM Trans. Math. Softw. 7, 17–41, 1981) and from the CUTEr set (Gould and Orban, ACM Trans. Math. Softw. 29, 373–394, 2003). Comparison with L-BFGS-B and MINOS, applied to a reformulation of the ℓ1-regularized problem as a bound-constrained optimization problem, is also reported.
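The coordinate step described in the abstract can be illustrated with a minimal sketch. For F(x) = f(x) + λ‖x‖₁ with f smooth, minimizing a one-dimensional quadratic model of f along coordinate j plus the ℓ1 term has a closed-form solution via soft-thresholding. This is an illustrative single-coordinate, unit-stepsize variant only; the paper's actual method works with blocks, a quadratic model with more general curvature, and a line search, and the specific function and parameter names below are assumptions for the example, not the authors' code.

```python
import numpy as np

def soft_threshold(z, t):
    # Closed-form solution of min_w 0.5*(w - z)**2 + t*|w|.
    return np.sign(z) * max(abs(z) - t, 0.0)

def cgd_l1(grad_f, H, x0, lam, n_sweeps=50):
    """Cyclic coordinate gradient descent sketch for min_x f(x) + lam*||x||_1.

    grad_f : callable returning the gradient of the smooth part f at x
    H      : per-coordinate positive curvature estimates (e.g. Hessian diagonal)
    """
    x = x0.astype(float).copy()
    for _ in range(n_sweeps):
        for j in range(len(x)):
            g = grad_f(x)[j]
            # Coordinate-wise quadratic model of f plus the l1 term:
            # minimized by a soft-threshold step from the gradient point.
            x[j] = soft_threshold(x[j] - g / H[j], lam / H[j])
    return x
```

For the L-BFGS-B/MINOS comparison, the abstract instead uses the standard reformulation of the ℓ1 term via x = u − v with u, v ≥ 0, which turns the problem into a smooth bound-constrained one.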

Keywords

Error bound · Global convergence · Linear convergence rate · Nonsmooth optimization · Coordinate descent

Mathematics Subject Classification (2000)

49M27 · 49M37 · 65K05 · 90C06 · 90C25 · 90C26 · 90C30 · 90C55


References

  1. Auslender, A. (1978). Minimisation de fonctions localement lipschitziennes: applications à la programmation mi-convexe, mi-différentiable. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming, vol. 3, pp. 429–460. Academic, New York
  2. Balakrishnan, S. (2006). Private communication
  3. Bertsekas, D.P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic, New York
  4. Bertsekas, D.P. (1999). Nonlinear Programming, 2nd edn. Athena Scientific, Belmont
  5. Bradley, P.S., Fayyad, U.M., Mangasarian, O.L. (1999). Mathematical programming for data mining: formulations and challenges. INFORMS J. Comput. 11: 217–238
  6. Burke, J.V. (1985). Descent methods for composite nondifferentiable optimization problems. Math. Program. 33: 260–279
  7. Chen, S., Donoho, D., Saunders, M. (1999). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20: 33–61
  8. Censor, Y., Zenios, S.A. (1997). Parallel Optimization: Theory, Algorithms and Applications. Oxford University Press, New York
  9. Conn, A.R., Gould, N.I.M., Toint, Ph.L. (2000). Trust-Region Methods. SIAM, Philadelphia
  10. Donoho, D.L., Johnstone, I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81: 425–455
  11. Donoho, D.L., Johnstone, I.M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc. 90: 1200–1224
  12. Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A., Vapnik, V. (1997). Support vector regression machines. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems 9. MIT Press, Cambridge
  13. Facchinei, F., Fischer, A., Kanzow, C. (1998). On the accurate identification of active constraints. SIAM J. Optim. 9: 14–32
  14. Facchinei, F., Pang, J.-S. (2003). Finite-Dimensional Variational Inequalities and Complementarity Problems, Vols. I and II. Springer, New York
  15. Ferris, M.C., Mangasarian, O.L. (1994). Parallel variable distribution. SIAM J. Optim. 4: 815–832
  16. Fletcher, R. (1982). A model algorithm for composite nondifferentiable optimization problems. Math. Program. Study 17: 67–76
  17. Fletcher, R. (1987). Practical Methods of Optimization, 2nd edn. Wiley, Chichester
  18. Fletcher, R. (1994). An overview of unconstrained optimization. In: Spedicato, E. (ed.) Algorithms for Continuous Optimization, pp. 109–143. Kluwer, Dordrecht
  19. Fukushima, M. (1990). A successive quadratic programming method for a class of constrained nonsmooth optimization problems. Math. Program. 49: 231–251
  20. Fukushima, M. (1998). Parallel variable transformation in unconstrained optimization. SIAM J. Optim. 8: 658–672
  21. Fukushima, M., Mine, H. (1981). A generalized proximal point algorithm for certain non-convex minimization problems. Int. J. Syst. Sci. 12: 989–1000
  22. Gould, N.I.M., Orban, D., Toint, Ph.L. (2003). CUTEr, a constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw. 29: 373–394
  23. Grippo, L., Sciandrone, M. (2000). On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Oper. Res. Lett. 26: 127–136
  24. Kelley, C.T. (1999). Iterative Methods for Optimization. SIAM, Philadelphia
  25. Kiwiel, K.C. (1986). A method for minimizing the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 48: 437–449
  26. Luo, Z.-Q., Tseng, P. (1992). Error bounds and the convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 2: 43–54
  27. Luo, Z.-Q., Tseng, P. (1992). On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30: 408–425
  28. Luo, Z.-Q., Tseng, P. (1993). On the convergence rate of dual ascent methods for linearly constrained convex minimization. Math. Oper. Res. 18: 846–867
  29. Luo, Z.-Q., Tseng, P. (1993). Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46: 157–178
  30. Mangasarian, O.L. (1984). Sparsity-preserving SOR algorithms for separable quadratic and linear programming. Comput. Oper. Res. 11: 105–112
  31. Mangasarian, O.L. (1995). Parallel gradient distribution in unconstrained optimization. SIAM J. Control Optim. 33: 1916–1925
  32. Mangasarian, O.L., De Leone, R. (1988). Parallel gradient projection successive overrelaxation for symmetric linear complementarity problems and linear programs. Ann. Oper. Res. 14: 41–59
  33. Mangasarian, O.L., Musicant, D.R. (1999). Successive overrelaxation for support vector machines. IEEE Trans. Neural Netw. 10: 1032–1037
  34. Mangasarian, O.L., Musicant, D.R. (2002). Large scale kernel regression via linear programming. Mach. Learn. 46: 255–269
  35. Meier, L., van de Geer, S., Bühlmann, P. The group Lasso for logistic regression. Report, Seminar für Statistik, ETH Zürich, Zürich
  36. Mine, H., Fukushima, M. (1981). A minimization method for the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 33: 9–23
  37. Moré, J.J., Garbow, B.S., Hillstrom, K.E. (1981). Testing unconstrained optimization software. ACM Trans. Math. Softw. 7: 17–41
  38. Moré, J.J., Toraldo, G. (1991). On the solution of large quadratic programming problems with bound constraints. SIAM J. Optim. 1: 93–113
  39. Murtagh, B.A., Saunders, M.A. (1998). MINOS 5.5 user's guide. Report SOL 83-20R, Department of Operations Research, Stanford University, Stanford
  40. Nocedal, J. (1980). Updating quasi-Newton matrices with limited storage. Math. Comp. 35: 773–782
  41. Nocedal, J., Wright, S.J. (1999). Numerical Optimization. Springer, New York
  42. Ortega, J.M., Rheinboldt, W.C. (2000). Iterative Solution of Nonlinear Equations in Several Variables. Reprinted by SIAM, Philadelphia
  43. Powell, M.J.D. (1973). On search directions for minimization algorithms. Math. Program. 4: 193–201
  44. Robinson, S.M. (1981). Some continuity properties of polyhedral multifunctions. Math. Program. Study 14: 206–214
  45. Robinson, S.M. (1999). Linear convergence of ε-subgradient descent methods for a class of convex functions. Math. Program. 86: 41–50
  46. Robinson, S.M. (2006). Calmness and Lipschitz continuity for multifunctions. Report, Department of Industrial Engineering, University of Wisconsin, Madison
  47. Rockafellar, R.T. (1970). Convex Analysis. Princeton University Press, Princeton
  48. Rockafellar, R.T., Wets, R.J.-B. (1998). Variational Analysis. Springer, New York
  49. Sardy, S., Bruce, A., Tseng, P. (2000). Block coordinate relaxation methods for nonparametric wavelet denoising. J. Comput. Graph. Stat. 9: 361–379
  50. Sardy, S., Bruce, A., Tseng, P. (2001). Robust wavelet denoising. IEEE Trans. Signal Proc. 49: 1146–1152
  51. Sardy, S., Tseng, P. (2004). AMlet, RAMlet and GAMlet: automatic nonlinear fitting of additive models, robust and generalized, with wavelets. J. Comput. Graph. Stat. 13: 283–309
  52. Sardy, S., Tseng, P. (2004). On the statistical analysis of smoothing by maximizing dirty Markov random field posterior distributions. J. Am. Stat. Assoc. 99: 191–204
  53. Tseng, P. (1991). On the rate of convergence of a partially asynchronous gradient projection algorithm. SIAM J. Optim. 1: 603–619
  54. Tseng, P. (1993). Dual coordinate ascent methods for non-strictly convex minimization. Math. Program. 59: 231–247
  55. Tseng, P. (2001). Convergence of block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109: 473–492
  56. Tseng, P., Yun, S. (2006; revised 2007). A coordinate gradient descent method for nonsmooth separable minimization. Report, Department of Mathematics, University of Washington, Seattle. http://www.math.washington.edu/~tseng/papers.html
  57. Vapnik, V., Golowich, S.E., Smola, A. (1997). Support vector method for function approximation, regression estimation, and signal processing. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems, vol. 9. MIT Press, Cambridge
  58. Yuan, M., Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. 68: 49–67
  59. Zhu, C., Byrd, R.H., Nocedal, J. (1997). Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization. ACM Trans. Math. Softw. 23: 550–560

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  1. Department of Mathematics, University of Washington, Seattle, USA
