A coordinate gradient descent method for nonsmooth separable minimization

Tseng, Paul; Yun, Sangwoon

doi:10.1007/s10107-007-0170-0

A coordinate gradient descent method for nonsmooth separable minimization

FULL LENGTH PAPER
Published: 01 August 2007

Volume 117, pages 387–423, (2009)
Cite this article

Mathematical Programming Submit manuscript

Paul Tseng¹ &
Sangwoon Yun¹

6541 Accesses
453 Citations
3 Altmetric
Explore all metrics

Abstract

We consider the problem of minimizing the sum of a smooth function and a separable convex function. This problem includes as special cases bound-constrained optimization and smooth optimization with ℓ₁-regularization. We propose a (block) coordinate gradient descent method for solving this class of nonsmooth separable problems. We establish global convergence and, under a local Lipschitzian error bound assumption, linear convergence for this method. The local Lipschitzian error bound holds under assumptions analogous to those for constrained smooth optimization, e.g., the convex function is polyhedral and the smooth function is (nonconvex) quadratic or is the composition of a strongly convex function with a linear mapping. We report numerical experience with solving the ℓ₁-regularization of unconstrained optimization problems from Moré et al. in ACM Trans. Math. Softw. 7, 17–41, 1981 and from the CUTEr set (Gould and Orban in ACM Trans. Math. Softw. 29, 373–394, 2003). Comparison with L-BFGS-B and MINOS, applied to a reformulation of the ℓ₁-regularized problem as a bound-constrained optimization problem, is also reported.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A New Insight on Augmented Lagrangian Method with Applications in Machine Learning

Article 13 April 2024

The Frank-Wolfe Algorithm: A Short Introduction

Article Open access 13 December 2023

$\mathbf{C^{2}}$ -Lusin approximation of strongly convex functions

Article 03 April 2024

References

Auslender A. (1978). Minimisation de fonctions localement lipschitziennes: applications à la programmation mi-convexe, mi-différentiable. In: Mangasarian, O.L., Meyer, R.R. and Robinson, S.M. (eds) Nonlinear Programming, vol 3., pp 429–460. Academic, New York
Google Scholar
Balakrishnan, S.: Private communication (2006)
Bertsekas D.P. (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic, New York
MATH Google Scholar
Bertsekas D.P. (1999). Nonlinear Programming, 2nd edn. Athena Scientific, Belmont
Google Scholar
Bradley P.S., Fayyad U.M. and Mangasarian O.L. (1999). Mathematical programming for data mining: formulations and challenges. INFORMS J. Comput. 11: 217–238
Article MATH MathSciNet Google Scholar
Burke J.V. (1985). Descent methods for composite nondifferentiable optimization problems. Math. Program. 33: 260–279
Article MATH MathSciNet Google Scholar
Chen S., Donoho D. and Saunders M. (1999). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20: 33–61
Article MATH MathSciNet Google Scholar
Censor Y. and Zenios S.A. (1997). Parallel Optimization: Theory, Algorithms and Applications. Oxford University Press, New York
MATH Google Scholar
Conn A.R., Gould N.I.M. and Toint Ph.L. (2000). Trust-Region Methods. SIAM, Philadelphia
MATH Google Scholar
Donoho D.L. and Johnstone I.M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81: 425–455
Article MATH MathSciNet Google Scholar
Donoho D.L. and Johnstone I.M. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc. 90: 1200–1224
Article MATH MathSciNet Google Scholar
Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A., Vapnik, V.: Support vector regression machines. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems 9. MIT Press, Cambridge (1997)
Facchinei F., Fischer A. and Kanzow C. (1998). On the accurate identification of active constraints. SIAM J. Optim. 9: 14–32
Article MATH MathSciNet Google Scholar
Facchinei F. and Pang J.-S. (2003). Finite-Dimensional Variational Inequalities and Complementarity Problems, Vols. I and II. Springer, New York
Google Scholar
Ferris M.C. and Mangasarian O.L. (1994). Parallel variable distribution. SIAM J. Optim. 4: 815–832
Article MATH MathSciNet Google Scholar
Fletcher R. (1982). A model algorithm for composite nondifferentiable optimization problems. Math. Program. Study 17: 67–76
MATH MathSciNet Google Scholar
Fletcher R. (1987). Practical Methods of Optimization, 2nd edn. Wiley, Chichester
Google Scholar
Fletcher R. (1994). An overview of unconstrained optimization. In: Spedicato, E. (eds) Algorithms for Continuous Optimization., pp 109–143. Kluwer, Dordrecht
Google Scholar
Fukushima M. (1990). A successive quadratic programming method for a class of constrained nonsmooth optimization problems. Math. Program. 49: 231–251
Article MATH MathSciNet Google Scholar
Fukushima M. (1998). Parallel variable transformation in unconstrained optimization. SIAM J. Optim. 8: 658–672
Article MATH MathSciNet Google Scholar
Fukushima M. and Mine H. (1981). A generalized proximal point algorithm for certain non-convex minimization problems. Int. J. Syst. Sci. 12: 989–1000
Article MATH MathSciNet Google Scholar
Gould N.I.M., Orban D. and Toint Ph.L. (2003). CUTEr, a constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw. 29: 373–394
Article MATH MathSciNet Google Scholar
Grippo L. and Sciandrone M. (2000). On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Oper. Res. Lett. 26: 127–136
Article MATH MathSciNet Google Scholar
Kelley C.T. (1999). Iterative Methods for Optimization. SIAM, Philadelphia
MATH Google Scholar
Kiwiel K.C. (1986). A method for minimizing the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 48: 437–449
Article MATH MathSciNet Google Scholar
Luo Z.-Q. and Tseng P. (1992). Error bounds and the convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 2: 43–54
Article MATH MathSciNet Google Scholar
Luo Z.-Q. and Tseng P. (1992). On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30: 408–425
Article MATH MathSciNet Google Scholar
Luo Z.-Q. and Tseng P. (1993). On the convergence rate of dual ascent methods for linearly constrained convex minimization. Math. Oper. Res. 18: 846–867
MATH MathSciNet Google Scholar
Luo Z.-Q. and Tseng P. (1993). Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46: 157–178
Article MathSciNet Google Scholar
Mangasarian O.L. (1984). Sparsity-preserving SOR algorithms for separable quadratic and linear programming. Comput. Oper. Res. 11: 105–112
Article MATH MathSciNet Google Scholar
Mangasarian O.L. (1995). Parallel gradient distribution in unconstrained optimization. SIAM J. Control Optim. 33: 1916–1925
Article MATH MathSciNet Google Scholar
Mangasarian O.L. and De Leone R. (1988). Parallel gradient projection successive overrelaxation for symmetric linear complementarity problems and linear programs. Ann. Oper. Res. 14: 41–59
Article MATH MathSciNet Google Scholar
Mangasarian O.L. and Musicant D.R. (1999). Successive overrelaxation for support vector machines. IEEE Trans. Neural Netw. 10: 1032–1037
Article Google Scholar
Mangasarian O.L. and Musicant D.R. (2002). Large scale kernel regression via linear programming. Mach. Learn. 46: 255–269
Article MATH Google Scholar
Meier, L., van de Geer, S., Bühlmann, P.: The group Lasso for logistic regression. Report Seminar für Statistik, ETH Zürich, Zürich
Mine H. and Fukushima M. (1981). A minimization method for the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 33: 9–23
Article MATH MathSciNet Google Scholar
Moré J.J., Garbow B.S. and Hillstrom K.E. (1981). Testing unconstrained optimization software. ACM Trans. Math. Softw. 7: 17–41
Article MATH Google Scholar
Moré J.J. and Toraldo G. (1991). On the solution of large quadratic programming problems with bound constraints. SIAM J. Optim. 1: 93–113
Article MATH MathSciNet Google Scholar
Murtagh, B.A., Saunders, M.A.: MINOS 5.5 user’s guide. Report SOL 83-20R. Department of Operations Research, Stanford University, Stanford (1998)
Nocedal J. (1980). Updating quasi-Newton matrices with limited storage. Math. Comp. 35: 773–782
Article MATH MathSciNet Google Scholar
Nocedal J. and Wright S.J. (1999). Numerical Optimization. Springer, New York
MATH Google Scholar
Ortega J.M. and Rheinboldt W.C. (2000). Iterative Solution of Nonlinear Equations in Several Variables. Reprinted by SIAM, Philadelphia
MATH Google Scholar
Powell M.J.D. (1973). On search directions for minimization algorithms. Math. Program. 4: 193–201
Article MATH MathSciNet Google Scholar
Robinson S.M. (1981). Some continuity properties of polyhedral multifunctions. Math. Program. Study 14: 206–214
MATH Google Scholar
Robinson S.M. (1999). Linear convergence of ε-subgradient descent methods for a class of convex functions. Math. Program. 86: 41–50
Article MATH MathSciNet Google Scholar
Robinson, S.M.: Calmness and Lipschitz continuity for multifunctions. Report, Department of Industrial Engineering, University of Wisconsin, Madison (2006)
Rockafellar R.T. (1970). Convex Analysis. Princeton University Press, Princeton
MATH Google Scholar
Rockafellar R.T. and Wets R.J.-B. (1998). Variational Analysis. Springer, New York
MATH Google Scholar
Sardy S., Bruce A. and Tseng P. (2000). Block coordinate relaxation methods for nonparametric wavelet denoising. J. Comput. Graph. Stat. 9: 361–379
Article MathSciNet Google Scholar
Sardy S., Bruce A. and Tseng P. (2001). Robust wavelet denoising. IEEE Trans. Signal Proc. 49: 1146–1152
Article Google Scholar
Sardy S. and Tseng P. (2004). AMlet, RAMlet and GAMlet: automatic nonlinear fitting of additive models, robust and generalized, with wavelets. J. Comput. Graph. Stat. 13: 283–309
Article MathSciNet Google Scholar
Sardy S. and Tseng P. (2004). On the statistical analysis of smoothing by maximizing dirty Markov random field posterior distributions. J. Am. Stat. Assoc. 99: 191–204
Article MATH MathSciNet Google Scholar
Tseng P. (1991). On the rate of convergence of a partially asynchronous gradient projection algorithm. SIAM J. Optim. 1: 603–619
Article MATH MathSciNet Google Scholar
Tseng P. (1993). Dual coordinate ascent methods for non-strictly convex minimization. Math. Program. 59: 231–247
Article MathSciNet Google Scholar
Tseng P. (2001). Convergence of block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109: 473–492
Article MathSciNet Google Scholar
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Report, Department of Mathematics, University of Washington, Seattle; June 2006; revised February 2007. http://www.math.washington.edu/~tseng/papers.html
Vapnik, V., Golowich, S.E., Smola, A.: Support vector method for function approximation, regression estimation, and signal processing. In: Mozer, M.C., Jordan, M.I., and Petsche, T., (eds.) Advances in Neural Information Processing Systems, vol. 9. MIT Press, Cambridge (1997)
Yuan L. and Lin Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. 68: 49–67
Article MATH MathSciNet Google Scholar
Zhu C., Byrd R.H. and Nocedal J. (1997). L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization. ACM Trans. Math. Softw. 23: 550–560
Article MATH MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

Department of Mathematics, University of Washington, Seattle, WA, 98195, USA
Paul Tseng & Sangwoon Yun

Authors

Paul Tseng
View author publications
You can also search for this author in PubMed Google Scholar
Sangwoon Yun
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Paul Tseng.

Additional information

In honor of Steve Robinson, for his many contributions to mathematical programming, including the subject of this paper: nonsmooth optimization and error bound.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Tseng, P., Yun, S. A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009). https://doi.org/10.1007/s10107-007-0170-0

Download citation

Received: 05 June 2006
Accepted: 24 April 2007
Published: 01 August 2007
Issue Date: March 2009
DOI: https://doi.org/10.1007/s10107-007-0170-0

Keywords

Mathematics Subject Classification (2000)

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A coordinate gradient descent method for nonsmooth separable minimization

Abstract

Access this article

Similar content being viewed by others

A New Insight on Augmented Lagrangian Method with Applications in Machine Learning

The Frank-Wolfe Algorithm: A Short Introduction

$\mathbf{C^{2}}$ -Lusin approximation of strongly convex functions

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification (2000)

Navigation

A coordinate gradient descent method for nonsmooth separable minimization

Abstract

Access this article

Similar content being viewed by others

A New Insight on Augmented Lagrangian Method with Applications in Machine Learning

The Frank-Wolfe Algorithm: A Short Introduction

$\mathbf{C^{2}}$ -Lusin approximation of strongly convex functions

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification (2000)

Search

Navigation