# Gradient methods for minimizing composite functions

## Abstract

In this paper we analyze several new methods for solving optimization problems whose objective function is the sum of two terms: one is smooth and given by a black-box oracle, and the other is a simple general convex function with known structure. Although the sum itself lacks good properties, such problems, in both the convex and nonconvex cases, can be solved with the efficiency typical of the smooth part of the objective. For convex problems of this structure, we consider primal and dual variants of the gradient method (with convergence rate \(O\left({1 \over k}\right)\)), and an accelerated multistep version with convergence rate \(O\left({1 \over k^2}\right)\), where \(k\) is the iteration counter. For nonconvex problems with this structure, we prove convergence to a point from which there is no descent direction. In contrast, we show that for general nonsmooth, nonconvex problems, even deciding whether a descent direction exists from a given point is NP-hard. For all methods, we suggest efficient “line search” procedures and show that the additional computational work needed to estimate the unknown problem class parameters can multiply the complexity of each iteration by at most a small constant factor. We also present the results of preliminary computational experiments, which confirm the superiority of the accelerated scheme.
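
The basic building block behind such methods is a composite gradient step: from a point \(y\), minimize the model \(f(y) + \langle \nabla f(y), x - y\rangle + \frac{L}{2}\|x - y\|^2 + \Psi(x)\) over \(x\), where \(f\) is the smooth part and \(\Psi\) the simple convex part. The sketch below illustrates this on a hypothetical \(l_1\)-regularized least-squares problem, \(f(x) = \tfrac12\|Ax - b\|^2\) and \(\Psi(x) = \lambda\|x\|_1\), for which the step reduces to soft-thresholding. It is not the author's implementation: the helper names (`make_lasso`, `composite_step`, `gradient_method`, `accelerated_method`), the factor 2 used to increase and decrease the estimate \(L\), and the lower bound `L0` are all assumptions. The accelerated variant uses FISTA-style momentum as a stand-in and is not the multistep scheme analyzed in the paper.

```python
import math


def matvec(A, x):
    # A is a list of rows; returns A x
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    # returns A^T r
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def make_lasso(A, b, lam):
    """Hypothetical test problem: f(x) = 0.5*||Ax - b||^2, Psi(x) = lam*||x||_1."""
    def f(x):
        r = [ai - bi for ai, bi in zip(matvec(A, x), b)]
        return 0.5 * sum(ri * ri for ri in r)

    def grad_f(x):
        r = [ai - bi for ai, bi in zip(matvec(A, x), b)]
        return rmatvec(A, r)

    def psi(x):
        return lam * sum(abs(xi) for xi in x)

    def prox_psi(v, t):
        # argmin_x { Psi(x) + ||x - v||^2 / (2t) }: soft-thresholding at lam*t
        return [math.copysign(max(abs(vi) - lam * t, 0.0), vi) for vi in v]

    return f, grad_f, psi, prox_psi


def composite_step(f, grad_f, prox_psi, y, L, gamma_u=2.0):
    """Minimize the composite model at y; multiply L by gamma_u until the
    quadratic model upper-bounds f at the new point (backtracking)."""
    fy, g = f(y), grad_f(y)
    while True:
        x = prox_psi([yi - gi / L for yi, gi in zip(y, g)], 1.0 / L)
        d = [xi - yi for xi, yi in zip(x, y)]
        model = fy + sum(gi * di for gi, di in zip(g, d)) + 0.5 * L * sum(di * di for di in d)
        if f(x) <= model + 1e-12:
            return x, L
        L *= gamma_u


def gradient_method(f, grad_f, psi, prox_psi, x0, L0=1.0, iters=200, gamma_d=2.0):
    x, L = list(x0), L0
    for _ in range(iters):
        x, M = composite_step(f, grad_f, prox_psi, x, L)
        L = max(L0, M / gamma_d)  # let the estimate of L shrink again (assumed rule)
    return x, f(x) + psi(x)


def accelerated_method(f, grad_f, psi, prox_psi, x0, L0=1.0, iters=200):
    # FISTA-style momentum, swapped in for illustration; L is never decreased here
    x_prev, y, t, L = list(x0), list(x0), 1.0, L0
    for _ in range(iters):
        x, L = composite_step(f, grad_f, prox_psi, y, L)
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, t = x, t_next
    return x_prev, f(x_prev) + psi(x_prev)
```

For example, with `f, grad_f, psi, prox_psi = make_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.5], 1.0)`, calling `gradient_method(f, grad_f, psi, prox_psi, [0.0, 0.0])` returns a point at (or numerically indistinguishable from) \((2, 0)\) with objective value 2.625, since with \(A = I\) the minimizer is simply \(b\) soft-thresholded at \(\lambda\). Each line search only repeats the cheap model minimization, which loosely mirrors the abstract's point that estimating the problem class parameters costs a small constant factor per iteration.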

## Keywords

Local optimization, Convex optimization, Nonsmooth optimization, Complexity theory, Black-box model, Optimal methods, Structural optimization, \(l_1\)-regularization

## Mathematics Subject Classification

90C25, 90C47, 68Q25

## Notes

### Acknowledgments

The author would like to thank M. Overton, Y. Xia, and anonymous referees for numerous useful suggestions.