Abstract
We consider in this paper a class of composite optimization problems whose objective function is given by the summation of a general smooth and nonsmooth component, together with a relatively simple nonsmooth term. We present a new class of first-order methods, namely the gradient sliding algorithms, which can skip the computation of the gradient for the smooth component from time to time. As a consequence, these algorithms require only \(\mathcal{O}(1/\sqrt{\epsilon})\) gradient evaluations for the smooth component in order to find an \(\epsilon\)-solution for the composite problem, while still maintaining the optimal \(\mathcal{O}(1/\epsilon^2)\) bound on the total number of subgradient evaluations for the nonsmooth component. We then present a stochastic counterpart for these algorithms and establish similar complexity bounds for solving an important class of stochastic composite optimization problems. Moreover, if the smooth component in the composite function is strongly convex, the developed gradient sliding algorithms can significantly reduce the number of gradient and subgradient evaluations for the smooth and nonsmooth component to \(\mathcal{O}(\log(1/\epsilon))\) and \(\mathcal{O}(1/\epsilon)\), respectively. Finally, we generalize these algorithms to the case when the smooth component is replaced by a nonsmooth one possessing a certain bilinear saddle point structure.
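To make the gradient-skipping idea concrete, the Python sketch below shows the generic two-loop structure behind gradient sliding: each outer iteration evaluates the gradient of the smooth component \(f\) exactly once and then runs an inner loop that touches only subgradients of the nonsmooth component \(h\). This is a minimal schematic under stated assumptions, not the paper's algorithm: the oracles `grad_f`, `subgrad_h`, and `prox`, the step sizes `beta / t`, and the loop counts `K` and `T` are illustrative placeholders; the paper's specific parameter and averaging schemes are what yield the \(\mathcal{O}(1/\sqrt{\epsilon})\) gradient and \(\mathcal{O}(1/\epsilon^2)\) subgradient bounds.

```python
import numpy as np

def gradient_sliding_sketch(grad_f, subgrad_h, prox, x0, K, T):
    """Schematic two-loop structure of a gradient-sliding-type method.

    grad_f    : gradient oracle for the smooth component f
    subgrad_h : subgradient oracle for the nonsmooth component h
    prox      : projection/prox operator for the simple constraint or term
    x0        : starting point
    K         : outer iterations (= number of gradient evaluations of f)
    T         : inner subgradient steps per outer iteration

    Step sizes here are hypothetical placeholders, not the paper's choices.
    """
    x = x0
    for k in range(1, K + 1):
        g = grad_f(x)                  # one gradient of f per outer iteration
        beta = 1.0 / k                 # hypothetical outer step-size schedule
        for t in range(1, T + 1):      # inner loop: only subgradients of h
            x = prox(x - (beta / t) * (g + subgrad_h(x)))
    return x

# Toy usage: f(x) = ||x||^2 / 2 (smooth), h(x) = ||x||_1 (nonsmooth),
# with Euclidean projection onto the unit ball as the simple operator.
x = gradient_sliding_sketch(
    grad_f=lambda x: x,
    subgrad_h=lambda x: np.sign(x),
    prox=lambda x: x / max(1.0, np.linalg.norm(x)),
    x0=np.ones(5),
    K=50,
    T=20,
)
```

The point of the structure is visible in the loop nesting: the expensive gradient `g` is computed \(K\) times while the cheap subgradient oracle is called \(KT\) times, so the two oracle complexities can be balanced independently.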
Additional information
The author of this paper was partially supported by NSF CAREER Award CMMI-1254446, NSF Grants CMMI-1537414 and DMS-1319050, and ONR Grant N00014-13-1-0036.
Cite this article
Lan, G. Gradient sliding for composite optimization. Math. Program. 159, 201–235 (2016). https://doi.org/10.1007/s10107-015-0955-5