Abstract
In this paper, we consider a class of finite-sum convex optimization problems whose objective function is given by the average of \(m\, ({\ge }1)\) smooth components together with some other relatively simple terms. We first introduce a deterministic primal–dual gradient (PDG) method that achieves the optimal black-box iteration complexity for solving these composite optimization problems under a primal–dual termination criterion. Our major contribution is a randomized primal–dual gradient (RPDG) method, which computes the gradient of only one randomly selected smooth component at each iteration, but can achieve a better complexity than PDG in terms of the total number of gradient evaluations. More specifically, we show that under favorable situations the total number of gradient evaluations performed by RPDG can be \({{\mathcal {O}}} (\sqrt{m})\) times smaller, both in expectation and with high probability, than that performed by deterministic optimal first-order methods. We also show that the complexity of the RPDG method is not improvable, by establishing a new lower complexity bound for a general class of randomized methods for solving large-scale finite-sum convex optimization problems.
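To illustrate the finite-sum structure and the key cost-saving idea, here is a minimal sketch (not the RPDG method itself, whose primal–dual updates are developed in the paper): at each iteration only the gradient of one randomly selected component \(f_i\) is evaluated, rather than the full gradient of all \(m\) components. The quadratic components and all numerical values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 8, 5
A = rng.normal(size=(m, d))       # one data row per component f_i (hypothetical)
b = rng.normal(size=m)

# f(x) = (1/m) * sum_i f_i(x), with f_i(x) = 0.5 * (a_i^T x - b_i)^2
def grad_fi(x, i):
    """Gradient of a single component f_i; cost is O(d), not O(m d)."""
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
eta = 0.05                        # constant stepsize, chosen for illustration
for t in range(2000):
    i = rng.integers(m)           # one randomly selected component per iteration
    x = x - eta * grad_fi(x, i)   # only one component gradient is evaluated
```

A deterministic first-order method would instead evaluate all \(m\) component gradients per iteration; the savings quantified in the paper come from the gap between these two per-iteration costs.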
Notes
Observe that the subgradients of h and \(\omega \) are not required due to the assumption in (1.5).
Suppose that the \(f_i\) are Lipschitz continuous with constants \(M_i\), and denote \(M := \textstyle {\sum }_{i=1}^m M_i\). Then one should set \(\nu _i = M_i/M\) in order to obtain the optimal complexity for SGDs.
As pointed out by one anonymous reviewer, the authors of [9] also later mentioned, in the published version of their paper, the possibility of incorporating more general Bregman distances for strongly convex problems, although no detailed information was provided.
Relative accuracy is a common termination criterion for unconstrained problems; see [36] for a similar example.
References
Agarwal, A., Bottou, L.: A lower bound for the optimization of finite sums. ArXiv e-prints, Oct 2014
Allen-Zhu, Z., Hazan, E.: Optimal black-box reductions between optimization objectives. arXiv preprint arXiv:1603.05642 (2016)
Arjevani, Y., Shamir, O.: Dimension-free iteration complexity of finite sum optimization problems. arXiv preprint arXiv:1606.09333 (2016)
Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16, 697–725 (2006)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42, 596–636 (2003)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Bertsekas, D.P.: Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. In: Nowozin, S., Sra, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 85–119. MIT Press, Cambridge (2012). (Extended version: LIDS report LIDS-P2848, MIT, 2010)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Phys. 7, 200–217 (1967)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Manuscript, Oct 2014
Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155, 57–79 (2016)
Chen, Y., Lan, G., Ouyang, Y.: Optimal primal–dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)
Dang, C., Lan, G.: Randomized First-Order Methods for Saddle Point Optimization. Manuscript, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA, Sept 2014
Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. Adv. Neural Inf. Process. Syst. (NIPS) 27, 1646–1654 (2014)
Fercoq, O., Richtárik, P.: Smooth minimization of nonsmooth functions with parallel coordinate descent methods. ArXiv e-prints, Sept 2013
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: a generic algorithmic framework. SIAM J. Optim. 22, 1469–1492 (2012)
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. SIAM J. Optim. 23, 2061–2089 (2013)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic optimization. Technical report, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA, June 2013
Hiriart-Urruty, J.-B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer, New York (2012)
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst. (NIPS) 26, 315–323 (2013)
Juditsky, A., Nemirovski, A.S.: First-order methods for nonsmooth convex large-scale optimization, I: general purpose methods. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning. MIT Press, Cambridge (2011)
Kiwiel, K.C.: Proximal minimization methods with generalized Bregman functions. SIAM J. Control Optim. 35, 1142–1168 (1997)
Lan, G.: An optimal method for stochastic composite optimization. Math. Program. 133(1), 365–397 (2012)
Lan, G., Lu, Z., Monteiro, R.D.C.: Primal–dual first-order methods with \({{\cal{O}}}(1/\epsilon )\) iteration-complexity for cone programming. Math. Program. 126, 1–29 (2011)
Lan, G., Nemirovski, A.S., Shapiro, A.: Validation analysis of mirror descent stochastic approximation method. Math. Program. 134, 425–458 (2012)
Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. Technical report, 2015. hal-01160728
Lin, Q., Lu, Z., Xiao, L.: An accelerated proximal coordinate gradient method and its application to regularized empirical risk minimization. Technical Report MSR-TR-2014-94, 2014
Nemirovski, A.S.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15, 229–251 (2005)
Nemirovski, A.S., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19, 1574–1609 (2009)
Nemirovski, A.S., Yudin, D.: Problem Complexity and Method Efficiency in Optimization, vol. XV. Wiley-Interscience Series in Discrete Mathematics. Wiley, New York (1983)
Nesterov, Y.E.: A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\). Dokl. AN SSSR 269, 543–547 (1983)
Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston (2004)
Nesterov, Y.E.: Smooth minimization of nonsmooth functions. Math. Program. 103, 127–152 (2005)
Nesterov, Y.E.: Efficiency of coordinate descent methods on huge-scale optimization problems. Technical report, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, Feb 2010
Nesterov, Y.E.: Gradient methods for minimizing composite objective functions. Math. Program. Ser. B 140, 125–161 (2013)
Nesterov, Y.: Unconstrained convex minimization in relative scale. Math. Oper. Res. 34(1), 180–193 (2009)
Schmidt, M., Roux, N.L., Bach, F.: Minimizing finite sums with the stochastic average gradient. Technical report, Sept 2013
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)
Shalev-Shwartz, S., Zhang, T.: Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. Math. Program. 155, 105–145 (2016)
Tseng, P.: On accelerated proximal gradient methods for convex–concave optimization. Manuscript, University of Washington, Seattle, May 2008
Winston, W.L., Goldberg, J.B.: Operations Research: Applications and Algorithms, vol. 3. Duxbury Press, Belmont (2004)
Woodworth, B., Srebro, N.: Tight complexity bounds for optimizing composite objectives. arXiv preprint arXiv:1605.08003 (2016)
Zhang, Y., Xiao, L.: Stochastic primal–dual coordinate method for regularized empirical risk minimization. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 353–361 (2015)
Additional information
This research was partially supported by NSF Grants CMMI-1537414 and DMS-1319050, ONR Grant N00014-13-1-0036, and NSF CAREER Award CMMI-1254446.
Cite this article
Lan, G., Zhou, Y. An optimal randomized incremental gradient method. Math. Program. 171, 167–215 (2018). https://doi.org/10.1007/s10107-017-1173-0
Keywords
- Convex programming
- Complexity
- Incremental gradient
- Primal–dual gradient method
- Nesterov’s method
- Data analysis