An optimal randomized incremental gradient method

  • Full Length Paper
  • Series A
  • Published in Mathematical Programming

Abstract

In this paper, we consider a class of finite-sum convex optimization problems whose objective function is given by the average of \(m\, ({\ge }1)\) smooth components together with some other relatively simple terms. We first introduce a deterministic primal–dual gradient (PDG) method that can achieve the optimal black-box iteration complexity for solving these composite optimization problems using a primal–dual termination criterion. Our major contribution is to develop a randomized primal–dual gradient (RPDG) method, which needs to compute the gradient of only one randomly selected smooth component at each iteration, but can possibly achieve better complexity than PDG in terms of the total number of gradient evaluations. More specifically, we show that the total number of gradient evaluations performed by RPDG can be \({{\mathcal {O}}} (\sqrt{m})\) times smaller, both in expectation and with high probability, than those performed by deterministic optimal first-order methods under favorable situations. We also show that the complexity of the RPDG method is not improvable by developing a new lower complexity bound for a general class of randomized methods for solving large-scale finite-sum convex optimization problems.
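The key structural idea in the abstract, that a randomized incremental method evaluates the gradient of only one randomly selected component per iteration while a deterministic method touches all m components, can be illustrated with a minimal sketch. This is not the paper's RPDG method (which is primal–dual and accelerated); it is only a toy finite-sum problem with hypothetical quadratic components f_i(x) = a_i x²/2, chosen so the gradients are easy to verify.

```python
import random

random.seed(0)

# Hypothetical finite-sum objective f(x) = (1/m) * sum_i f_i(x),
# with quadratic components f_i(x) = 0.5 * a_i * x^2 (illustrative data).
a = [1.0, 2.0, 3.0, 4.0]
m = len(a)

def grad_component(i, x):
    """Gradient of the single smooth component f_i at x."""
    return a[i] * x

def full_gradient(x):
    """Deterministic oracle: one call costs m component-gradient evaluations."""
    return sum(grad_component(i, x) for i in range(m)) / m

def incremental_step(x, step):
    """Randomized oracle: samples one component uniformly at random; its
    gradient is an unbiased estimator of the full gradient."""
    i = random.randrange(m)
    return x - step * grad_component(i, x)

x = 1.0
for _ in range(200):
    x = incremental_step(x, step=0.05)
# Each step contracts |x| by at least a factor 1 - 0.05 * min(a) = 0.95,
# so the iterates approach the minimizer x* = 0.
```

Each randomized step costs 1/m of a full-gradient evaluation, which is the accounting under which the abstract's O(√m) saving in total gradient evaluations is stated.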

Notes

  1. Observe that the subgradients of h and \(\omega \) are not required due to the assumption in (1.5).

  2. Suppose that \(f_i\) are Lipschitz continuous with constants \(M_i\), and denote \(M := \textstyle {\sum }_{i=1}^m M_i\). Then one should set \(\nu _i = M_i/M\) in order to obtain the optimal complexity for SGDs.
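The sampling rule in the note above, picking component i with probability \(\nu_i = M_i/M\), can be sketched as follows. The Lipschitz constants `M_i` below are hypothetical illustrative values, not from the paper.

```python
import random

random.seed(0)

# Hypothetical Lipschitz constants M_i for the components f_i.
M_i = [3.0, 1.0, 6.0]
M = sum(M_i)
nu = [Mi / M for Mi in M_i]  # sampling probabilities: [0.3, 0.1, 0.6]

def sample_component():
    """Draw one index i with probability nu_i (inverse-CDF sampling)."""
    u, acc = random.random(), 0.0
    for i, p in enumerate(nu):
        acc += p
        if u < acc:
            return i
    return len(nu) - 1

# Empirical check: components with larger M_i are sampled more often.
counts = [0] * len(nu)
for _ in range(10000):
    counts[sample_component()] += 1
```

Sampling proportionally to the Lipschitz constants weights the components with the largest gradient variation most heavily, which is what yields the optimal constant in the SGD complexity bound.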

  3. As pointed out by one anonymous reviewer, the authors of [9] also later mentioned, in the published version of their paper, the possibility of incorporating more general Bregman distances for strongly convex problems, although no detailed information was provided.

  4. Relative accuracy is a common termination criterion for unconstrained problems; see [36] for a similar example.

References

  1. Agarwal, A., Bottou, L.: A Lower Bound for the Optimization of Finite Sums. ArXiv e-prints, Oct 2014

  2. Allen-Zhu, Z., Hazan, E.: Optimal black-box reductions between optimization objectives. arXiv preprint arXiv:1603.05642 (2016)

  3. Arjevani, Y., Shamir, O.: Dimension-free iteration complexity of finite sum optimization problems. arXiv preprint arXiv:1606.09333 (2016)

  4. Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16, 697–725 (2006)

  5. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42, 596–636 (2003)

  6. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  7. Bertsekas, D.P.: Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. In: Nowozin, S., Sra, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 85–119. MIT Press, Cambridge (2012). (Extended version: LIDS report LIDS-P2848, MIT, 2010)

  8. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Phys. 7, 200–217 (1967)

  9. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Manuscript, Oct 2014

  10. Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)

  11. Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155, 57–79 (2016)

  12. Chen, Y., Lan, G., Ouyang, Y.: Optimal primal–dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)

  13. Dang, C., Lan, G.: Randomized First-Order Methods for Saddle Point Optimization. Manuscript, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA, Sept 2014

  14. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. Adv. Neural Inf. Process. Syst. (NIPS) 27, 1646–1654 (2014)

  15. Fercoq, O., Richtárik, P.: Smooth minimization of nonsmooth functions with parallel coordinate descent methods. ArXiv e-prints, Sept 2013

  16. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: a generic algorithmic framework. SIAM J. Optim. 22, 1469–1492 (2012)

  17. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. SIAM J. Optim. 23, 2061–2089 (2013)

  18. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic optimization. Technical report, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA, June 2013

  19. Hiriart-Urruty, J.-B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer, New York (2012)

  20. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst. (NIPS) 26, 315–323 (2013)

  21. Juditsky, A., Nemirovski, A.S.: First-order methods for nonsmooth convex large-scale optimization, I: general purpose methods. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning. MIT Press, Cambridge (2011)

  22. Kiwiel, K.C.: Proximal minimization methods with generalized Bregman functions. SIAM J. Control Optim. 35, 1142–1168 (1997)

  23. Lan, G.: An optimal method for stochastic composite optimization. Math. Program. 133(1), 365–397 (2012)

  24. Lan, G., Lu, Z., Monteiro, R.D.C.: Primal–dual first-order methods with \({{\cal{O}}}(1/\epsilon )\) iteration-complexity for cone programming. Math. Program. 126, 1–29 (2011)

  25. Lan, G., Nemirovski, A.S., Shapiro, A.: Validation analysis of mirror descent stochastic approximation method. Math. Program. 134, 425–458 (2012)

  26. Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. Technical report, 2015. hal-01160728

  27. Lin, Q., Lu, Z., Xiao, L.: An accelerated proximal coordinate gradient method and its application to regularized empirical risk minimization. Technical report, 2014. no. MSR-TR-2014-94

  28. Nemirovski, A.S.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15, 229–251 (2005)

  29. Nemirovski, A.S., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19, 1574–1609 (2009)

  30. Nemirovski, A.S., Yudin, D.: Problem Complexity and Method Efficiency in Optimization, vol. XV. Wiley-Interscience Series in Discrete Mathematics. Wiley, New York (1983)

  31. Nesterov, Y.E.: A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\). Dokl. AN SSSR 269, 543–547 (1983)

  32. Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston (2004)

  33. Nesterov, Y.E.: Smooth minimization of nonsmooth functions. Math. Program. 103, 127–152 (2005)

  34. Nesterov, Y.E.: Efficiency of coordinate descent methods on huge-scale optimization problems. Technical report, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, Feb 2010

  35. Nesterov, Y.E.: Gradient methods for minimizing composite objective functions. Math. Program. Ser. B 140, 125–161 (2013)

  36. Nesterov, Y.: Unconstrained convex minimization in relative scale. Math. Oper. Res. 34(1), 180–193 (2009)

  37. Schmidt, M., Roux, N.L., Bach, F.: Minimizing finite sums with the stochastic average gradient. Technical report, Sept 2013

  38. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)

  39. Shalev-Shwartz, S., Zhang, T.: Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. Math. Program. 155, 105–145 (2016)

  40. Tseng, P.: On accelerated proximal gradient methods for convex–concave optimization. Manuscript, University of Washington, Seattle, May 2008

  41. Winston, W.L., Goldberg, J.B.: Operations Research: Applications and Algorithms, vol. 3. Duxbury Press, Belmont (2004)

  42. Woodworth, B., Srebro, N.: Tight complexity bounds for optimizing composite objectives. arXiv preprint arXiv:1605.08003 (2016)

  43. Zhang, Y., Xiao, L.: Stochastic primal–dual coordinate method for regularized empirical risk minimization. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 353–361 (2015)

Author information

Correspondence to Guanghui Lan.

Additional information

The author of this paper was partially supported by NSF Grants CMMI-1537414 and DMS-1319050, ONR Grant N00014-13-1-0036, and NSF CAREER Award CMMI-1254446.

About this article

Cite this article

Lan, G., Zhou, Y. An optimal randomized incremental gradient method. Math. Program. 171, 167–215 (2018). https://doi.org/10.1007/s10107-017-1173-0
