An optimal randomized incremental gradient method

  • Full Length Paper
  • Series A
  • Published in Mathematical Programming

Abstract

In this paper, we consider a class of finite-sum convex optimization problems whose objective function is given by the average of \(m\, ({\ge }1)\) smooth components together with some other relatively simple terms. We first introduce a deterministic primal–dual gradient (PDG) method that can achieve the optimal black-box iteration complexity for solving these composite optimization problems using a primal–dual termination criterion. Our major contribution is to develop a randomized primal–dual gradient (RPDG) method, which needs to compute the gradient of only one randomly selected smooth component at each iteration, but can possibly achieve better complexity than PDG in terms of the total number of gradient evaluations. More specifically, we show that the total number of gradient evaluations performed by RPDG can be \({{\mathcal {O}}} (\sqrt{m})\) times smaller, both in expectation and with high probability, than those performed by deterministic optimal first-order methods under favorable situations. We also show that the complexity of the RPDG method is not improvable by developing a new lower complexity bound for a general class of randomized methods for solving large-scale finite-sum convex optimization problems.
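The key structural idea in the abstract, that a randomized incremental method evaluates the gradient of only one randomly selected component per iteration while a deterministic method touches all m components, can be illustrated with a minimal sketch. This is not the paper's RPDG method (which is primal–dual and accelerated); it is only a toy finite-sum problem with hypothetical quadratic components f_i(x) = a_i x²/2, chosen so the gradients are easy to verify.

```python
import random

random.seed(0)

# Hypothetical finite-sum objective f(x) = (1/m) * sum_i f_i(x),
# with quadratic components f_i(x) = 0.5 * a_i * x^2 (illustrative data).
a = [1.0, 2.0, 3.0, 4.0]
m = len(a)

def grad_component(i, x):
    """Gradient of the single smooth component f_i at x."""
    return a[i] * x

def full_gradient(x):
    """Deterministic oracle: one call costs m component-gradient evaluations."""
    return sum(grad_component(i, x) for i in range(m)) / m

def incremental_step(x, step):
    """Randomized oracle: samples one component uniformly at random; its
    gradient is an unbiased estimator of the full gradient."""
    i = random.randrange(m)
    return x - step * grad_component(i, x)

x = 1.0
for _ in range(200):
    x = incremental_step(x, step=0.05)
# Each step contracts |x| by at least a factor 1 - 0.05 * min(a) = 0.95,
# so the iterates approach the minimizer x* = 0.
```

Each randomized step costs 1/m of a full-gradient evaluation, which is the accounting under which the abstract's O(√m) saving in total gradient evaluations is stated.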

Notes

  1. Observe that the subgradients of h and \(\omega \) are not required due to the assumption in (1.5).

  2. Suppose that \(f_i\) are Lipschitz continuous with constants \(M_i\), and denote \(M := \textstyle {\sum }_{i=1}^m M_i\). Then one should set \(\nu _i = M_i/M\) in order to obtain the optimal complexity for SGDs.
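The sampling rule in the note above, picking component i with probability \(\nu_i = M_i/M\), can be sketched as follows. The Lipschitz constants `M_i` below are hypothetical illustrative values, not from the paper.

```python
import random

random.seed(0)

# Hypothetical Lipschitz constants M_i for the components f_i.
M_i = [3.0, 1.0, 6.0]
M = sum(M_i)
nu = [Mi / M for Mi in M_i]  # sampling probabilities: [0.3, 0.1, 0.6]

def sample_component():
    """Draw one index i with probability nu_i (inverse-CDF sampling)."""
    u, acc = random.random(), 0.0
    for i, p in enumerate(nu):
        acc += p
        if u < acc:
            return i
    return len(nu) - 1

# Empirical check: components with larger M_i are sampled more often.
counts = [0] * len(nu)
for _ in range(10000):
    counts[sample_component()] += 1
```

Sampling proportionally to the Lipschitz constants weights the components with the largest gradient variation most heavily, which is what yields the optimal constant in the SGD complexity bound.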

  3. As pointed out by one anonymous reviewer, the authors of [9] also later mentioned, in the published version of their paper, the possibility of incorporating more general Bregman distances for strongly convex problems, although no detailed information was provided.

  4. Relative accuracy is a common termination criterion for unconstrained problems; see [36] for a similar example.

References

  1. Agarwal, A., Bottou, L.: A Lower Bound for the Optimization of Finite Sums. ArXiv e-prints, Oct 2014

  2. Allen-Zhu, Z., Hazan, E.: Optimal black-box reductions between optimization objectives. arXiv preprint arXiv:1603.05642 (2016)

  3. Arjevani, Y., Shamir, O.: Dimension-free iteration complexity of finite sum optimization problems. arXiv preprint arXiv:1606.09333 (2016)

  4. Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16, 697–725 (2006)

  5. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42, 596–636 (2003)

  6. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  7. Bertsekas, D.P.: Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. In: Nowozin, S., Sra, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 85–119. MIT Press, Cambridge (2012). (Extended version: LIDS report LIDS-P2848, MIT, 2010)

  8. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Phys. 7, 200–217 (1967)

  9. Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal–dual algorithm. Manuscript, Oct 2014

  10. Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)

  11. Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155, 57–79 (2016)

  12. Chen, Y., Lan, G., Ouyang, Y.: Optimal primal–dual methods for a class of saddle point problems. SIAM J. Optim. 24(4), 1779–1814 (2014)

  13. Dang, C., Lan, G.: Randomized First-Order Methods for Saddle Point Optimization. Manuscript, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA, Sept 2014

  14. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. Adv. Neural Inf. Process. Syst. (NIPS) 27, 1646–1654 (2014)

  15. Fercoq, O., Richtárik, P.: Smooth minimization of nonsmooth functions with parallel coordinate descent methods. ArXiv e-prints, Sept 2013

  16. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: a generic algorithmic framework. SIAM J. Optim. 22, 1469–1492 (2012)

  17. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. SIAM J. Optim. 23, 2061–2089 (2013)

  18. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic optimization. Technical report, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA, June 2013

  19. Hiriart-Urruty, J.-B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer, New York (2012)

  20. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst. (NIPS) 26, 315–323 (2013)

  21. Juditsky, A., Nemirovski, A.S.: First-order methods for nonsmooth convex large-scale optimization, I: general purpose methods. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning. MIT Press, Cambridge (2011)

  22. Kiwiel, K.C.: Proximal minimization methods with generalized Bregman functions. SIAM J. Control Optim. 35, 1142–1168 (1997)

  23. Lan, G.: An optimal method for stochastic composite optimization. Math. Program. 133(1), 365–397 (2012)

  24. Lan, G., Lu, Z., Monteiro, R.D.C.: Primal–dual first-order methods with \({{\cal{O}}}(1/\epsilon )\) iteration-complexity for cone programming. Math. Program. 126, 1–29 (2011)

  25. Lan, G., Nemirovski, A.S., Shapiro, A.: Validation analysis of mirror descent stochastic approximation method. Math. Program. 134, 425–458 (2012)

  26. Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. Technical report, 2015. hal-01160728

  27. Lin, Q., Lu, Z., Xiao, L.: An accelerated proximal coordinate gradient method and its application to regularized empirical risk minimization. Technical report, 2014. no. MSR-TR-2014-94

  28. Nemirovski, A.S.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15, 229–251 (2005)

  29. Nemirovski, A.S., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19, 1574–1609 (2009)

  30. Nemirovski, A.S., Yudin, D.: Problem Complexity and Method Efficiency in Optimization, vol. XV. Wiley-Interscience Series in Discrete Mathematics. Wiley, New York (1983)

  31. Nesterov, Y.E.: A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\). Dokl. AN SSSR 269, 543–547 (1983)

  32. Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston (2004)

  33. Nesterov, Y.E.: Smooth minimization of nonsmooth functions. Math. Program. 103, 127–152 (2005)

  34. Nesterov, Y.E.: Efficiency of coordinate descent methods on huge-scale optimization problems. Technical report, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, Feb 2010

  35. Nesterov, Y.E.: Gradient methods for minimizing composite objective functions. Math. Program. Ser. B 140, 125–161 (2013)

  36. Nesterov, Y.: Unconstrained convex minimization in relative scale. Math. Oper. Res. 34(1), 180–193 (2009)

  37. Schmidt, M., Roux, N.L., Bach, F.: Minimizing finite sums with the stochastic average gradient. Technical report, Sept 2013

  38. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)

  39. Shalev-Shwartz, S., Zhang, T.: Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. Math. Program. 155, 105–145 (2016)

  40. Tseng, P.: On accelerated proximal gradient methods for convex–concave optimization. Manuscript, University of Washington, Seattle, May 2008

  41. Winston, W.L., Goldberg, J.B.: Operations Research: Applications and Algorithms, vol. 3. Duxbury Press, Belmont (2004)

  42. Woodworth, B., Srebro, N.: Tight complexity bounds for optimizing composite objectives. arXiv preprint arXiv:1605.08003 (2016)

  43. Zhang, Y., Xiao, L.: Stochastic primal–dual coordinate method for regularized empirical risk minimization. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 353–361 (2015)

Author information

Correspondence to Guanghui Lan.

Additional information

The author of this paper was partially supported by NSF Grants CMMI-1537414 and DMS-1319050, ONR Grant N00014-13-1-0036, and NSF CAREER Award CMMI-1254446.

About this article

Cite this article

Lan, G., Zhou, Y. An optimal randomized incremental gradient method. Math. Program. 171, 167–215 (2018). https://doi.org/10.1007/s10107-017-1173-0
