Linear convergence of cyclic SAGA

Abstract

In this work, we present and analyze C-SAGA, a (deterministic) cyclic variant of SAGA. C-SAGA is an incremental gradient method that minimizes a sum of differentiable convex functions by cyclically accessing their gradients. Even though the theory of stochastic algorithms is more mature than that of cyclic counterparts in general, practitioners often prefer cyclic algorithms. We prove C-SAGA converges linearly under the standard assumptions. Then, we compare the rate of convergence with the full gradient method, (stochastic) SAGA, and incremental aggregated gradient (IAG), theoretically and experimentally.
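As a rough illustration of what "cyclically accessing their gradients" can look like, the sketch below applies the standard SAGA update to a small least-squares problem, but picks the index deterministically as k mod n instead of sampling it. This is an assumption about the general shape of such a method, not the exact C-SAGA scheme or step-size rule analyzed in the paper. The function name `cyclic_saga_least_squares`, the toy data, the step size 0.02, and the epoch count are all hypothetical choices made for this example.

```python
import numpy as np

def cyclic_saga_least_squares(A, b, step, epochs):
    """SAGA-style update with a fixed cyclic index order (illustrative sketch).

    Minimizes (1/n) * sum_i 0.5 * (a_i @ x - b_i)**2, where a_i is row i of A.
    """
    n, d = A.shape
    x = np.zeros(d)
    # Table holding the most recently evaluated gradient of each f_i.
    table = A * (A @ x - b)[:, None]
    avg = table.mean(axis=0)  # running average of the stored gradients
    for k in range(epochs * n):
        i = k % n                          # deterministic cyclic index
        g = A[i] * (A[i] @ x - b[i])       # fresh gradient of f_i at x
        x = x - step * (g - table[i] + avg)
        avg = avg + (g - table[i]) / n     # keep the average consistent
        table[i] = g                       # overwrite the stored gradient
    return x

# Toy problem (hypothetical data): a consistent system with known solution.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
x_true = np.array([1.0, -2.0])
b = A @ x_true

# Conservative step size chosen by hand for this toy problem only.
x_hat = cyclic_saga_least_squares(A, b, step=0.02, epochs=2000)
print(np.linalg.norm(x_hat - x_true))
```

The gradient table costs O(nd) memory in this naive form. Each iteration evaluates a single component gradient, which is the usual appeal of incremental methods over the full gradient method.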

Acknowledgements

Ernest Ryu was supported in part by NSF Grant DMS-1720237 and ONR Grant N000141712162.

Author information

Corresponding author

Correspondence to Youngsuk Park.

About this article

Cite this article

Park, Y., Ryu, E.K. Linear convergence of cyclic SAGA. Optim Lett 14, 1583–1598 (2020). https://doi.org/10.1007/s11590-019-01520-y

Keywords

  • Cyclic updates
  • SAGA
  • IAG
  • Incremental methods
  • Just-in-time update
  • Linear convergence