
On the linear convergence of the stochastic gradient method with constant step-size

  • Volkan Cevher
  • Bằng Công Vũ
Original Paper

Abstract

The strong growth condition (SGC) is known to be a sufficient condition for linear convergence of the stochastic gradient method with a constant step-size \(\gamma \) (SGM-CS). In this paper, we provide a necessary condition for the linear convergence of SGM-CS that is weaker than SGC. Moreover, when this necessary condition is violated up to an additive perturbation \(\sigma \), we show that both the projected stochastic gradient method with a constant step-size, under the restricted strong convexity assumption, and the proximal stochastic gradient method, under the strong convexity assumption, converge linearly to a noise-dominated region whose distance to the optimal solution is proportional to \(\gamma \sigma \).
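As a concrete illustration only, the sketch below runs a projected stochastic gradient method with a constant step-size on a consistent least-squares problem, where the strong growth condition (commonly stated in the literature as \(\mathsf{E}\Vert \nabla f_i(x)\Vert ^2 \le B\,\Vert \nabla F(x)\Vert ^2\) for all \(x\)) holds and the perturbation \(\sigma \) vanishes, so the iterates approach the solution rather than a noise-dominated region. The problem data, the feasible radius, and the step-size are hypothetical choices made for this sketch and are not taken from the paper.

```python
import numpy as np

# Illustrative sketch (not the paper's exact algorithm or constants):
# projected stochastic gradient method with constant step-size gamma on
#   minimize_x (1/n) * sum_i (a_i^T x - b_i)^2   subject to  ||x|| <= R.
# The data A, b, the radius R, and gamma below are hypothetical choices.

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true                      # consistent system: interpolation holds, sigma = 0

gamma = 0.01                        # constant step-size
R = 2.0 * np.linalg.norm(x_true)    # radius of the feasible ball (contains x_true)
x = np.zeros(d)

def project_ball(z, radius):
    """Euclidean projection onto the ball {z : ||z|| <= radius}."""
    norm = np.linalg.norm(z)
    return z if norm <= radius else (radius / norm) * z

for k in range(5000):
    i = rng.integers(n)                         # sample one component uniformly
    grad_i = 2.0 * (A[i] @ x - b[i]) * A[i]     # stochastic gradient of f_i
    x = project_ball(x - gamma * grad_i, R)     # constant-step projected SG update

print("distance to x_true:", np.linalg.norm(x - x_true))
```

Because the system is consistent, every component gradient vanishes at the solution, so the constant step-size does not prevent the distance printed at the end from shrinking geometrically; introducing noise in \(b\) would instead leave the iterates in a region whose radius scales with \(\gamma \sigma \), as described in the abstract.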

Keywords

Stochastic gradient · Linear convergence · Strong growth condition


Acknowledgements

The authors would like to thank Yen-Huan Li and Ahmet Alacaoglu for useful discussions. We thank the referees for their suggestions and corrections, which helped to improve the first version of the manuscript. The work of B. Cong Vu and V. Cevher was supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 725594 - time-data).


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Laboratory for Information and Inference Systems (LIONS), EPFL, Lausanne, Switzerland
