Stochastic Approximation Methods and Their Finite-Time Convergence Properties

  • Saeed Ghadimi
  • Guanghui Lan
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 216)


This chapter surveys some recent advances in the design and analysis of two classes of stochastic approximation methods for stochastic optimization: stochastic first-order and stochastic zeroth-order methods. We focus on the finite-time convergence properties (i.e., iteration complexity) of these algorithms by providing bounds on the number of iterations required to achieve a certain accuracy. We point out that many of these complexity bounds are theoretically optimal for solving different classes of stochastic optimization problems.
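To make the distinction between the two oracle classes concrete, the following is a minimal sketch of a Robbins–Monro-style stochastic approximation iteration run with a first-order oracle (a noisy gradient) and with a zeroth-order oracle (a Gaussian-smoothing gradient estimate built from noisy function values only). The toy objective, noise model, and step-size constants are assumptions chosen for illustration; they are not the specific schemes or parameter choices analyzed in the chapter.

```python
import random

# Illustrative stochastic objective: f(x) = E[(x - xi)^2], xi ~ N(1, 0.1^2),
# whose minimizer is x* = 1.

def stochastic_grad(x):
    """First-order oracle: unbiased estimate of f'(x) = 2(x - 1)."""
    xi = random.gauss(1.0, 0.1)
    return 2.0 * (x - xi)

def zeroth_order_grad(x, mu=1e-3):
    """Zeroth-order oracle: gradient estimate from two noisy function
    evaluations along a random Gaussian direction (smoothing parameter mu)."""
    u = random.gauss(0.0, 1.0)
    xi = random.gauss(1.0, 0.1)
    f = lambda y: (y - xi) ** 2  # same noise realization in both evaluations
    return (f(x + mu * u) - f(x)) / mu * u

def sa(oracle, x0=5.0, iters=5000):
    """Stochastic approximation: x_{k+1} = x_k - gamma_k * g_k, gamma_k = c/(k+1)."""
    x = x0
    for k in range(iters):
        x -= (0.5 / (k + 1)) * oracle(x)
    return x

random.seed(0)
print(sa(stochastic_grad))    # near the minimizer x* = 1
print(sa(zeroth_order_grad))  # also near 1, but with a noisier estimator
```

The point of the contrast: the zeroth-order oracle never evaluates a gradient, only function values, which is why its iteration-complexity bounds in the chapter carry an extra dependence on the problem dimension.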



This work was supported in part by the National Science Foundation under Grants CMMI-1000347, CMMI-1254446, and DMS-1319050, and by the Office of Naval Research under Grant N00014-13-1-0036.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. University of Florida, Gainesville, USA
