An Overview of Stochastic Approximation

Part of the International Series in Operations Research & Management Science book series (ISOR, volume 216)


This chapter provides an overview of stochastic approximation (SA) methods in the context of simulation optimization. SA is an iterative search algorithm that can be viewed as the stochastic counterpart to steepest descent in deterministic optimization. We begin with the classical methods of Robbins–Monro (RM) and Kiefer–Wolfowitz (KW). We discuss the challenges in implementing SA algorithms and present some of the best-known variants, such as Kesten's rule, iterate averaging, varying bounds, and simultaneous perturbation stochastic approximation (SPSA), as well as recently proposed versions including scaled-and-shifted Kiefer–Wolfowitz (SSKW), robust stochastic approximation (RSA), accelerated stochastic approximation (AC-SA) for convex and strongly convex functions, and Secant-Tangents AveRaged stochastic approximation (STAR-SA). We investigate the empirical performance of several of the recent algorithms by comparing them on a set of numerical examples.
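To make the basic recursion concrete, the following is a minimal one-dimensional sketch of the Kiefer–Wolfowitz scheme, which estimates the gradient from noisy function evaluations via central finite differences. This is illustrative code, not code from the chapter; the step-size and perturbation-width constants and the toy objective are arbitrary choices for demonstration.

```python
import random

def kiefer_wolfowitz(noisy_f, x0, n_iter=2000, a=1.0, c=1.0, seed=0):
    """Minimize E[noisy_f(x)] in one dimension via Kiefer-Wolfowitz SA.

    Uses the classical gain sequences a_k = a/k and c_k = c/k^{1/4}.
    With a direct unbiased gradient observation in place of the
    finite-difference estimate, the same loop is the Robbins-Monro scheme.
    """
    random.seed(seed)
    x = x0
    for k in range(1, n_iter + 1):
        a_k = a / k              # step size, sum a_k = infinity
        c_k = c / k ** 0.25      # perturbation width, shrinks slowly
        # Central finite-difference gradient estimate from two noisy evaluations.
        ghat = (noisy_f(x + c_k) - noisy_f(x - c_k)) / (2 * c_k)
        x = x - a_k * ghat       # gradient-descent-style update
    return x

# Toy problem: minimize f(x) = (x - 3)^2 observed with additive Gaussian noise.
noisy = lambda x: (x - 3) ** 2 + random.gauss(0, 0.1)
x_star = kiefer_wolfowitz(noisy, x0=0.0)
```

The iterate should settle near the true minimizer x = 3; in practice, as the chapter discusses, performance is quite sensitive to the choice of the constants a and c, which motivates variants such as Kesten's rule and SSKW.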





This work was supported in part by the National Science Foundation under Grants CMMI 0856256 and ECCS 0901543, and by the Air Force Office of Scientific Research under Grant FA9550-10-10340.


  1. S. Andradóttir. A stochastic approximation algorithm with varying bounds. Operations Research, 43(6):1037–1048, 1995.
  2. J. R. Blum. Multidimensional stochastic approximation methods. The Annals of Mathematical Statistics, 25(4):737–744, 1954.
  3. M. Broadie, D. Cicek, and A. Zeevi. An adaptive multidimensional version of the Kiefer-Wolfowitz stochastic approximation algorithm. In M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, and R. G. Ingalls, editors, Proceedings of the 2009 Winter Simulation Conference, pages 601–612. IEEE, Piscataway, NJ, 2009.
  4. M. Broadie, D. Cicek, and A. Zeevi. General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. Operations Research, 59(5):1211–1224, 2011.
  5. M. Chau, M. C. Fu, and H. Qu. Multivariate stochastic approximation using a Secant-Tangents AveRaged (STAR) gradient estimator. Working paper, University of Maryland, College Park, 2014.
  6. M. Chau, H. Qu, and M. C. Fu. A new hybrid stochastic approximation algorithm. In Proceedings of the 12th International Workshop on Discrete Event Systems, 2014.
  7. M. Chau, H. Qu, M. C. Fu, and I. O. Ryzhov. An empirical sensitivity analysis of the Kiefer-Wolfowitz algorithm and its variants. In R. Pasupathy, S.-H. Kim, A. Tolk, R. Hill, and M. E. Kuhl, editors, Proceedings of the 2013 Winter Simulation Conference, pages 945–965. IEEE, Piscataway, NJ, 2013.
  8. H. F. Chen. Convergence rate of stochastic approximation algorithms in the degenerate case. SIAM Journal on Control and Optimization, 36(1):100–114, 1998.
  9. H. F. Chen, T. E. Duncan, and B. Pasik-Duncan. A Kiefer-Wolfowitz algorithm with randomized differences. IEEE Transactions on Automatic Control, 44(3):442–453, 1999.
  10. H. F. Chen and Y. M. Zhu. Stochastic approximation procedure with randomly varying truncations. Scientia Sinica Series A, 29:914–926, 1986.
  11. B. Delyon and A. B. Juditsky. Accelerated stochastic approximation. SIAM Journal on Optimization, 3(4):868–881, 1993.
  12. C. Derman. An application of Chung's lemma to the Kiefer-Wolfowitz stochastic approximation procedure. The Annals of Mathematical Statistics, 27(2):532–536, 1956.
  13. J. Dippon and J. Renz. Weighted means in stochastic approximation of minima. SIAM Journal on Control and Optimization, 35(5):1811–1827, 1997.
  14. V. Dupač. On the Kiefer-Wolfowitz approximation method. Časopis pro pěstování matematiky, 82(1):47–75, 1957.
  15. V. Fabian. Stochastic approximation of minima with improved asymptotic speed. The Annals of Mathematical Statistics, 38(1):191–200, 1967.
  16. V. Fabian. Stochastic approximation. In J. S. Rustagi, editor, Optimizing Methods in Statistics, pages 439–470. Academic Press, New York, 1971.
  17. M. C. Fu and S. D. Hill. Optimization of discrete event systems via simultaneous perturbation stochastic approximation. IIE Transactions, 29(3):233–243, 1997.
  18. A. P. George and W. B. Powell. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learning, 65(1):167–198, 2006.
  19. L. Gerencsér. Convergence rates of moments in stochastic approximation with simultaneous perturbation gradient approximation and resetting. IEEE Transactions on Automatic Control, 44(5):894–905, 1999.
  20. S. Ghadimi and G. Lan. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: a generic algorithmic framework. SIAM Journal on Optimization, 22(4):1469–1492, 2012.
  21. S. Ghadimi and G. Lan. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization II: shrinking procedures and optimal algorithms. SIAM Journal on Optimization, 23(4):2061–2089, 2013.
  22. C. Kao, W. T. Song, and S. P. Chen. A modified quasi-Newton method for optimization in simulation. International Transactions in Operational Research, 4(3):223–233, 1997.
  23. H. Kesten. Accelerated stochastic approximation. The Annals of Mathematical Statistics, 29(1):41–59, 1958.
  24. J. Kiefer and J. Wolfowitz. Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, 23(3):462–466, 1952.
  25. N. L. Kleinman, J. C. Spall, and D. Q. Naiman. Simulation-based optimization with stochastic approximation using common random numbers. Management Science, 45(11):1570–1578, 1999.
  26. H. J. Kushner and J. Yang. Stochastic approximation with averaging of the iterates: optimal asymptotic rates of convergence for general processes. SIAM Journal on Control and Optimization, 31:1045–1062, 1993.
  27. H. J. Kushner and J. Yang. Stochastic approximation with averaging and feedback: rapidly convergent "on-line" algorithms. IEEE Transactions on Automatic Control, 40:24–34, 1995.
  28. H. J. Kushner and G. G. Yin. Stochastic Approximation and Recursive Algorithms and Applications. Springer, New York, NY, 2003.
  29. J. L. Maryak. Some guidelines for using iterate averaging in stochastic approximation. In Proceedings of the 36th IEEE Conference on Decision and Control, volume 3, pages 2287–2290, 1997.
  30. A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009.
  31. B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855, 1992.
  32. H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22:400–407, 1951.
  33. D. Ruppert. A Newton-Raphson version of the multivariate Robbins-Monro procedure. The Annals of Statistics, 13(1):236–245, 1985.
  34. D. Ruppert. Efficient estimations from a slowly convergent Robbins-Monro process. Technical report, Cornell University Operations Research and Industrial Engineering, Ithaca, NY, February 1988.
  35. J. Sacks. Asymptotic distribution of stochastic approximation procedures. The Annals of Mathematical Statistics, 29(2):373–405, 1958.
  36. P. Sadegh. Constrained optimization via stochastic approximation with a simultaneous perturbation gradient approximation. Automatica, 33(5):889–892, 1997.
  37. G. N. Saridis. Learning applied to successive approximation algorithms. IEEE Transactions on Systems Science and Cybernetics, SSC-6(2):97–103, 1970.
  38. J. C. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control, 37(3):332–341, 1992.
  39. J. C. Spall. Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Transactions on Automatic Control, 45(10):1839–1853, 2000.
  40. J. C. Spall. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. John Wiley & Sons, Hoboken, NJ, 2003.
  41. A. B. Tsybakov and B. T. Polyak. Optimal order of accuracy of search algorithms in stochastic optimization. Problemy Peredachi Informatsii, 26(2):45–63, 1990.
  42. J. H. Venter. An extension of the Robbins-Monro procedure. The Annals of Mathematical Statistics, 38(1):181–190, 1967.
  43. I. J. Wang and E. K. P. Chong. A deterministic analysis of stochastic approximation with randomized differences. IEEE Transactions on Automatic Control, 43(12):1745–1749, 1998.
  44. Q. Wang. Optimization with Discrete Simultaneous Perturbation Stochastic Approximation Using Noisy Loss Function Measurements. PhD thesis, Johns Hopkins University, 2013.
  45. S. Yakowitz, P. L'Ecuyer, and F. Vazquez-Abad. Global stochastic optimization with low-dispersion point sets. Operations Research, 48(6):939–950, 2000.
  46. G. Yin and Y. M. Zhu. Almost sure convergence of stochastic approximation algorithms with nonadditive noise. International Journal of Control, 49(4):1361–1376, 1989.

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. University of Maryland, College Park, USA
