Abstract
We address the problem of determining optimal stepsizes for estimating parameters in the context of approximate dynamic programming. The sufficient conditions for convergence of stepsize rules have been known for 50 years, but practical computational work tends to use formulas with parameters that have to be tuned for specific applications. The difficulty is that in most dynamic programming applications, the observations used to estimate a value function come from a data series that can be initially highly transient. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence, and it can vary widely among the value function parameters of the same dynamic program. This paper reviews the literature on deterministic and stochastic stepsize rules, and derives formulas for optimal stepsizes that minimize estimation error. These formulas assume certain parameters are known, and an approximation is proposed for the case where they are unknown. Experimental work shows that the approximation provides faster convergence than other popular formulas.
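The recursive estimation setting the abstract describes can be illustrated with the classic smoothing update θ_n = (1 − α_n)θ_{n−1} + α_n X_n. The sketch below (an illustration, not the paper's own algorithm) compares two standard deterministic stepsize rules on an initially transient series: the 1/n averaging rule and a generalized harmonic rule α_n = a/(a + n − 1), whose slower decay is better suited to transient data; all names and parameter values here are illustrative.

```python
import random

def recursive_estimate(observations, stepsize):
    """Smooth a series with theta_n = (1 - a_n) * theta_{n-1} + a_n * x_n."""
    theta = 0.0
    for n, x in enumerate(observations, start=1):
        a_n = stepsize(n)
        theta = (1.0 - a_n) * theta + a_n * x
    return theta

# Two classic deterministic rules: 1/n (plain averaging) and the
# generalized harmonic rule a/(a + n - 1), which decays more slowly
# for larger a and so discounts early, biased observations.
one_over_n = lambda n: 1.0 / n
harmonic = lambda a: (lambda n: a / (a + n - 1.0))

random.seed(0)
# A transient series: the mean drifts from 0 toward 10, plus noise.
data = [10.0 * (1 - 0.95 ** n) + random.gauss(0, 1) for n in range(1, 201)]

est_avg = recursive_estimate(data, one_over_n)   # biased by the transient
est_har = recursive_estimate(data, harmonic(10.0))
print(abs(10 - est_avg), abs(10 - est_har))
```

Because 1/n weights all observations equally, its estimate retains the bias of the early transient phase, while the harmonic rule concentrates weight on later observations and lands closer to the limiting mean; the paper's contribution is a stepsize that adapts this trade-off rather than fixing the tuning constant a in advance.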
Editor: Prasad Tadepalli
George, A.P., Powell, W.B. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Mach Learn 65, 167–198 (2006). https://doi.org/10.1007/s10994-006-8365-9