Abstract
We address the problem of determining optimal stepsizes for estimating parameters in the context of approximate dynamic programming. The sufficient conditions for convergence of stepsize rules have been known for 50 years, but practical computational work tends to use formulas with parameters that have to be tuned for specific applications. The difficulty is that in most dynamic programming applications, the observations used to estimate a value function come from a data series that can be initially highly transient. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence, and it can vary widely among the value function parameters of the same dynamic program. This paper reviews the literature on deterministic and stochastic stepsize rules, and derives formulas for optimal stepsizes that minimize estimation error. These formulas assume certain parameters are known, and an approximation is proposed for the case where they are unknown. Experimental work shows that the approximation provides faster convergence than other popular formulas.
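The recursive estimation setting the abstract describes can be illustrated with the classic smoothing update θ_n = (1 − α_n)θ_{n−1} + α_n X_n. The sketch below (an illustration, not the paper's own algorithm) compares two standard deterministic stepsize rules on an initially transient series: the 1/n averaging rule and a generalized harmonic rule α_n = a/(a + n − 1), whose slower decay is better suited to transient data; all names and parameter values here are illustrative.

```python
import random

def recursive_estimate(observations, stepsize):
    """Smooth a series with theta_n = (1 - a_n) * theta_{n-1} + a_n * x_n."""
    theta = 0.0
    for n, x in enumerate(observations, start=1):
        a_n = stepsize(n)
        theta = (1.0 - a_n) * theta + a_n * x
    return theta

# Two classic deterministic rules: 1/n (plain averaging) and the
# generalized harmonic rule a/(a + n - 1), which decays more slowly
# for larger a and so discounts early, biased observations.
one_over_n = lambda n: 1.0 / n
harmonic = lambda a: (lambda n: a / (a + n - 1.0))

random.seed(0)
# A transient series: the mean drifts from 0 toward 10, plus noise.
data = [10.0 * (1 - 0.95 ** n) + random.gauss(0, 1) for n in range(1, 201)]

est_avg = recursive_estimate(data, one_over_n)   # biased by the transient
est_har = recursive_estimate(data, harmonic(10.0))
print(abs(10 - est_avg), abs(10 - est_har))
```

Because 1/n weights all observations equally, its estimate retains the bias of the early transient phase, while the harmonic rule concentrates weight on later observations and lands closer to the limiting mean; the paper's contribution is a stepsize that adapts this trade-off rather than fixing the tuning constant a in advance.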
Editor: Prasad Tadepalli
George, A.P., Powell, W.B. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Mach Learn 65, 167–198 (2006). https://doi.org/10.1007/s10994-006-8365-9