Abstract
Building on Kearns and Singh's (2000) rigorous upper bound on the error of temporal difference estimators, we derive the first rigorous error bound for the maximum likelihood policy evaluation method, as well as an error bound for Monte Carlo matrix inversion policy evaluation. We provide the first direct comparison between the error bounds of the maximum likelihood (ML), Monte Carlo matrix inversion (MCMI), and temporal difference (TD) estimation methods for policy evaluation. We use these bounds to confirm the generally held view that the model-based ML and MCMI methods are more accurate than the model-free TD method. Our error bounds also identify the parameters and conditions that affect each method's estimation accuracy.
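The model-based versus model-free distinction the abstract draws can be illustrated on a toy Markov reward process. The sketch below (illustrative only; the chain, step size, and sample sizes are invented here, not taken from the paper) estimates the value of a single recurrent state in two ways: model-free TD(0), which bootstraps from its own running estimate, and model-based ML, which first fits the transition probability from the same data and then solves the Bellman equation exactly.

```python
import random

random.seed(0)
gamma, p = 0.9, 0.5  # discount factor and true termination probability (toy values)
# True value of the single state: V = p*1 + (1-p)*gamma*V
true_v = p / (1 - (1 - p) * gamma)

# Generate transitions from a one-state episodic chain:
# from the state, terminate with reward 1 (prob p) or loop back with reward 0.
transitions = []
for _ in range(2000):
    done = False
    while not done:
        done = random.random() < p
        transitions.append((1.0, done) if done else (0.0, done))

# Model-free TD(0): move the estimate toward the bootstrapped target.
v_td, alpha = 0.0, 0.05
for r, done in transitions:
    target = r + (0.0 if done else gamma * v_td)
    v_td += alpha * (target - v_td)

# Model-based ML: estimate the transition model, then solve the
# Bellman equation in closed form using the fitted probability.
p_hat = sum(done for _, done in transitions) / len(transitions)
v_ml = p_hat / (1 - (1 - p_hat) * gamma)

print(true_v, v_td, v_ml)
```

On the same batch of transitions, the ML estimate typically lands closer to the true value than the TD(0) estimate, whose accuracy also depends on the step size `alpha`; this is the kind of accuracy gap the paper's bounds make precise.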
References
Sutton, R.S.: Learning to predict by the method of Temporal Differences. Machine Learning 3, 9–44 (1988)
Singh, S.P., Sutton, R.S.: Reinforcement Learning with Replacing Eligibility Traces. Machine Learning 22, 123–158 (1996)
Barto, A.G., Duff, M.: Monte Carlo matrix inversion and reinforcement learning. In: NIPS: Proceedings of the 1994 Conference, pp. 687–694. Morgan Kaufmann, San Francisco (1994)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts (1998)
Lu, F., Patrascu, R., Schuurmans, D.: Investigating the Maximum Likelihood alternative to TD(λ). In: Proceedings of the 19th ICML, pp. 403–410. Morgan Kaufmann, San Francisco (2002)
Lu, F., Schuurmans, D.: Monte Carlo Matrix Inversion Policy Evaluation. In: UAI: Proceedings of the 19th Conference, pp. 386–393. Morgan Kaufmann, San Francisco (2003)
Kearns, M., Singh, S.: Bias-variance error bounds for temporal difference updates. In: Proceedings of the 13th Annual Conference on Computational Learning Theory, pp. 142–147 (2000)
Forsythe, G.E., Leibler, R.A.: Matrix inversion by a Monte Carlo Method. MTAC 4, 127–129 (1950)
Kearns, M., Singh, S.: Finite-sample convergence rates for Q-learning and indirect algorithms. In: NIPS: Proceedings of the 1998 Conference, pp. 996–1002 (1998)
Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins University Press, Baltimore (1989)
Singh, S.P., Dayan, P.: Analytical Mean Squared Error Curves for Temporal Difference Learning. Machine Learning 32, 5–40 (1998)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Lu, F. (2005). Error Bounds in Reinforcement Learning Policy Evaluation. In: Kégl, B., Lapalme, G. (eds) Advances in Artificial Intelligence. Canadian AI 2005. Lecture Notes in Computer Science(), vol 3501. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424918_48
DOI: https://doi.org/10.1007/11424918_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25864-3
Online ISBN: 978-3-540-31952-8
eBook Packages: Computer Science (R0)