Abstract
In the previous chapter (Chap. 4), we introduced Monte Carlo (MC) methods for reinforcement learning, which allow an agent to learn from its own experience without a model of the environment. However, MC methods are limited to episodic problems, where the agent interacts with the environment in discrete episodes and value updates can occur only after an episode ends. In this chapter, we introduce temporal difference (TD) learning, a class of algorithms that generalizes MC methods to both episodic and continuing reinforcement learning problems. TD learning updates value estimates using the immediate reward plus an estimate of future rewards (bootstrapping), which makes it more versatile and often more efficient than MC methods. We explore the theory and implementation of several TD learning algorithms in this chapter.
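The core idea described above, updating a value estimate from the immediate reward plus a bootstrapped estimate of future rewards, can be sketched as a TD(0) prediction loop. The example below is a minimal illustration on a 5-state random walk (a standard toy problem, not an example from this chapter); the function name, environment, and hyperparameters are assumptions for illustration only.

```python
import random

def td0_prediction(num_episodes=10000, alpha=0.05, gamma=1.0, seed=0):
    """Hypothetical TD(0) sketch: value prediction on a 5-state random walk.

    States 1..5; every episode starts in state 3. Each step moves left or
    right with equal probability. Stepping left of state 1 terminates with
    reward 0; stepping right of state 5 terminates with reward 1. The true
    value of state i is i/6 under this random policy.
    """
    rng = random.Random(seed)
    V = {s: 0.5 for s in range(1, 6)}  # value estimates for nonterminal states
    for _ in range(num_episodes):
        s = 3
        while True:
            s_next = s + rng.choice([-1, 1])
            if s_next == 0:            # left terminal: reward 0
                r, v_next, done = 0.0, 0.0, True
            elif s_next == 6:          # right terminal: reward 1
                r, v_next, done = 1.0, 0.0, True
            else:
                r, v_next, done = 0.0, V[s_next], False
            # TD(0) update: move V(s) toward the one-step bootstrapped target
            V[s] += alpha * (r + gamma * v_next - V[s])
            if done:
                break
            s = s_next
    return V
```

Unlike an MC update, which would wait for the episode's final return, each update here uses only the next reward and the current estimate of the successor state, so the same loop structure also applies to continuing tasks.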
Copyright information
© 2023 The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature
Cite this chapter
Hu, M. (2023). Temporal Difference Learning. In: The Art of Reinforcement Learning. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-9606-6_5
Print ISBN: 978-1-4842-9605-9
Online ISBN: 978-1-4842-9606-6