Temporal Difference Learning

A chapter from The Art of Reinforcement Learning (Apress, 2023)
Abstract

In the previous chapter (Chap. 4), we introduced Monte Carlo (MC) methods for reinforcement learning, which allow an agent to learn from its own experience without a model of the environment. However, MC methods are limited to episodic problems, where the agent interacts with the environment in discrete episodes, because they must wait until an episode ends before updating their value estimates. In this chapter, we introduce temporal difference (TD) learning, a class of algorithms that generalizes MC methods to both episodic and continuing problems. TD learning updates a value estimate after every step, using the immediate reward plus a discounted estimate of the next state's value, an idea known as bootstrapping. This makes TD learning more versatile and often more efficient than MC methods. We explore the theory and implementation of several TD learning algorithms in this chapter.
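
To make the update rule concrete, the following is a minimal sketch of tabular TD(0), the simplest instance of the idea described above. It is written in Python; the function name td0_update, the value table V, the step size alpha, and the discount factor gamma are illustrative choices for this sketch, not code from the chapter.

    # Tabular TD(0): after observing one step (s, r, s'), move V[s]
    # toward the one-step target r + gamma * V[s'], which bootstraps
    # on the current estimate of the next state's value.
    def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
        td_target = reward + gamma * V[next_state]  # immediate reward + discounted future estimate
        td_error = td_target - V[state]             # how far the current estimate is off
        V[state] += alpha * td_error                # nudge the estimate toward the target
        return td_error

    # Tiny usage example with two states.
    V = {"A": 0.0, "B": 0.5}
    td0_update(V, "A", reward=1.0, next_state="B")
    print(V["A"])  # 0.145, i.e. 0.1 * (1.0 + 0.9 * 0.5)

Because the update uses only one transition, it can be applied after every step rather than at the end of an episode, which is what lets TD methods handle continuing tasks that MC methods cannot.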

Copyright information

© 2023 The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature

Cite this chapter

Hu, M. (2023). Temporal Difference Learning. In: The Art of Reinforcement Learning. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-9606-6_5
