Temporal Difference Learning

A chapter from The Art of Reinforcement Learning (Apress, 2023)
Abstract

In the previous chapter (Chap. 4), we introduced Monte Carlo (MC) methods for reinforcement learning, which allow an agent to learn from its own experience without a model of the environment. However, MC methods are limited to episodic problems, where the agent interacts with the environment in discrete episodes, because they must wait until an episode ends before updating their value estimates. In this chapter, we introduce temporal difference (TD) learning, a class of algorithms that generalizes MC methods to both episodic and continuing problems. TD learning updates a value estimate after every step, using the immediate reward plus a discounted estimate of the next state's value, an idea known as bootstrapping. This makes TD learning more versatile and often more efficient than MC methods. We explore the theory and implementation of several TD learning algorithms in this chapter.
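
To make the update rule concrete, the following is a minimal sketch of tabular TD(0), the simplest instance of the idea described above. It is written in Python; the function name td0_update, the value table V, the step size alpha, and the discount factor gamma are illustrative choices for this sketch, not code from the chapter.

    # Tabular TD(0): after observing one step (s, r, s'), move V[s]
    # toward the one-step target r + gamma * V[s'], which bootstraps
    # on the current estimate of the next state's value.
    def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
        td_target = reward + gamma * V[next_state]  # immediate reward + discounted future estimate
        td_error = td_target - V[state]             # how far the current estimate is off
        V[state] += alpha * td_error                # nudge the estimate toward the target
        return td_error

    # Tiny usage example with two states.
    V = {"A": 0.0, "B": 0.5}
    td0_update(V, "A", reward=1.0, next_state="B")
    print(V["A"])  # 0.145, i.e. 0.1 * (1.0 + 0.9 * 0.5)

Because the update uses only one transition, it can be applied after every step rather than at the end of an episode, which is what lets TD methods handle continuing tasks that MC methods cannot.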

Copyright information

© 2023 The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature

Cite this chapter

Hu, M. (2023). Temporal Difference Learning. In: The Art of Reinforcement Learning. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-9606-6_5
