Dynamic Programming and Reinforcement Learning

Joshi, Ameet V

doi:10.1007/978-3-030-26622-6_9

Abstract

In this chapter we will study dynamic programming. Starting with the fundamental equation of dynamic programming as defined by Bellman, we will explore its generalization and identify the class of problems that can be solved within the dynamic programming framework. Then we will study reinforcement learning in detail as one subcategory of dynamic programming. We will study the concepts of exploration and exploitation and the tradeoff between them that achieves the best performance. We will also look at some variations of reinforcement learning in the form of Q-learning and SARSA.

Author information

Authors and Affiliations

Microsoft (United States), Redmond, WA, USA
Ameet V Joshi

About this chapter

Cite this chapter

Joshi, A.V. (2020). Dynamic Programming and Reinforcement Learning. In: Machine Learning and Artificial Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-26622-6_9

DOI: https://doi.org/10.1007/978-3-030-26622-6_9
Published: 25 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26621-9
Online ISBN: 978-3-030-26622-6
eBook Packages: Engineering, Engineering (R0)
