Abstract

The term “reinforcement learning” refers to both a class of problems and a set of computational methods. Regardless of its type, an RL algorithm generally contains four key elements: state-action samples, a policy, reward signals, and an environment model. In most stochastic tasks, the value function is defined as the expectation of the long-term return and is used to evaluate how good a policy is. It naturally satisfies a recursive relationship called the self-consistency condition. Mathematically, the goal of RL is to optimize the weighted value function subject to either the data distribution sampled from environment interactions (i.e., model-free) or the environment dynamics model (i.e., model-based). In addition, RL can be classified into indirect RL and direct RL according to its reliance on optimality conditions. Indirect RL methods seek the solution of the Bellman equation, which is a necessary and sufficient condition of optimality. In contrast, direct RL methods optimize the primal problem directly by searching the entire policy space. For conciseness and ease of understanding, this chapter begins with model-free RL in discrete-time stochastic environments and assumes a Markov decision process (MDP) with finite state and action spaces.
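To make the self-consistency condition concrete, here is a minimal sketch in standard textbook notation (the symbols \pi, p, r, and the discount factor \gamma follow common RL conventions and are assumptions, not necessarily the chapter's exact notation). The state-value function of a policy \pi satisfies

v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a) \left[ r(s, a, s') + \gamma \, v_\pi(s') \right],

while the Bellman optimality equation that indirect RL methods solve replaces the average over the policy with a maximization over actions:

v^{*}(s) = \max_{a} \sum_{s'} p(s' \mid s, a) \left[ r(s, a, s') + \gamma \, v^{*}(s') \right].

As an illustration of the indirect route on a finite MDP, the following Python sketch runs value iteration until the Bellman backup converges; the two-state, two-action arrays P and R and the value of gamma are hypothetical placeholders, not an example taken from the chapter.

import numpy as np

# Hypothetical finite MDP: 2 states, 2 actions (illustrative values only).
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],    # P[s, a, s'] = transition probability
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                  # R[s, a] = expected immediate reward
              [0.0, 2.0]])

v = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) v(s')
    q = R + gamma * (P @ v)                # shape (n_states, n_actions)
    v_new = q.max(axis=1)                  # greedy backup over actions
    if np.max(np.abs(v_new - v)) < 1e-8:   # stop once the fixed point is reached
        break
    v = v_new

policy = q.argmax(axis=1)                  # greedy policy from the converged values
print("v* =", v, "greedy policy =", policy)

A direct RL method would instead parameterize the policy and optimize the weighted value function by searching the policy space, without forming the Bellman equation explicitly.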

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Li, S.E. (2023). Principles of RL Problems. In: Reinforcement Learning for Sequential Decision and Optimal Control. Springer, Singapore. https://doi.org/10.1007/978-981-19-7784-8_2

  • DOI: https://doi.org/10.1007/978-981-19-7784-8_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-7783-1

  • Online ISBN: 978-981-19-7784-8

  • eBook Packages: Computer Science, Computer Science (R0)
