Abstract

The term “reinforcement learning” refers to both a class of problems and a set of computational methods. Regardless of its type, an RL algorithm generally contains four key elements: state-action samples, a policy, reward signals, and an environment model. In most stochastic tasks, the value function is defined as the expectation of the long-term return and is used to evaluate how good a policy is. It naturally satisfies a recursive relationship called the self-consistency condition. Mathematically, the goal of RL is to optimize the weighted value function subject to either the data distribution sampled from environment interactions (i.e., model-free) or the environment dynamics model (i.e., model-based). In addition, RL can be classified into indirect RL and direct RL according to its reliance on optimality conditions. Indirect RL methods seek the solution of the Bellman equation, which is a necessary and sufficient condition of optimality. In contrast, direct RL methods optimize the primal problem directly by searching the entire policy space. For conciseness and ease of understanding, this chapter begins with model-free RL in discrete-time stochastic environments and assumes a Markov decision process (MDP) with finite state and action spaces.
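To make the self-consistency condition concrete, here is a minimal sketch in standard textbook notation (the symbols \pi, p, r, and the discount factor \gamma follow common RL conventions and are assumptions, not necessarily the chapter's exact notation). The state-value function of a policy \pi satisfies

v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a) \left[ r(s, a, s') + \gamma \, v_\pi(s') \right],

while the Bellman optimality equation that indirect RL methods solve replaces the average over the policy with a maximization over actions:

v^{*}(s) = \max_{a} \sum_{s'} p(s' \mid s, a) \left[ r(s, a, s') + \gamma \, v^{*}(s') \right].

As an illustration of the indirect route on a finite MDP, the following Python sketch runs value iteration until the Bellman backup converges; the two-state, two-action arrays P and R and the value of gamma are hypothetical placeholders, not an example taken from the chapter.

import numpy as np

# Hypothetical finite MDP: 2 states, 2 actions (illustrative values only).
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],    # P[s, a, s'] = transition probability
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                  # R[s, a] = expected immediate reward
              [0.0, 2.0]])

v = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) v(s')
    q = R + gamma * (P @ v)                # shape (n_states, n_actions)
    v_new = q.max(axis=1)                  # greedy backup over actions
    if np.max(np.abs(v_new - v)) < 1e-8:   # stop once the fixed point is reached
        break
    v = v_new

policy = q.argmax(axis=1)                  # greedy policy from the converged values
print("v* =", v, "greedy policy =", policy)

A direct RL method would instead parameterize the policy and optimize the weighted value function by searching the policy space, without forming the Bellman equation explicitly.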

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Li, S.E. (2023). Principles of RL Problems. In: Reinforcement Learning for Sequential Decision and Optimal Control. Springer, Singapore. https://doi.org/10.1007/978-981-19-7784-8_2

  • DOI: https://doi.org/10.1007/978-981-19-7784-8_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-7783-1

  • Online ISBN: 978-981-19-7784-8

  • eBook Packages: Computer Science, Computer Science (R0)
