Abstract
The term “reinforcement learning” (RL) refers to both a class of problems and a set of computational methods. Whatever its type, an RL algorithm generally contains four key elements: state-action samples, a policy, reward signals, and an environment model. In most stochastic tasks, the value function is defined as the expectation of the long-term return and is used to evaluate how good a policy is. It naturally satisfies a recursive relationship called the self-consistency condition. Mathematically, the goal of RL is to optimize the weighted value function subject to either the data distribution sampled from environment interactions (i.e., model-free) or the environment dynamics model (i.e., model-based). In addition, RL methods can be classified as indirect or direct according to their reliance on optimality conditions. Indirect RL methods seek the solution of the Bellman equation, which is the necessary and sufficient condition for optimality. In contrast, direct RL methods optimize the primal problem directly by searching the entire policy space. For conciseness and ease of understanding, this chapter begins with model-free RL in discrete-time stochastic environments and assumes a Markov decision process (MDP) with finite state and action spaces.
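As a rough illustration of the self-consistency condition mentioned in the abstract, the sketch below evaluates a fixed stochastic policy on a tiny finite MDP by applying the recursion repeatedly until the values stop changing. Everything in it is an assumption made for illustration and is not taken from the chapter: the two-state transition table `P`, the rewards, the discount factor `GAMMA`, the uniform `policy`, and the helper names `backup` and `evaluate_policy`. It also assumes the transition model is known, which keeps the example short, whereas the chapter itself starts from the model-free setting.

```python
# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
GAMMA = 0.9  # discount factor (assumed)

# P[s][a] is a list of (next_state, probability, reward) triples.
P = {
    0: {0: [(0, 0.8, 0.0), (1, 0.2, 1.0)],
        1: [(1, 1.0, 0.5)]},
    1: {0: [(0, 1.0, 0.0)],
        1: [(1, 0.9, 2.0), (0, 0.1, 0.0)]},
}

# Stochastic policy: policy[s][a] is the probability of action a in state s.
policy = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}


def backup(v, s):
    """Right-hand side of the self-consistency condition for state s."""
    return sum(
        pi_a * prob * (reward + GAMMA * v[s_next])
        for a, pi_a in policy[s].items()
        for s_next, prob, reward in P[s][a]
    )


def evaluate_policy(tol=1e-10, max_iters=10_000):
    """Iterate the recursion until the largest change falls below tol."""
    v = {s: 0.0 for s in P}
    for _ in range(max_iters):
        new_v = {s: backup(v, s) for s in P}
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v
    return v


v_pi = evaluate_policy()
# At the fixed point, each value equals its own one-step backup.
residual = max(abs(v_pi[s] - backup(v_pi, s)) for s in P)
print(v_pi, residual)
```

The printed residual should be close to zero, which is what self-consistency means here: the value of each state equals the expected immediate reward plus the discounted value of the successor state under the policy.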
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Li, S.E. (2023). Principles of RL Problems. In: Reinforcement Learning for Sequential Decision and Optimal Control. Springer, Singapore. https://doi.org/10.1007/978-981-19-7784-8_2
DOI: https://doi.org/10.1007/978-981-19-7784-8_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7783-1
Online ISBN: 978-981-19-7784-8
eBook Packages: Computer Science, Computer Science (R0)