Abstract
This chapter presents optimal control solutions based on reinforcement learning (RL). RL methods can learn solutions to optimal control and game problems online, using data measured along the system trajectories. A major challenge, however, is that standard RL algorithms are data hungry: they need a large number of samples from interaction with the system to learn the optimal policy. We discuss data-efficient RL algorithms built on off-policy learning and experience replay, and show how to use them to solve \({H}_2\) and \({H}_\infty \) control problems as well as graphical games. Because off-policy and experience-replay-based algorithms allow data to be reused for learning, they are data efficient.
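To make the idea of data reuse concrete, the following is a minimal sketch, not the chapter's own algorithm. It assumes a discrete-time linear-quadratic regulator as a simple stand-in for the \({H}_2\) setting, a quadratic Q-function fitted by least squares, and a noise-free system. The matrices `A`, `B`, `Qc`, `Rc` and all function names are hypothetical choices for illustration. A single batch of transitions is collected once under a random behaviour policy (off-policy) and then replayed at every policy-evaluation step.

```python
import numpy as np

def quad_features(z):
    # Upper-triangular entries of z z^T, off-diagonals doubled, so that
    # quad_features(z) @ theta == z^T H z for a symmetric matrix H.
    i, j = np.triu_indices(len(z))
    return np.outer(z, z)[i, j] * np.where(i == j, 1.0, 2.0)

def theta_to_H(theta, size):
    # Rebuild the symmetric matrix H from its upper-triangular entries.
    H = np.zeros((size, size))
    H[np.triu_indices(size)] = theta
    return H + H.T - np.diag(np.diag(H))

def collect_replay_buffer(A, B, steps, rng):
    # Behaviour policy: pure random exploration (assumes A is stable,
    # so the state stays bounded without feedback).
    n, m = B.shape
    x = rng.standard_normal(n)
    buffer = []
    for _ in range(steps):
        u = rng.standard_normal(m)
        x_next = A @ x + B @ u
        buffer.append((x, u, x_next))
        x = x_next
    return buffer

def off_policy_lqr(buffer, Qc, Rc, K0, iterations=20):
    # Policy iteration on the Q-function of the target policy u = -K x.
    # The same stored transitions are reused in every iteration.
    n, m = Qc.shape[0], Rc.shape[0]
    K = K0.copy()
    for _ in range(iterations):
        Phi, cost = [], []
        for x, u, x_next in buffer:
            z = np.concatenate([x, u])
            # Next action comes from the target policy, not the behaviour policy.
            z_next = np.concatenate([x_next, -K @ x_next])
            Phi.append(quad_features(z) - quad_features(z_next))
            cost.append(x @ Qc @ x + u @ Rc @ u)
        # Policy evaluation: least-squares solution of the Bellman equation.
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(cost), rcond=None)
        H = theta_to_H(theta, n + m)
        # Policy improvement: minimise z^T H z over u.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K

# Hypothetical example system (used only to generate data).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qc = np.eye(2)
Rc = np.eye(1)

rng = np.random.default_rng(0)
buffer = collect_replay_buffer(A, B, steps=300, rng=rng)
K_learned = off_policy_lqr(buffer, Qc, Rc, K0=np.zeros((1, 2)))
print(K_learned)
```

The learner never uses `A` or `B` directly; they appear only in the data-collection step. An on-policy method would have to gather fresh transitions after each policy update, whereas here the 300 stored samples serve all 20 iterations.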
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kiumarsi, B., Modares, H., Lewis, F. (2021). Reinforcement Learning for Distributed Control and Multi-player Games. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds) Handbook of Reinforcement Learning and Control. Studies in Systems, Decision and Control, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-60990-0_2
DOI: https://doi.org/10.1007/978-3-030-60990-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60989-4
Online ISBN: 978-3-030-60990-0
eBook Packages: Intelligent Technologies and Robotics (R0)