Reinforcement Learning for Distributed Control and Multi-player Games

Kiumarsi, Bahare; Modares, Hamidreza; Lewis, Frank

doi:10.1007/978-3-030-60990-0_2

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 325)

Abstract

This chapter presents solutions to optimal control problems using reinforcement learning (RL). RL methods can learn the solutions to optimal control and game problems online, using data measured along the system trajectories. However, a major challenge is that standard RL algorithms are data hungry: they must collect a large number of samples through interaction with the system to learn the optimal policy. We discuss data-efficient RL algorithms based on off-policy learning and experience replay, and show how to use these approaches to solve \({H}_2\) and \({H}_\infty \) control problems as well as graphical games. Off-policy and experience replay-based RL algorithms allow data to be reused for learning and therefore lead to data-efficient RL algorithms.
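
The data-reuse idea can be illustrated with a small sketch. The chapter itself treats continuous-time systems; the code below instead assumes a discrete-time linear-quadratic regulator, \(x_{k+1} = A x_k + B u_k\) with stage cost \(x_k^\top Q x_k + u_k^\top R u_k\), and uses off-policy Q-function policy iteration as a swapped-in analogue, not the authors' algorithm. One batch of transitions is collected under a purely exploratory behavior input and replayed at every policy-evaluation step. The Bellman equation evaluates the target policy \(u = -Kx\) at the next state, which is what makes the evaluation off-policy. All function names (`collect_data`, `evaluate_policy`, `off_policy_pi`) and the example matrices are hypothetical; \(A\) and \(B\) are used only to simulate data, never by the learner.

```python
import numpy as np


def collect_data(A, B, num_steps, noise_scale=1.0, seed=0):
    """Simulate x_{k+1} = A x_k + B u_k under an exploratory behavior input
    (u_k = white noise). The stored batch is reused for every evaluation."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = rng.standard_normal(n)
    X, U, Xn = [], [], []
    for _ in range(num_steps):
        u = noise_scale * rng.standard_normal(m)
        x_next = A @ x + B @ u
        X.append(x)
        U.append(u)
        Xn.append(x_next)
        x = x_next
    return np.array(X), np.array(U), np.array(Xn)


def evaluate_policy(K, X, U, Xn, Q, R):
    """Least-squares fit of H in Q_K(x, u) = z^T H z, z = [x; u], from
    z_k^T H z_k - zt_{k+1}^T H zt_{k+1} = x_k^T Q x_k + u_k^T R u_k,
    where zt_{k+1} = [x_{k+1}; -K x_{k+1}] uses the target policy's action."""
    Z = np.hstack([X, U])
    Zt = np.hstack([Xn, -Xn @ K.T])
    # Row k holds the entries z_i z_j, so phi_k @ vec(H) = z_k^T H z_k.
    phi = np.einsum("ki,kj->kij", Z, Z).reshape(len(Z), -1)
    phi_next = np.einsum("ki,kj->kij", Zt, Zt).reshape(len(Zt), -1)
    cost = np.einsum("ki,ij,kj->k", X, Q, X) + np.einsum("ki,ij,kj->k", U, R, U)
    h, *_ = np.linalg.lstsq(phi - phi_next, cost, rcond=None)
    p = Z.shape[1]
    H = h.reshape(p, p)
    return 0.5 * (H + H.T)  # keep only the symmetric part


def off_policy_pi(A, B, Q, R, K0, num_iters=10, num_steps=200):
    """Policy iteration that replays one fixed batch of data at every step.
    K0 must be stabilizing (A - B K0 Schur stable)."""
    X, U, Xn = collect_data(A, B, num_steps)
    n = A.shape[0]
    K = K0
    for _ in range(num_iters):
        H = evaluate_policy(K, X, U, Xn, Q, R)
        # Improvement: argmin_u z^T H z gives u = -H_uu^{-1} H_ux x.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K


if __name__ == "__main__":
    # Hypothetical system with a stable open loop, so K0 = 0 is stabilizing.
    A = np.array([[0.9, 0.2], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    K = off_policy_pi(A, B, np.eye(2), np.eye(1), K0=np.zeros((1, 2)))
    print("learned gain K =", K)
```

Because the simulated dynamics are noise-free and the exploratory input makes the regression well posed, each least-squares evaluation should recover the current policy's Q-function exactly, so the learned gain should agree with the discrete-time Riccati gain up to numerical precision. With measurement noise or too few samples this would only hold approximately.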

Author information

Authors and Affiliations

Michigan State University, 428 S Shaw Ln, East Lansing, MI, 48824, USA
Bahare Kiumarsi & Hamidreza Modares
University of Texas at Arlington, 701 S Nedderman Dr, Arlington, TX, 76019, USA
Frank Lewis

Corresponding author

Correspondence to Bahare Kiumarsi.

Editor information

Editors and Affiliations

The Daniel Guggenheim School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA, USA
Kyriakos G. Vamvoudakis
Department of Electrical Engineering, The University of Texas at Arlington, Arlington, TX, USA
Yan Wan
Department of Electrical Engineering, The University of Texas at Arlington, Arlington, TX, USA
Frank L. Lewis
Army Research Office, Durham, NC, USA
Derya Cansever

About this chapter

Cite this chapter

Kiumarsi, B., Modares, H., Lewis, F. (2021). Reinforcement Learning for Distributed Control and Multi-player Games. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds) Handbook of Reinforcement Learning and Control. Studies in Systems, Decision and Control, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-60990-0_2

DOI: https://doi.org/10.1007/978-3-030-60990-0_2
Published: 24 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60989-4
Online ISBN: 978-3-030-60990-0
eBook Packages: Intelligent Technologies and Robotics (R0)