Reinforcement Learning for Distributed Control and Multi-player Games

Handbook of Reinforcement Learning and Control

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 325)

Abstract

This chapter presents reinforcement learning (RL) solutions to optimal control problems. RL methods can learn the solutions to optimal control and game problems online, using data measured along the system trajectories. One major challenge, however, is that standard RL algorithms are data hungry: they must collect a large number of samples from interaction with the system to learn the optimal policy. We discuss data-efficient RL algorithms built on off-policy learning and experience replay, and show how to use them to solve \({H}_2\) and \({H}_\infty \) control problems as well as graphical games. Because off-policy and experience-replay-based algorithms reuse data for learning, they are data efficient.
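To make the data-reuse idea concrete, here is a minimal sketch of off-policy Q-function policy iteration on a small linear-quadratic problem. It is an illustration under stated assumptions, not the chapter's own algorithm: the discrete-time system, the matrices `A`, `B`, `Q`, `R`, the random behavior inputs, and the helper names are all hypothetical. A single buffer of stored transitions, generated by a policy unrelated to the one being learned, is reused in every policy-evaluation step.

```python
import numpy as np

def collect_transitions(A, B, n_samples, rng):
    """Replay buffer of (x, u, x_next) generated by a random behavior policy.

    The model (A, B) is used only to simulate data; the learner never sees it.
    """
    n, m = B.shape
    X = rng.standard_normal((n_samples, n))
    U = rng.standard_normal((n_samples, m))
    X_next = X @ A.T + U @ B.T
    return X, U, X_next

def quad_features(Z):
    """Row-wise kron(z, z), so that z' H z == quad_features(z) @ H.ravel()."""
    return np.einsum("ki,kj->kij", Z, Z).reshape(Z.shape[0], -1)

def off_policy_q_iteration(X, U, X_next, Q, R, K0, n_iters=20):
    """Policy iteration on the quadratic Q-function, reusing one fixed buffer."""
    n, m = X.shape[1], U.shape[1]
    # Stage cost x'Qx + u'Ru for every stored transition.
    cost = np.einsum("ki,ij,kj->k", X, Q, X) + np.einsum("ki,ij,kj->k", U, R, U)
    phi = quad_features(np.hstack([X, U]))
    K = K0.copy()
    for _ in range(n_iters):
        # Off-policy step: the next action comes from the target policy u = -K x,
        # while the stored action U came from the behavior policy.
        U_next = -X_next @ K.T
        phi_next = quad_features(np.hstack([X_next, U_next]))
        # Policy evaluation: least-squares solution of the Bellman equation
        # z' H z = cost + z_next' H z_next over the whole buffer.
        h, *_ = np.linalg.lstsq(phi - phi_next, cost, rcond=None)
        H = h.reshape(n + m, n + m)
        H = 0.5 * (H + H.T)
        # Policy improvement: minimize the Q-function over u.
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K, H

# Hypothetical example system (open-loop stable, so K0 = 0 is admissible).
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

X, U, X_next = collect_transitions(A, B, 200, rng)
K_learned, H_learned = off_policy_q_iteration(X, U, X_next, Q, R, K0=np.zeros((1, 2)))
print(K_learned)
```

The same 200 transitions serve every iteration, so no new interaction with the system is needed after the buffer is filled. The sketch assumes noise-free data and an initial stabilizing gain.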

Author information

Correspondence to Bahare Kiumarsi.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kiumarsi, B., Modares, H., Lewis, F. (2021). Reinforcement Learning for Distributed Control and Multi-player Games. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds) Handbook of Reinforcement Learning and Control. Studies in Systems, Decision and Control, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-60990-0_2
