
Integral temporal difference learning for continuous-time linear quadratic regulations

  • Regular Papers
  • Control Theory and Applications
International Journal of Control, Automation and Systems

Abstract

In this paper, we propose a temporal difference (TD) learning method, called integral TD learning, that efficiently finds solutions to continuous-time (CT) linear quadratic regulation (LQR) problems online when the system matrix A is unknown. The idea originates from a computational reinforcement learning method known as TD(0), the simplest TD method for finite Markov decision processes. For the proposed integral TD method, we mathematically analyze the positive definiteness of the updated value functions, the conditions for monotone convergence, and the stability properties concerning the locations of the closed-loop poles, all in terms of the learning rate and the discount factor. The proposed method includes the existing value iteration method for CT LQR problems as a special case. Finally, numerical simulations are carried out to verify the effectiveness of the proposed method and to further investigate the aforementioned mathematical properties.
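
To make the idea concrete, the Python sketch below shows one plausible realization of an integral TD(0)-style update for a quadratic CT LQR value function V(x) = xᵀPx: the stage cost is integrated over a short interval [t, t+T] and combined with the discounted value of the successor state to form a TD target, and the resulting TD error drives a semi-gradient correction of P. The system matrices, learning rate, discount factor, interval length, evaluation policy, and the normalized rank-one update are illustrative assumptions for this sketch, not the paper's exact algorithm; the matrix A appears only in the data-generating simulation and is never read by the learning update itself.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative 2nd-order system; A is used only to simulate trajectory data
# and is never read by the learning update below.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

gamma = 1.0            # discount factor (assumed value)
alpha = 0.5            # learning rate (assumed value)
T = 0.05               # length of each integration interval (assumed value)
K = np.zeros((1, 2))   # fixed evaluation policy u = -Kx (assumption)
P = np.zeros((2, 2))   # estimate of the quadratic value matrix

def rollout(x0, K, T):
    """Integrate the closed-loop system over [0, T]; return the terminal
    state and the stage cost accumulated along the way."""
    def dyn(t, z):
        x = z[:2]
        u = -K @ x
        cost = x @ Q @ x + u @ R @ u
        return np.concatenate([A @ x + B @ u, [cost]])
    sol = solve_ivp(dyn, (0.0, T), np.concatenate([x0, [0.0]]), rtol=1e-8)
    return sol.y[:2, -1], sol.y[2, -1]

rng = np.random.default_rng(0)
for _ in range(200):
    x = rng.normal(size=2)                 # random exploratory initial state
    x_next, running_cost = rollout(x, K, T)
    # Integral TD target: cost over [t, t+T] plus discounted successor value.
    target = running_cost + gamma * (x_next @ P @ x_next)
    delta = target - x @ P @ x             # TD error for this sample
    # Normalized semi-gradient step on P along the visited state direction
    # (one simple way to apply the TD error; not the paper's exact update).
    P += alpha * delta * np.outer(x, x) / (x @ x) ** 2

print("Estimated value matrix P:\n", P)
```

Under these assumptions, each iteration plays the role of one integral TD sample; the full method would additionally vary the learning rate, discount factor, and policy as analyzed in the paper.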



Author information


Corresponding author

Correspondence to Yoon Ho Choi.

Additional information

Recommended by Associate Editor Changchun Hua under the direction of Editor Fuchun Sun.

Tae Yoon Chun received his B.S. and M.S. degrees in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2010 and 2012, respectively. Since 2012, he has been working as a research assistant in the Control Engineering Laboratory, Yonsei University, Seoul, Korea, where he is currently pursuing his Ph.D. degree in Electrical and Electronic Engineering. His major research interests include approximate dynamic programming/reinforcement learning, optimal/adaptive control, synchrophasors, and power systems.

Jae Young Lee received his B.S. degree in Information & Control Engineering in 2006 from Kwangwoon University, Seoul, Korea, and his Ph.D. degree in Electrical and Electronic Engineering in 2015 from Yonsei University, Seoul, Korea. Since September 2015, he has been working as a postdoctoral fellow in the Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta, Edmonton, Alberta, Canada. His major research interests include reinforcement learning/approximate dynamic programming, optimal/adaptive control, nonlinear control theory, neural networks, and their applications to multi-agent systems and robotics.

Jin Bae Park received his B.S. degree in Electrical Engineering from Yonsei University, Seoul, Korea, in 1977, and his M.S. and Ph.D. degrees in Electrical Engineering from Kansas State University, Manhattan, KS, USA, in 1985 and 1990, respectively. He has been with the Department of Electrical and Electronic Engineering, Yonsei University, since 1992, where he is currently a Professor. His current research interests include robust control and filtering, nonlinear control, drones, intelligent mobile robots, fuzzy logic control, neural networks, adaptive dynamic programming, and genetic algorithms. Dr. Park served as the Editor-in-Chief of the International Journal of Control, Automation, and Systems from 2006 to 2010, and as the President of the Institute of Control, Robotics, and Systems in 2013.

Yoon Ho Choi received his B.S., M.S., and Ph.D. degrees in Electrical Engineering from Yonsei University, Seoul, Korea, in 1980, 1982, and 1991, respectively. He was with the Department of Electrical Engineering, Ohio State University, Columbus, OH, USA, as a Visiting Scholar from 2000 to 2002 and from 2009 to 2010. He has been with the Department of Electronic Engineering, Kyonggi University, Suwon, Korea, since 1993, where he is currently a Professor. His current research interests include nonlinear control, intelligent control, multilegged and mobile robots, networked control systems, and ADP-based control. He served as a Director of the Institute of Control, Robotics and Systems from 2003 to 2004 and from 2007 to 2008, and as its Vice President from 2012 to 2015.


About this article


Cite this article

Chun, T.Y., Lee, J.Y., Park, J.B. et al. Integral temporal difference learning for continuous-time linear quadratic regulations. Int. J. Control Autom. Syst. 15, 226–238 (2017). https://doi.org/10.1007/s12555-015-0319-1
