Abstract
In this paper, we propose a temporal difference (TD) learning method, called integral TD learning, that efficiently solves continuous-time (CT) linear quadratic regulation (LQR) problems online when the system matrix A is unknown. The idea originates from a computational reinforcement learning method known as TD(0), the simplest TD method for finite Markov decision processes. For the proposed integral TD method, we mathematically analyze the positive definiteness of the updated value functions, the conditions for monotone convergence, and the stability properties concerning the locations of the closed-loop poles, all in terms of the learning rate and the discount factor. The proposed method includes the existing value iteration method for CT LQR problems as a special case. Finally, numerical simulations are carried out to verify the effectiveness of the proposed method and to further investigate the aforementioned mathematical properties.
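To make the starting point concrete, the following toy sketch (ours, not from the paper) shows the discrete-state TD(0) update that the integral TD method generalizes to continuous time. The chain structure, step size alpha, and discount gamma are illustrative assumptions; TD(0) updates each value estimate toward the one-step bootstrapped target r + gamma*V(s').

```python
def td0_value_estimation(num_episodes=2000, alpha=0.1, gamma=0.9):
    """Tabular TD(0) on a toy 3-state chain: states 0 -> 1 -> 2 (terminal).

    Reward is 0 on every transition except +1 on entering the terminal
    state. The TD(0) update is V(s) <- V(s) + alpha*(r + gamma*V(s') - V(s)),
    so the true values here are V(1) = 1 and V(0) = gamma = 0.9.
    """
    V = [0.0, 0.0, 0.0]  # value estimates; state 2 is terminal (V fixed at 0)
    for _ in range(num_episodes):
        s = 0
        while s != 2:
            s_next = s + 1  # deterministic transitions for clarity
            r = 1.0 if s_next == 2 else 0.0
            # bootstrap from V(s'); terminal state contributes no future value
            target = r + (0.0 if s_next == 2 else gamma * V[s_next])
            V[s] += alpha * (target - V[s])  # TD(0) update
            s = s_next
    return V
```

With enough episodes the estimates approach the true values V(0) = 0.9 and V(1) = 1.0; the paper's contribution is an integral analogue of this update that applies to the CT LQR value function without knowledge of A.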
Additional information
Recommended by Associate Editor Changchun Hua under the direction of Editor Fuchun Sun.
Tae Yoon Chun received his B.S. and M.S. degrees in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2010 and 2012, respectively. Since 2012, he has been working as a research assistant in the Control Engineering Laboratory, Yonsei University, Seoul, Korea, where he is currently pursuing his Ph.D. degree in Electrical and Electronic Engineering. His major research interests include approximate dynamic programming/reinforcement learning, optimal/adaptive control, synchrophasors, and power systems.
Jae Young Lee received his B.S. degree in Information & Control Engineering in 2006 from Kwangwoon University, Seoul, Korea, and his Ph.D. degree in Electrical and Electronic Engineering in 2015 from Yonsei University, Seoul, Korea. Since September 2015, he has been working as a postdoctoral fellow in the Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta, Edmonton, Alberta, Canada. His major research interests include reinforcement learning/approximate dynamic programming, optimal/adaptive control, nonlinear control theories, neural networks, and their applications to multi-agent systems and robotics.
Jin Bae Park received his B.S. degree in electrical engineering from Yonsei University, Seoul, Korea, in 1977, and his M.S. and Ph.D. degrees in electrical engineering from Kansas State University, Manhattan, KS, USA, in 1985 and 1990, respectively. He has been with the Department of Electrical and Electronic Engineering, Yonsei University, since 1992, where he is currently a Professor. His current research interests include robust control and filtering, nonlinear control, drones, intelligent mobile robots, fuzzy logic control, neural networks, adaptive dynamic programming, and genetic algorithms. Dr. Park served as the Editor-in-Chief of the International Journal of Control, Automation, and Systems from 2006 to 2010, and as the President of the Institute of Control, Robotics, and Systems in 2013.
Yoon Ho Choi received his B.S., M.S., and Ph.D. degrees in Electrical Engineering from Yonsei University, Seoul, Korea, in 1980, 1982, and 1991, respectively. He was with the Department of Electrical Engineering, Ohio State University, Columbus, OH, USA, as a Visiting Scholar from 2000 to 2002 and from 2009 to 2010. He has been with the Department of Electronic Engineering, Kyonggi University, Suwon, Korea, since 1993, where he is currently a Professor. His current research interests include nonlinear control, intelligent control, multilegged and mobile robots, networked control systems, and ADP-based control. He was the Director of the Institute of Control, Robotics and Systems from 2003 to 2004 and from 2007 to 2008, where he also served as the Vice President from 2012 to 2015.
Cite this article
Chun, T.Y., Lee, J.Y., Park, J.B. et al. Integral temporal difference learning for continuous-time linear quadratic regulations. Int. J. Control Autom. Syst. 15, 226–238 (2017). https://doi.org/10.1007/s12555-015-0319-1