Abstract
In this paper, we propose a temporal difference (TD) learning method, called integral TD learning, that efficiently solves continuous-time (CT) linear quadratic regulation (LQR) problems online when the system matrix A is unknown. The idea originates from a computational reinforcement learning method known as TD(0), the simplest TD method for finite Markov decision processes. For the proposed integral TD method, we mathematically analyze the positive definiteness of the updated value functions, the conditions for monotone convergence, and the stability properties concerning the locations of the closed-loop poles, all in terms of the learning rate and the discount factor. The proposed method includes the existing value iteration method for CT LQR problems as a special case. Finally, numerical simulations are carried out to verify the effectiveness of the proposed method and to further investigate the aforementioned mathematical properties.
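To make the starting point concrete, the following toy sketch (ours, not from the paper) shows the discrete-state TD(0) update that the integral TD method generalizes to continuous time. The chain structure, step size alpha, and discount gamma are illustrative assumptions; TD(0) updates each value estimate toward the one-step bootstrapped target r + gamma*V(s').

```python
def td0_value_estimation(num_episodes=2000, alpha=0.1, gamma=0.9):
    """Tabular TD(0) on a toy 3-state chain: states 0 -> 1 -> 2 (terminal).

    Reward is 0 on every transition except +1 on entering the terminal
    state. The TD(0) update is V(s) <- V(s) + alpha*(r + gamma*V(s') - V(s)),
    so the true values here are V(1) = 1 and V(0) = gamma = 0.9.
    """
    V = [0.0, 0.0, 0.0]  # value estimates; state 2 is terminal (V fixed at 0)
    for _ in range(num_episodes):
        s = 0
        while s != 2:
            s_next = s + 1  # deterministic transitions for clarity
            r = 1.0 if s_next == 2 else 0.0
            # bootstrap from V(s'); terminal state contributes no future value
            target = r + (0.0 if s_next == 2 else gamma * V[s_next])
            V[s] += alpha * (target - V[s])  # TD(0) update
            s = s_next
    return V
```

With enough episodes the estimates approach the true values V(0) = 0.9 and V(1) = 1.0; the paper's contribution is an integral analogue of this update that applies to the CT LQR value function without knowledge of A.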
Additional information
Recommended by Associate Editor Changchun Hua under the direction of Editor Fuchun Sun.
Tae Yoon Chun received his B.S. and M.S. degrees in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2010 and 2012, respectively. Since 2012, he has been working as a research assistant in the Control Engineering Laboratory, Yonsei University, Seoul, Korea, where he is currently pursuing his Ph.D. degree in Electrical and Electronic Engineering. His major research interests include approximate dynamic programming/reinforcement learning, optimal/adaptive control, synchrophasors, and power systems.
Jae Young Lee received his B.S. degree in Information & Control Engineering in 2006 from Kwangwoon University, Seoul, Korea, and his Ph.D. degree in Electrical and Electronic Engineering in 2015 from Yonsei University, Seoul, Korea. Since September 2015, he has been working as a postdoctoral fellow in the Reinforcement Learning and Artificial Intelligence Laboratory, University of Alberta, Edmonton, Alberta, Canada. His major research interests include reinforcement learning/approximate dynamic programming, optimal/adaptive control, nonlinear control theories, neural networks, and their applications to multi-agent systems and robotics.
Jin Bae Park received his B.S. degree in electrical engineering from Yonsei University, Seoul, Korea, in 1977, and his M.S. and Ph.D. degrees in electrical engineering from Kansas State University, Manhattan, KS, USA, in 1985 and 1990, respectively. He has been with the Department of Electrical and Electronic Engineering, Yonsei University, since 1992, where he is currently a Professor. His current research interests include robust control and filtering, nonlinear control, drones, intelligent mobile robots, fuzzy logic control, neural networks, adaptive dynamic programming, and genetic algorithms. Dr. Park served as the Editor-in-Chief of the International Journal of Control, Automation, and Systems from 2006 to 2010, and as the President of the Institute of Control, Robotics, and Systems in 2013.
Yoon Ho Choi received his B.S., M.S., and Ph.D. degrees in Electrical Engineering from Yonsei University, Seoul, Korea, in 1980, 1982, and 1991, respectively. He was with the Department of Electrical Engineering, Ohio State University, Columbus, OH, USA, as a Visiting Scholar from 2000 to 2002 and from 2009 to 2010. He has been with the Department of Electronic Engineering, Kyonggi University, Suwon, Korea, since 1993, where he is currently a Professor. His current research interests include nonlinear control, intelligent control, multilegged and mobile robots, networked control systems, and ADP-based control. He was the Director of the Institute of Control, Robotics and Systems from 2003 to 2004 and from 2007 to 2008, where he also served as the Vice President from 2012 to 2015.
Cite this article
Chun, T.Y., Lee, J.Y., Park, J.B. et al. Integral temporal difference learning for continuous-time linear quadratic regulations. Int. J. Control Autom. Syst. 15, 226–238 (2017). https://doi.org/10.1007/s12555-015-0319-1