Abstract
Online learning is an important property of adaptive dynamic programming (ADP). Online observations contain rich information about the system dynamics, which ADP algorithms can exploit to learn the optimal control policy. This paper reviews research on online ADP algorithms for the optimal control of continuous-time systems. As this research has intensified, ADP has evolved toward model-free and data-efficient algorithms. After introducing each algorithm separately, we compare their performance on the same problem. This paper aims to provide a comprehensive understanding of continuous-time online ADP algorithms.
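To make "learning the optimal control policy from online observations" concrete, here is a minimal sketch, not taken from the paper, of one well-known continuous-time scheme of this kind: policy iteration with an integral reinforcement learning (IRL) critic, specialized to a linear system with quadratic cost. Everything named below is an illustrative assumption: the plant `A_TRUE`, the input matrix `B`, the weights `Q` and `R`, the initial stabilizing gain `K0`, the interval length `T`, the step `dt`, and the hypothetical helpers `quad_basis`, `unpack`, `simulate_interval`, `irl_policy_iteration` and `kleinman_reference`. `A_TRUE` is read only by the simulator that stands in for measured data. Resetting to random initial states is a simplification that keeps the least-squares fit well posed; online algorithms usually rely on probing noise instead.

```python
import numpy as np

# Hypothetical example plant. A_TRUE is used ONLY by the data-generating
# simulator; the learner never reads it.
A_TRUE = np.array([[0.0, 1.0], [1.0, -1.0]])  # open-loop unstable
B = np.array([[0.0], [1.0]])                  # input matrix, assumed known
Q = np.eye(2)                                 # state weight
R = np.array([[1.0]])                         # control weight


def quad_basis(x):
    """Features phi(x) such that x^T P x = phi(x) @ p, p = upper triangle of P."""
    n = len(x)
    return np.array([(1.0 if i == j else 2.0) * x[i] * x[j]
                     for i in range(n) for j in range(i, n)])


def unpack(p, n):
    """Rebuild the symmetric matrix P from its upper-triangular parameters."""
    P = np.zeros((n, n))
    P[np.triu_indices(n)] = p
    return P + np.triu(P, 1).T


def simulate_interval(x0, K, T=0.5, dt=0.005):
    """RK4 rollout of u = -K x on [0, T]; returns x(T) and the integrated cost."""
    Qk = Q + K.T @ R @ K

    def f(z):  # z = [state, running cost]
        x = z[:-1]
        return np.append((A_TRUE - B @ K) @ x, x @ Qk @ x)

    z = np.append(x0, 0.0)
    for _ in range(int(round(T / dt))):
        k1 = f(z)
        k2 = f(z + 0.5 * dt * k1)
        k3 = f(z + 0.5 * dt * k2)
        k4 = f(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z[:-1], z[-1]


def irl_policy_iteration(K0, iters=10, samples=10, seed=0):
    """Policy iteration whose critic is fitted from measured data only."""
    rng = np.random.default_rng(seed)
    n = B.shape[0]
    K = K0
    for _ in range(iters):
        rows, targets = [], []
        for _ in range(samples):
            x0 = rng.uniform(-1.0, 1.0, size=n)  # reset (simplification)
            xT, cost = simulate_interval(x0, K)
            # IRL Bellman equation: x0'P x0 - xT'P xT = integral of the cost
            rows.append(quad_basis(x0) - quad_basis(xT))
            targets.append(cost)
        p, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
        P = unpack(p, n)                 # policy evaluation
        K = np.linalg.solve(R, B.T @ P)  # policy improvement: K = R^-1 B^T P
    return P, K


def kleinman_reference(K0, iters=10):
    """Model-based policy iteration (reads A_TRUE), used only as a check."""
    n = B.shape[0]
    I = np.eye(n)
    K = K0
    for _ in range(iters):
        Acl = A_TRUE - B @ K
        Qk = Q + K.T @ R @ K
        # Lyapunov equation Acl^T P + P Acl + Qk = 0 in Kronecker form
        L = np.kron(Acl.T, I) + np.kron(I, Acl.T)
        P = np.linalg.solve(L, -Qk.flatten()).reshape(n, n)
        K = np.linalg.solve(R, B.T @ P)
    return P, K


K0 = np.array([[2.0, 2.0]])  # assumed initial stabilizing gain
P_irl, K_irl = irl_policy_iteration(K0)
P_ref, K_ref = kleinman_reference(K0)
print("IRL P:\n", np.round(P_irl, 4))
print("model-based P:\n", np.round(P_ref, 4))
```

In this sketch the learner reads `B` only in the policy-improvement step; policy evaluation uses measured states and costs alone. The model-based Kleinman routine is there solely to confirm that both iterations converge to the same solution of the algebraic Riccati equation A'P + PA - PBR^{-1}B'P + Q = 0.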
Acknowledgements
This work is supported partly by National Natural Science Foundation of China (61603382, 61573353, 61533017), and partly by the Early Career Development Award of SKLMCCS.
About this article
Cite this article
Zhu, Y., Zhao, D. Comprehensive comparison of online ADP algorithms for continuous-time optimal control. Artif Intell Rev 49, 531–547 (2018). https://doi.org/10.1007/s10462-017-9548-4
DOI: https://doi.org/10.1007/s10462-017-9548-4