Comprehensive comparison of online ADP algorithms for continuous-time optimal control

Zhu, Yuanheng; Zhao, Dongbin

doi:10.1007/s10462-017-9548-4

Published: 24 February 2017

Artificial Intelligence Review, Volume 49, pages 531–547 (2018)

Abstract

Online learning is an important property of adaptive dynamic programming (ADP). Online observations contain rich information about the system dynamics, and ADP algorithms can use them to learn the optimal control policy. This paper reviews research on online ADP algorithms for the optimal control of continuous-time systems. Through intensive study, ADP has developed toward model-free and data-efficient methods. After introducing the algorithms separately, we compare their performance on the same problem. This paper aims to provide a comprehensive understanding of continuous-time online ADP algorithms.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (61603382, 61573353, 61533017) and in part by the Early Career Development Award of SKLMCCS.

Author information

Authors and Affiliations

The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Yuanheng Zhu & Dongbin Zhao
University of Chinese Academy of Sciences, Beijing, China
Yuanheng Zhu & Dongbin Zhao

Corresponding author

Correspondence to Dongbin Zhao.

About this article

Cite this article

Zhu, Y., Zhao, D. Comprehensive comparison of online ADP algorithms for continuous-time optimal control. Artif Intell Rev 49, 531–547 (2018). https://doi.org/10.1007/s10462-017-9548-4

Issue Date: April 2018
DOI: https://doi.org/10.1007/s10462-017-9548-4
