Adaptive Dynamic Programming in the Hamiltonian-Driven Framework

Handbook of Reinforcement Learning and Control

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 325)

Abstract

This chapter presents a Hamiltonian-driven framework of adaptive dynamic programming (ADP) for continuous-time nonlinear systems. Three fundamental problems in solving the optimal control problem are presented, i.e., the evaluation of a given admissible policy, the comparison of two different admissible policies with respect to performance, and the performance improvement of a given admissible control. It is shown that the Hamiltonian functional can be viewed as the temporal difference for dynamical systems in continuous time. Therefore, minimizing the Hamiltonian functional is equivalent to value function approximation. An iterative algorithm starting from an arbitrary admissible control is presented for approximating the optimal control, together with a proof of its convergence. The Hamiltonian-driven ADP algorithm can be implemented using a critic-only structure, which is trained to approximate the optimal value gradient. A simulation example is conducted to verify the effectiveness of Hamiltonian-driven ADP.

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61903028, in part by the China Postdoctoral Science Foundation under Grant No. 2018M641197, in part by the Fundamental Research Funds for the Central Universities under Grant Nos. FRF-TP-18-031A1 and FRF-BD-17-002A, in part by the DARPA/Microsystems Technology Office, and in part by the Army Research Laboratory under Grant No. W911NF-18-2-0260. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
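
The sketch below is a minimal illustration (not the chapter's implementation or its simulation example) of the critic-only idea summarized in the abstract: a polynomial critic approximates the value gradient dV/dx for an assumed scalar system x_dot = f(x) + g(x)u with running cost Q x^2 + R u^2, the control is computed directly from that gradient, and the critic weights are tuned to drive the Hamiltonian, playing the role of the continuous-time temporal difference, toward zero over sampled states. The dynamics f and g, the basis functions, and the step sizes are illustrative assumptions.

```python
import numpy as np

# Penalty weights in the running cost Q x^2 + R u^2 (illustrative values).
Q, R = 1.0, 1.0

# Assumed example dynamics x_dot = f(x) + g(x) u; not taken from the chapter.
f = lambda x: -x + 0.5 * x**3
g = lambda x: 1.0

def phi(x):
    """Polynomial basis used by the critic to approximate the value gradient dV/dx."""
    return np.array([x, x**3, x**5])

def value_gradient(w, x):
    return w @ phi(x)

def policy(w, x):
    """Policy u = -(1/2) R^{-1} g(x) dV/dx induced by the current critic."""
    return -0.5 / R * g(x) * value_gradient(w, x)

def hamiltonian(w, x):
    """Continuous-time temporal difference: H = dV/dx (f + g u) + Q x^2 + R u^2."""
    u = policy(w, x)
    return value_gradient(w, x) * (f(x) + g(x) * u) + Q * x**2 + R * u**2

# Tune the critic weights by stochastic gradient descent on H^2 over sampled states.
# Because u minimizes H for the current critic, the chain-rule terms through u vanish,
# so dH/dw reduces to phi(x) * (f(x) + g(x) u).
rng = np.random.default_rng(0)
w = np.zeros(3)
lr = 1e-3
for _ in range(50000):
    x = rng.uniform(-1.0, 1.0)
    h = hamiltonian(w, x)
    w -= lr * h * phi(x) * (f(x) + g(x) * policy(w, x))

print("critic weights (dV/dx ~ w . phi(x)):", w)
```

Driving the Hamiltonian residual toward zero lets a single critic approximator suffice, since the improved policy is recovered directly from the learned value gradient rather than from a separate actor network.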

Author information

Correspondence to Yongliang Yang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Yang, Y., Wunsch II, D.C., Yin, Y. (2021). Adaptive Dynamic Programming in the Hamiltonian-Driven Framework. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds) Handbook of Reinforcement Learning and Control. Studies in Systems, Decision and Control, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-60990-0_7
