Adaptive dynamic programming for finite-horizon optimal control of linear time-varying discrete-time systems

Pang, Bo; Bian, Tao; Jiang, Zhong-Ping

doi:10.1007/s11768-019-8168-8

Published: 25 January 2019

Control Theory and Technology, Volume 17, pages 73–84 (2019)

Bo Pang¹,
Tao Bian² &
Zhong-Ping Jiang¹

Abstract

This paper studies data-driven learning-based methods for the finite-horizon optimal control of linear time-varying discrete-time systems. First, a novel finite-horizon Policy Iteration (PI) method for linear time-varying discrete-time systems is presented. Its connections with existing infinite-horizon PI methods are discussed. Then, both data-driven off-policy PI and Value Iteration (VI) algorithms are derived to find approximate optimal controllers when the system dynamics are completely unknown. Under mild conditions, the proposed data-driven off-policy algorithms converge to the optimal solution. Finally, the effectiveness and feasibility of the developed methods are validated by a practical example of spacecraft attitude control.
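The abstract names the methods but states no equations. As a hedged illustration only, the sketch below implements a generic model-based finite-horizon policy iteration for the linear quadratic problem, assuming dynamics x_{k+1} = A_k x_k + B_k u_k, stage cost x_k^T Q_k x_k + u_k^T R_k u_k for k = 0, ..., N-1, and terminal cost x_N^T F x_N. These symbols and all function names (`evaluate_policy`, `improve_policy`, `riccati_backward`, `policy_iteration`) are our own assumptions, not taken from the paper. Because it requires known A_k and B_k, it is not the authors' data-driven off-policy PI or VI algorithm; it only shows the kind of model-based iteration that such methods approximate from measured data, checked against the backward Riccati difference equation.

```python
import numpy as np


def evaluate_policy(A, B, Q, R, F, K):
    """Policy evaluation: cost matrices P[0..N] of fixed gains u_k = -K[k] x_k."""
    N = len(A)
    P = [None] * (N + 1)
    P[N] = F  # terminal cost x_N^T F x_N
    for k in reversed(range(N)):
        Acl = A[k] - B[k] @ K[k]  # closed-loop matrix at step k
        P[k] = Q[k] + K[k].T @ R[k] @ K[k] + Acl.T @ P[k + 1] @ Acl
    return P


def improve_policy(A, B, R, P):
    """Policy improvement: greedy gains with respect to the cost-to-go P[k+1]."""
    return [
        np.linalg.solve(R[k] + B[k].T @ P[k + 1] @ B[k], B[k].T @ P[k + 1] @ A[k])
        for k in range(len(A))
    ]


def riccati_backward(A, B, Q, R, F):
    """Reference solution from the backward Riccati difference equation."""
    N = len(A)
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = F
    for k in reversed(range(N)):
        K[k] = np.linalg.solve(R[k] + B[k].T @ P[k + 1] @ B[k], B[k].T @ P[k + 1] @ A[k])
        Acl = A[k] - B[k] @ K[k]
        P[k] = Q[k] + K[k].T @ R[k] @ K[k] + Acl.T @ P[k + 1] @ Acl
    return K, P


def policy_iteration(A, B, Q, R, F, K0, tol=1e-10):
    """Alternate evaluation and improvement until the gains stop changing."""
    N = len(A)
    K = K0
    for sweep in range(1, N + 2):  # at most N + 1 sweeps are needed
        P = evaluate_policy(A, B, Q, R, F, K)
        K_new = improve_policy(A, B, R, P)
        change = max(np.max(np.abs(Kn - Ko)) for Kn, Ko in zip(K_new, K))
        K = K_new
        if change < tol:
            break
    return K, evaluate_policy(A, B, Q, R, F, K), sweep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, N = 3, 2, 20
    A = [np.eye(n) + 0.1 * rng.standard_normal((n, n)) for _ in range(N)]  # time-varying A_k
    B = [rng.standard_normal((n, m)) for _ in range(N)]
    Q = [np.eye(n)] * N
    R = [np.eye(m)] * N
    F = 10.0 * np.eye(n)
    K0 = [np.zeros((m, n))] * N  # need not be stabilizing on a finite horizon
    K_pi, P_pi, sweeps = policy_iteration(A, B, Q, R, F, K0)
    K_opt, P_opt = riccati_backward(A, B, Q, R, F)
    print(sweeps, all(np.allclose(a, b) for a, b in zip(K_pi, K_opt)))
```

In this sketch the initial gains need not be stabilizing, because every cost is a finite sum. The gain at step N-1 becomes exact after the first improvement, and exactness propagates backward by one time step per sweep, so the loop stops after at most N+1 sweeps.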

Author information

Authors and Affiliations

Control and Networks (CAN) Lab, Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, NY, 11201, USA
Bo Pang & Zhong-Ping Jiang
Bank of America Merrill Lynch, One Bryant Park, New York, NY, 10036, USA
Tao Bian

Corresponding author

Correspondence to Bo Pang.

Additional information

The work of B. Pang and Z.-P. Jiang has been supported in part by the National Science Foundation (No. ECCS-1501044).

Bo PANG received the B.Sc. degree in Automation from Beihang University, Beijing, China, in 2014, and the M.Sc. degree in Control Science and Engineering from Shanghai Jiao Tong University, Shanghai, China, in 2017. He is currently working toward the Ph.D. degree with the Control and Networks Lab, Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, NY, U.S.A. His research interests include optimal control, approximate/adaptive dynamic programming, and reinforcement learning.

Tao BIAN received the B.Eng. degree in Automation from Huazhong University of Science and Technology, Wuhan, China, in 2012, and the M.Sc. and Ph.D. degrees in Electrical Engineering from the Tandon School of Engineering, New York University, Brooklyn, NY, in 2014 and 2017, respectively. He is currently a quantitative finance analyst and assistant vice president at Bank of America Merrill Lynch, One Bryant Park, New York. His research interests include reinforcement learning and the control and optimization of stochastic systems.

Zhong-Ping JIANG received the B.Sc. degree in Mathematics from the University of Wuhan, Wuhan, China, in 1988, the M.Sc. degree in Statistics from the University of Paris XI, Paris, France, in 1989, and the Ph.D. degree in Automatic Control and Mathematics from the École des Mines de Paris, Paris, in 1993. He is currently a Professor of electrical and computer engineering with the Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, NY, U.S.A. He was named a Highly Cited Researcher by Web of Science (2018) and has coauthored Stability and Stabilization of Nonlinear Systems (Springer, 2011), Nonlinear Control of Dynamic Networks (Taylor & Francis, 2014), Robust Adaptive Dynamic Programming (Wiley-IEEE Press, 2017) and Nonlinear Control Under Information Constraints (Science Press, 2018). His current research interests include stability theory, robust/adaptive/distributed nonlinear control, adaptive dynamic programming, and their applications to information, mechanical, and biological systems. Dr. Jiang is an IEEE Fellow and an IFAC Fellow.

About this article

Cite this article

Pang, B., Bian, T. & Jiang, ZP. Adaptive dynamic programming for finite-horizon optimal control of linear time-varying discrete-time systems. Control Theory Technol. 17, 73–84 (2019). https://doi.org/10.1007/s11768-019-8168-8

Received: 15 August 2018
Revised: 24 October 2018
Accepted: 26 October 2018
Published: 25 January 2019
Issue Date: February 2019
DOI: https://doi.org/10.1007/s11768-019-8168-8
