Abstract
This paper develops a novel adaptive dynamic programming (ADP)-based model-free policy iteration (PI) algorithm for an infinite-horizon continuous-time linear quadratic stochastic (LQS) optimal control problem in which the diffusion term of the system dynamics depends on both the state and the control. First, we apply Itô's lemma and take expectations to derive a relation among the state trajectory, the control input, and the matrices to be solved for. Then, an ADP-based model-free algorithm is developed that approximates the optimal control from collected data without requiring knowledge of all system coefficient matrices. Moreover, we provide a convergence analysis under mild conditions. Finally, a numerical example and an illustrative application demonstrate the effectiveness of the proposed algorithm.
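To fix ideas, the following minimal sketch (in Python, with hypothetical function and variable names) implements the classical model-based policy iteration that data-driven schemes of this kind emulate, for dynamics dx = (Ax + Bu)dt + (Cx + Du)dW and cost E∫(x'Qx + u'Ru)dt. It is a point of reference only, not the paper's algorithm: it assumes full knowledge of (A, B, C, D), which is exactly what the model-free method avoids, and it assumes the initial gain K0 is mean-square stabilizing.

```python
import numpy as np

def policy_iteration_lqs(A, B, C, D, Q, R, K0, max_iters=50, tol=1e-10):
    """Kleinman-type policy iteration for the stochastic LQ problem
        dx = (A x + B u) dt + (C x + D u) dW,
        J  = E int_0^inf (x'Qx + u'Ru) dt,
    under the linear feedback u = -K x (K0 mean-square stabilizing)."""
    n = A.shape[0]
    I = np.eye(n)
    K = K0
    for _ in range(max_iters):
        Ac = A - B @ K                # closed-loop drift coefficient
        Mc = C - D @ K                # closed-loop diffusion coefficient
        S = Q + K.T @ R @ K
        # Policy evaluation: solve the generalized Lyapunov equation
        #   Ac'P + P Ac + Mc'P Mc + S = 0,
        # which is linear in P; by column-major vectorization,
        #   (I kron Ac' + Ac' kron I + Mc' kron Mc') vec(P) = -vec(S).
        L = np.kron(I, Ac.T) + np.kron(Ac.T, I) + np.kron(Mc.T, Mc.T)
        P = np.linalg.solve(L, -S.flatten(order='F')).reshape(n, n, order='F')
        P = 0.5 * (P + P.T)           # enforce symmetry
        # Policy improvement: K <- (R + D'PD)^{-1} (B'P + D'PC)
        K_next = np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)
        if np.linalg.norm(K_next - K) < tol:
            return P, K_next
        K = K_next
    return P, K
```

Each policy-evaluation step above uses A and C explicitly; the model-free variant studied in the paper instead recovers the same value matrix and gain update by least squares on quantities computed from observed state trajectories via Itô's lemma.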
Acknowledgements
The author thanks the reviewers for their insightful suggestions, which greatly improved the quality of this work. The author also appreciates the time and effort of Professor Guangchen Wang, who gave many valuable suggestions and carefully revised the contents of this paper.
Funding
The author acknowledges the financial support from the National Key R&D Program of China under Grant No. 2022YFA1006103, the NSFC under Grant Nos. 61821004, 61925306, 11831010, and the NSF of Shandong Province under Grant Nos. ZR2019ZD42, ZR2020ZD24.
Ethics declarations
Conflict of interest
The author declares that he has no conflicts of interest.