Abstract
This paper develops a novel adaptive dynamic programming (ADP)-based model-free policy iteration (PI) algorithm for an infinite-horizon continuous-time linear quadratic stochastic (LQS) optimal control problem in which the diffusion term of the system dynamics depends on both the state and the control. First, we apply Itô's lemma and take expectations to derive a relation among the state trajectory, the control input, and the matrices to be solved for. Then, an ADP-based model-free algorithm is developed that approximates the optimal control from collected data without requiring knowledge of all system coefficient matrices. Moreover, we provide a convergence analysis under mild conditions. Finally, a numerical example and an illustrative application demonstrate the effectiveness of the proposed algorithm.
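To fix ideas, the following minimal sketch (in Python, with hypothetical function and variable names) implements the classical model-based policy iteration that data-driven schemes of this kind emulate, for dynamics dx = (Ax + Bu)dt + (Cx + Du)dW and cost E∫(x'Qx + u'Ru)dt. It is a point of reference only, not the paper's algorithm: it assumes full knowledge of (A, B, C, D), which is exactly what the model-free method avoids, and it assumes the initial gain K0 is mean-square stabilizing.

```python
import numpy as np

def policy_iteration_lqs(A, B, C, D, Q, R, K0, max_iters=50, tol=1e-10):
    """Kleinman-type policy iteration for the stochastic LQ problem
        dx = (A x + B u) dt + (C x + D u) dW,
        J  = E int_0^inf (x'Qx + u'Ru) dt,
    under the linear feedback u = -K x (K0 mean-square stabilizing)."""
    n = A.shape[0]
    I = np.eye(n)
    K = K0
    for _ in range(max_iters):
        Ac = A - B @ K                # closed-loop drift coefficient
        Mc = C - D @ K                # closed-loop diffusion coefficient
        S = Q + K.T @ R @ K
        # Policy evaluation: solve the generalized Lyapunov equation
        #   Ac'P + P Ac + Mc'P Mc + S = 0,
        # which is linear in P; by column-major vectorization,
        #   (I kron Ac' + Ac' kron I + Mc' kron Mc') vec(P) = -vec(S).
        L = np.kron(I, Ac.T) + np.kron(Ac.T, I) + np.kron(Mc.T, Mc.T)
        P = np.linalg.solve(L, -S.flatten(order='F')).reshape(n, n, order='F')
        P = 0.5 * (P + P.T)           # enforce symmetry
        # Policy improvement: K <- (R + D'PD)^{-1} (B'P + D'PC)
        K_next = np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)
        if np.linalg.norm(K_next - K) < tol:
            return P, K_next
        K = K_next
    return P, K
```

Each policy-evaluation step above uses A and C explicitly; the model-free variant studied in the paper instead recovers the same value matrix and gain update by least squares on quantities computed from observed state trajectories via Itô's lemma.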
Acknowledgements
The author thanks the reviewers for their insightful suggestions, which greatly improved the quality of this work. The author also appreciates the time and effort of Professor Guangchen Wang, who gave many valuable suggestions and carefully revised the contents of this paper.
Funding
The author acknowledges the financial support from the National Key R&D Program of China under Grant No. 2022YFA1006103, the NSFC under Grant Nos. 61821004, 61925306, 11831010, and the NSF of Shandong Province under Grant Nos. ZR2019ZD42, ZR2020ZD24.
Ethics declarations
Conflict of interest
The author declares that he has no conflicts of interest.