An adaptive dynamic programming-based algorithm for infinite-horizon linear quadratic stochastic optimal control problems

  • Original Research
Journal of Applied Mathematics and Computing

Abstract

This paper develops a novel adaptive dynamic programming (ADP)-based model-free policy iteration (PI) algorithm for an infinite-horizon continuous-time linear quadratic stochastic (LQS) optimal control problem in which the diffusion term of the system dynamics depends on both the state and the control. First, we apply Itô's lemma and take expectations to obtain a relationship among the state trajectory, the control input and the matrices to be solved. Then, without requiring knowledge of all system coefficient matrices, we develop the ADP-based model-free algorithm to approximate the optimal control from collected data. Moreover, we give a convergence analysis under some mild conditions. Finally, a numerical example and an illustrative application are presented to show that the proposed algorithm is effective.
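For orientation, the sketch below shows the classical model-based policy iteration that a model-free scheme of this kind is designed to reproduce from data. It is not the algorithm of the paper, whose equations are not available in this preview. It assumes the standard LQS formulation dx = (Ax + Bu) dt + (Cx + Du) dW with cost E∫(x'Qx + u'Ru) dt; the matrix names A, B, C, D, Q, R, the function names and the numerical values are illustrative assumptions. Each iteration evaluates the current gain K by solving a generalized Lyapunov equation for P, then improves the gain via K = (R + D'PD)^{-1}(B'P + D'PC).

```python
import numpy as np

def policy_evaluation(A, B, C, D, Q, R, K):
    """Cost matrix P of the feedback u = -K x (generalized Lyapunov equation)."""
    n = A.shape[0]
    Ac = A - B @ K  # closed-loop drift matrix
    Cc = C - D @ K  # closed-loop diffusion matrix
    I = np.eye(n)
    # Row-major vectorization: vec(M X N) = kron(M, N.T) vec(X), so
    # Ac'P + P Ac + Cc'P Cc = -(Q + K'RK) becomes a linear system in vec(P).
    L = np.kron(Ac.T, I) + np.kron(I, Ac.T) + np.kron(Cc.T, Cc.T)
    rhs = -(Q + K.T @ R @ K).reshape(-1)
    P = np.linalg.solve(L, rhs).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_improvement(B, C, D, R, P):
    """Improved gain K = (R + D'PD)^{-1} (B'P + D'PC)."""
    return np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)

def policy_iteration(A, B, C, D, Q, R, K0, iters=20):
    """Alternate evaluation and improvement from a mean-square stabilizing K0."""
    K = K0
    for _ in range(iters):
        P = policy_evaluation(A, B, C, D, Q, R, K)
        K = policy_improvement(B, C, D, R, P)
    return P, K

def sare_residual(A, B, C, D, Q, R, P):
    """Residual of the stochastic algebraic Riccati equation at P."""
    S = B.T @ P + D.T @ P @ C
    return (A.T @ P + P @ A + C.T @ P @ C + Q
            - S.T @ np.linalg.solve(R + D.T @ P @ D, S))

# Hypothetical test system, chosen so that K0 = 0 is mean-square stabilizing.
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
C = 0.1 * np.eye(2)
D = np.array([[0.0], [0.2]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = policy_iteration(A, B, C, D, Q, R, K0=np.zeros((1, 2)))
print("P =\n", P)
print("K =", K)
print("SARE residual norm:", np.linalg.norm(sare_residual(A, B, C, D, Q, R, P)))
```

The evaluation step here needs every coefficient matrix. A model-free variant would replace it with a least-squares fit to measured state and input trajectories, using the expectation of the Itô expansion of x'Px along those trajectories.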



Acknowledgements

The author thanks the reviewers for their insightful suggestions, which greatly improved the quality of this work. The author also thanks Professor Guangchen Wang for the time and effort spent offering many valuable suggestions and carefully revising this paper.

Funding

The author acknowledges financial support from the National Key R&D Program of China under Grant No. 2022YFA1006103, the NSFC under Grant Nos. 61821004, 61925306, 11831010, and the NSF of Shandong Province under Grant Nos. ZR2019ZD42, ZR2020ZD24.

Author information

Corresponding author

Correspondence to Heng Zhang.

Ethics declarations

Conflict of interest

The author declares that he has no conflicts of interest.


Cite this article

Zhang, H. An adaptive dynamic programming-based algorithm for infinite-horizon linear quadratic stochastic optimal control problems. J. Appl. Math. Comput. 69, 2741–2760 (2023). https://doi.org/10.1007/s12190-023-01857-9

  • DOI: https://doi.org/10.1007/s12190-023-01857-9
