
A Lyapunov characterization of robust policy optimization

  • Research Article
  • Published in Control Theory and Technology

Abstract

In this paper, we study the robustness of policy optimization, in particular the Gauss–Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning, subject to noise at each iteration. By invoking the concept of input-to-state stability and using Lyapunov’s direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions are provided for the upper bound on the noise and for the size of the neighborhood to which the policies ultimately converge. Based on Willems’ fundamental lemma, a learning-based policy iteration algorithm is proposed, in which the persistent excitation condition can be readily guaranteed by checking the rank of a Hankel matrix built from an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated via the input-to-state stability of policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.
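As a rough illustration of the two ingredients highlighted in the abstract, the sketch below implements a generic Hewer-type policy iteration for a discrete-time LQR problem together with a block-Hankel rank test for persistent excitation in the spirit of Willems' fundamental lemma. It is a minimal sketch under stated assumptions, not the paper's data-driven algorithm: the matrices A, B, Q, R, the exploration signal, the PE order, and the iteration count are placeholder values introduced here for illustration.

```python
# Minimal sketch (not the paper's algorithm): model-based Hewer-type policy
# iteration for a discrete-time LQR problem, plus a block-Hankel rank test
# for persistent excitation in the spirit of Willems' fundamental lemma.
# All numerical values below (A, B, Q, R, signal length, PE order) are
# illustrative assumptions.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def policy_iteration(A, B, Q, R, K0, num_iters=20):
    """Alternate policy evaluation (a Lyapunov equation) and policy
    improvement (a Gauss-Newton-like gain update), starting from a
    stabilizing gain K0."""
    K = K0
    for _ in range(num_iters):
        Ak = A - B @ K
        # Policy evaluation: Ak' P Ak - P + Q + K' R K = 0
        P = solve_discrete_lyapunov(Ak.T, Q + K.T @ R @ K)
        # Policy improvement: K <- (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


def persistently_exciting(u, order):
    """Return True if the m-by-T input sequence u is persistently exciting
    of the given order, i.e. its block Hankel matrix has full row rank."""
    m, T = u.shape
    cols = T - order + 1
    H = np.vstack([u[:, i:i + cols] for i in range(order)])  # (m*order) x cols
    return np.linalg.matrix_rank(H) == m * order


if __name__ == "__main__":
    # Placeholder open-loop-stable system, so K0 = 0 is stabilizing.
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [0.1]])
    Q, R = np.eye(2), np.eye(1)

    u = np.random.default_rng(0).standard_normal((1, 50))  # exploration signal
    print("PE of order 6:", persistently_exciting(u, order=6))

    K, P = policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))
    print("Policy-iteration gain K:", K)
```

In the paper's setting, the policy-evaluation step is carried out from measured data rather than from (A, B), and the input-to-state stability analysis quantifies how errors injected at each such step (e.g., from measurement noise or disturbances) propagate: if the per-iteration error is sufficiently small, the gains remain stabilizing and converge to a neighborhood of the optimal gain.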


Data availability

The data that support the findings of this study are available from the corresponding author, L. Cui, upon reasonable request.


Author information

Corresponding author

Correspondence to Leilei Cui.

Additional information

This work was supported in part by the National Science Foundation (Nos. ECCS-2210320, CNS-2148304).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Cui, L., Jiang, ZP. A Lyapunov characterization of robust policy optimization. Control Theory Technol. 21, 374–389 (2023). https://doi.org/10.1007/s11768-023-00163-w

