
Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces

  • Original Paper
Nonlinear Dynamics

Abstract

Optimal control of nonlinear systems via adaptive dynamic programming (ADP) has been an active research topic in recent years. However, unknown nonlinear systems with limited data, together with the infinite horizon of the optimal performance index, render many existing methods inefficient or inapplicable. To address these issues, a novel model-free T-DPG(\(\lambda \)) method with eligibility traces (ET) is presented for a class of affine discrete-time nonlinear systems. By utilizing eligibility traces, the new method expands the information contained in limited data and guides the control toward the optimal direction more quickly. A finite, rather than infinite, number of terms of the optimal performance index is used to solve the infinite-horizon optimal control problem. For unknown system dynamics, a model-free algorithm is proposed that samples sequences using only the control signal and the state transition process. Furthermore, the convergence and boundedness of the algorithm are analyzed. With a neural-network-based actor-critic architecture, the optimal policy is well approximated by the actor network. Finally, the effectiveness of the proposed algorithm is demonstrated by two simulation examples.
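The eligibility-trace mechanism the abstract builds on can be illustrated with a minimal tabular TD(\(\lambda \)) sketch. This shows only the classical accumulating-trace idea (one temporal-difference error updating all recently visited states at once), not the authors' T-DPG(\(\lambda \)) algorithm; the small deterministic chain environment, its reward, and all parameter values below are made-up assumptions for illustration.

```python
import numpy as np

# Illustrative tabular TD(lambda) with accumulating eligibility traces.
# A 5-state deterministic chain: the agent moves right from state 0 and
# receives reward 1 upon entering the terminal state 4 (hypothetical MDP).
n_states = 5
V = np.zeros(n_states)            # value estimates, terminal value stays 0
gamma, lam, alpha = 0.9, 0.8, 0.1 # discount, trace decay, learning rate

for episode in range(200):
    z = np.zeros(n_states)        # eligibility traces, reset each episode
    s = 0
    while s < n_states - 1:
        s_next = s + 1                        # deterministic transition
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * V[s_next] - V[s]  # one-step TD error
        z[s] += 1.0                           # accumulate trace at visited state
        V += alpha * delta * z                # one TD error updates all traced states
        z *= gamma * lam                      # decay all traces
        s = s_next

# At convergence V approaches [0.729, 0.81, 0.9, 1.0, 0.0]
print(np.round(V, 2))
```

Because the traces keep earlier states eligible for later TD errors, each sampled transition updates a whole tail of the trajectory rather than a single state, which is the sense in which eligibility traces "expand the information of limited data."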


Data availability

Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.


Funding

This work was supported by the National Natural Science Foundation of China under Grant 62273234.

Author information


Corresponding author

Correspondence to Jingcheng Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rao, J., Wang, J., Xu, J. et al. Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces. Nonlinear Dyn 111, 20041–20053 (2023). https://doi.org/10.1007/s11071-023-08909-6

