Abstract
Optimal control of nonlinear systems via adaptive dynamic programming (ADP) has been an active research topic in recent years. However, for unknown nonlinear systems, limited data and the infinite horizon of the optimal performance index render many existing methods inefficient or inapplicable. To address these issues, a novel model-free deterministic policy gradient method with eligibility traces, T-DPG(\(\lambda \)), is presented for a class of affine discrete-time nonlinear systems. By utilizing eligibility traces, the new method expands the information contained in limited data and guides the control policy toward the optimum more quickly. A finite number of terms, rather than infinitely many terms, of the optimal performance index is used to solve the infinite-horizon optimal control problem. Since the system dynamics are unknown, a model-free algorithm is proposed that samples sequences using only the control signals and the state transitions. Furthermore, the convergence and boundedness of the algorithm are proved. With a neural-network-based actor-critic architecture, the optimal policy is well approximated by the actor network. Finally, the effectiveness of the proposed algorithm is demonstrated by two simulation examples.
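To make the role of eligibility traces concrete, the following is a minimal, hypothetical sketch of a TD(\(\lambda \))-style critic update with an accumulating trace, applied to a toy scalar affine system under a fixed policy. It is an illustration of the general eligibility-trace mechanism only, not the paper's T-DPG(\(\lambda \)) algorithm; the feature map, step sizes, system, and cost below are all assumptions introduced for the example.

```python
import numpy as np

def features(x):
    # hypothetical polynomial feature map for a scalar state
    return np.array([x, x**2, 1.0])

def td_lambda_update(w, e, x, x_next, cost, gamma=0.95, lam=0.8, alpha=0.01):
    """One critic step with an accumulating eligibility trace.

    The trace e keeps a decayed sum of past feature gradients, so a single
    observed transition updates the value estimate along the whole recently
    visited trajectory -- this is how traces stretch limited data.
    """
    delta = cost + gamma * w @ features(x_next) - w @ features(x)  # TD error
    e = gamma * lam * e + features(x)   # decay and accumulate the trace
    w = w + alpha * delta * e           # credit all recently visited states
    return w, e

# Usage on a toy trajectory of an assumed stable system x_{k+1} = 0.5*x_k + u_k
w = np.zeros(3)
e = np.zeros(3)
x = 1.0
for _ in range(50):
    u = -0.2 * x                # fixed evaluation policy (assumed)
    x_next = 0.5 * x + u
    cost = x**2 + u**2          # quadratic stage cost (assumed)
    w, e = td_lambda_update(w, e, x, x_next, cost)
    x = x_next
```

With \(\lambda = 0\) the trace reduces to the current feature vector and the update becomes one-step TD; larger \(\lambda \) propagates each TD error further back along the trajectory, which is the data-expansion effect the abstract refers to.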
Data availability
Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.
Funding
This work was supported by the National Natural Science Foundation of China under Grant 62273234.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rao, J., Wang, J., Xu, J. et al. Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces. Nonlinear Dyn 111, 20041–20053 (2023). https://doi.org/10.1007/s11071-023-08909-6