
Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces

  • Original Paper
Nonlinear Dynamics

Abstract

Optimal control of nonlinear systems via adaptive dynamic programming (ADP) has been an active research topic in recent years. However, unknown nonlinear systems with limited data, together with the infinite horizon of the optimal performance index, render many existing methods inefficient or inapplicable. To address these issues, a novel model-free T-DPG(\(\lambda \)) method with eligibility traces (ET) is presented for a class of affine discrete-time nonlinear systems. By utilizing eligibility traces, the new method expands the information contained in limited data and guides the control toward the optimal direction more quickly. A finite, rather than infinite, number of terms of the optimal performance index is used to solve the infinite-horizon optimal control problem. For unknown system dynamics, a model-free algorithm is proposed that samples sequences using only the control signal and the state transition process. Furthermore, the convergence and boundedness of the algorithm are analyzed. With a neural-network-based actor-critic architecture, the optimal policy is well approximated by the actor network. Finally, the effectiveness of the proposed algorithm is demonstrated by two simulation examples.
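The eligibility-trace mechanism the abstract builds on can be illustrated with a minimal tabular TD(\(\lambda \)) sketch. This shows only the classical accumulating-trace idea (one temporal-difference error updating all recently visited states at once), not the authors' T-DPG(\(\lambda \)) algorithm; the small deterministic chain environment, its reward, and all parameter values below are made-up assumptions for illustration.

```python
import numpy as np

# Illustrative tabular TD(lambda) with accumulating eligibility traces.
# A 5-state deterministic chain: the agent moves right from state 0 and
# receives reward 1 upon entering the terminal state 4 (hypothetical MDP).
n_states = 5
V = np.zeros(n_states)            # value estimates, terminal value stays 0
gamma, lam, alpha = 0.9, 0.8, 0.1 # discount, trace decay, learning rate

for episode in range(200):
    z = np.zeros(n_states)        # eligibility traces, reset each episode
    s = 0
    while s < n_states - 1:
        s_next = s + 1                        # deterministic transition
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * V[s_next] - V[s]  # one-step TD error
        z[s] += 1.0                           # accumulate trace at visited state
        V += alpha * delta * z                # one TD error updates all traced states
        z *= gamma * lam                      # decay all traces
        s = s_next

# At convergence V approaches [0.729, 0.81, 0.9, 1.0, 0.0]
print(np.round(V, 2))
```

Because the traces keep earlier states eligible for later TD errors, each sampled transition updates a whole tail of the trajectory rather than a single state, which is the sense in which eligibility traces "expand the information of limited data."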


Data availability

Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.


Funding

This work was supported by the National Natural Science Foundation of China under Grant 62273234.

Author information


Corresponding author

Correspondence to Jingcheng Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rao, J., Wang, J., Xu, J. et al. Optimal control of nonlinear system based on deterministic policy gradient with eligibility traces. Nonlinear Dyn 111, 20041–20053 (2023). https://doi.org/10.1007/s11071-023-08909-6

