Off-Policy Neuro-Optimal Control for Unknown Complex-Valued Nonlinear Systems

  • Ruizhuo Song
  • Qinglai Wei
  • Qing Li
Part of the Studies in Systems, Decision and Control book series (SSDC, volume 166)


This chapter establishes an optimal control scheme for unknown complex-valued nonlinear systems. Policy iteration (PI) is used to obtain the solution of the Hamilton–Jacobi–Bellman (HJB) equation. Off-policy learning allows the iterative performance index function and the iterative control to be obtained with completely unknown dynamics. A critic network and an action network approximate the iterative performance index function and the iterative control, respectively, thereby executing policy evaluation and policy improvement. The convergence of the iterative performance index function and the asymptotic stability of the closed-loop system are proven. Using Lyapunov techniques, the weight estimation errors are shown to be uniformly ultimately bounded (UUB). A simulation study demonstrates the effectiveness of the proposed optimal control method.
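The policy-evaluation/policy-improvement structure of PI can be illustrated on a simple special case. The sketch below is not the chapter's neuro-optimal method: it assumes a hypothetical complex-valued *linear* discrete-time plant with known dynamics and quadratic cost, where policy evaluation reduces to a Lyapunov equation and policy improvement to a closed-form gain update (all matrices here are illustrative choices, not from the chapter).

```python
import numpy as np

def policy_evaluation(A, B, K, Q, R, tol=1e-12, max_iter=5000):
    """Policy evaluation: solve P = Qbar + Acl^H P Acl for the closed-loop
    matrix Acl = A - B K by fixed-point iteration (Acl must be stable)."""
    Acl = A - B @ K
    Qbar = Q + K.conj().T @ R @ K
    P = np.zeros_like(Q, dtype=complex)
    for _ in range(max_iter):
        P_next = Qbar + Acl.conj().T @ P @ Acl
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P

def policy_improvement(A, B, P, R):
    """Policy improvement: greedy gain K = (R + B^H P B)^{-1} B^H P A,
    using the conjugate transpose since the system is complex-valued."""
    return np.linalg.solve(R + B.conj().T @ P @ B, B.conj().T @ P @ A)

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Alternate evaluation and improvement starting from an admissible K0."""
    K = K0
    for _ in range(iters):
        P = policy_evaluation(A, B, K, Q, R)
        K = policy_improvement(A, B, P, R)
    return P, K

# Hypothetical stable complex-valued plant (illustrative only).
A = np.array([[0.5 + 0.2j, 0.1], [0.0, 0.4 - 0.1j]])
B = np.array([[1.0], [0.5 + 0.5j]])
Q = np.eye(2, dtype=complex)
R = np.eye(1, dtype=complex)
K0 = np.zeros((1, 2), dtype=complex)  # admissible: A itself is stable

P, K = policy_iteration(A, B, Q, R, K0)
```

In this linear special case the iteration converges to the solution of the discrete algebraic Riccati equation; the chapter's contribution is precisely to carry out the same evaluation/improvement cycle off-policy, with critic and action networks standing in for `P` and `K` when the dynamics are unknown and nonlinear.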



Copyright information

© Science Press, Beijing and Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. University of Science and Technology Beijing, Beijing, China
  2. Institute of Automation, Chinese Academy of Sciences, Beijing, China
