Off-Policy Neuro-Optimal Control for Unknown Complex-Valued Nonlinear Systems
This chapter establishes an optimal control scheme for unknown complex-valued nonlinear systems. Policy iteration (PI) is used to solve the Hamilton–Jacobi–Bellman (HJB) equation. Off-policy learning allows the iterative performance index and the iterative control to be obtained with completely unknown dynamics. Critic and action networks approximate the iterative performance index and the iterative control, executing policy evaluation and policy improvement, respectively. Asymptotic stability of the closed-loop system and convergence of the iterative performance index function are proven. By the Lyapunov technique, the weight estimation errors are shown to be uniformly ultimately bounded (UUB). A simulation study demonstrates the effectiveness of the proposed optimal control method.
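To make the policy evaluation and policy improvement steps named above concrete, the sketch below runs policy iteration on a scalar real-valued linear-quadratic problem, where both steps admit closed forms. It is only a minimal illustration of the PI loop, not the chapter's complex-valued neural scheme; all parameter values (a, b, q, r) and the initial gain are illustrative assumptions.

```python
# Policy iteration for dx/dt = a*x + b*u with cost
# integral of (q*x**2 + r*u**2) dt. For a linear policy u = -k*x,
# the value function is V(x) = p*x**2 and each PI step is closed-form.
# All constants are illustrative assumptions, not from the chapter.

a, b, q, r = 1.0, 1.0, 1.0, 1.0   # system and cost parameters (assumed)
k = 2.0                            # initial stabilizing gain: a - b*k < 0

for i in range(50):
    # Policy evaluation: solve the Lyapunov equation for the current
    # policy u = -k*x, i.e. 2*(a - b*k)*p + q + r*k**2 = 0.
    p = -(q + r * k**2) / (2.0 * (a - b * k))
    # Policy improvement: minimize the Hamiltonian over u,
    # which yields the new gain k = (b/r)*p.
    k_new = b * p / r
    if abs(k_new - k) < 1e-12:
        break
    k = k_new

# p converges to the stabilizing root of the Riccati equation
# 2*a*p - (b**2/r)*p**2 + q = 0, here p = 1 + sqrt(2).
print(f"iterations: {i}, p = {p:.6f}, k = {k:.6f}")
```

With these assumed parameters the iterates converge monotonically to p = 1 + sqrt(2), mirroring the convergence of the iterative performance index proven in the chapter; in the chapter's setting, both steps are instead carried out by the critic and action networks from measured data, without the model knowledge used in this sketch.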