Abstract
In recent years, a variety of reinforcement learning (RL) algorithms have been proposed for the optimal tracking problem of continuous-time nonlinear systems with input constraints. Most of these algorithms are based on the notion of uniform ultimate boundedness (UUB) stability, in which high learning rates are typically avoided in order to keep oscillations in the state error small. However, this comes at the cost of slower convergence of the critic neural network weights. This paper addresses that problem by proposing a novel tuning law for the critic neural network that employs a variable gain gradient descent, adjusting the learning rate based on the Hamilton–Jacobi–Bellman (HJB) approximation error. By permitting a high learning rate, the proposed tuning law improves the convergence time of the critic neural network weights. Simultaneously, it yields a tighter residual set, to which the trajectories of the augmented system converge, leading to smaller oscillations in the state error. A tighter UUB stability bound for the proposed update mechanism is proved. Numerical studies on a continuous-time nonlinear system are then furnished to validate the variable gain gradient descent-based update law presented in this paper.
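The core idea, scaling the critic's learning rate with the HJB approximation error, can be sketched in a few lines. This is a minimal illustration with a hypothetical gain schedule and variable names, not the exact tuning law derived in the paper:

```python
import numpy as np

def critic_update(W, sigma, delta_hjb, alpha0=1.0, k=2.0):
    """One variable-gain gradient-descent step for critic weights W.

    W         : critic NN weight vector
    sigma     : critic regressor vector, same shape as W
    delta_hjb : scalar HJB approximation error at the current state
    alpha0    : base learning rate (hypothetical value)
    k         : gain slope; the effective rate grows with |delta_hjb|

    The normalized-gradient factor sigma / (1 + sigma^T sigma)^2 is a
    standard choice in adaptive-critic tuning laws; the variable gain
    alpha0 * (1 + k*|delta_hjb|) is the illustrative part.
    """
    alpha = alpha0 * (1.0 + k * abs(delta_hjb))   # variable gain
    denom = (1.0 + sigma @ sigma) ** 2            # gradient normalization
    return W - alpha * delta_hjb * sigma / denom

# A larger HJB error yields a proportionally larger weight step.
W = np.zeros(3)
sigma = np.array([0.5, -0.2, 0.1])
step_small = critic_update(W, sigma, delta_hjb=0.1) - W
step_large = critic_update(W, sigma, delta_hjb=1.0) - W
```

With a fixed-gain law the two steps would differ only linearly in the error; the variable gain amplifies large-error steps further, which is what shortens the critic's convergence time.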
Ethics declarations
Conflict of interest
This study was not funded by any grant, and the authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Lemma 1
The following equality holds true:
Proof
Therefore,
where \(u=-u_m\tanh {(A(z))}\) is a scalar. Now, if \(u\) is a vector, then,
\(\square \)
Lemma 2
The following inequality holds true:
\(2u_m\int _{0}^{u_i}\psi ^{-1}(\nu /u_m)R_i\,d\nu \ge 0,\)
if \(\psi ^{-1}\) is odd and monotonically increasing and \(R_i>0\), where \(u_i \in {\mathbb {R}},\ i=1,2,\ldots ,m\).
Proof
If \(\psi ^{-1}\) is odd and monotonically increasing, then,
or
where \(\nu \in {\mathbb {R}}\) and \(u_m>0\). Let \(\theta =1/u_m\). To prove that \(2u_m\int _{0}^{u_i}\psi ^{-1}(\nu /u_m)R_i\,d\nu \ge 0\), it is enough to show that \(\int _{0}^{u_i}\psi ^{-1}(\nu \theta )\,d\nu \ge 0\). To this end, a variable \({\mathcal {K}}\in [0,\theta ]\) is introduced. Therefore,
where \(l=\nu \theta \). Similarly,
by utilizing \(l=u_i{\mathcal {K}}\)
Since \(\psi ^{-1}(u_i{\mathcal {K}})u_i \ge 0\), it follows that
\(\square \)
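As a numerical sanity check (illustrative only, taking \(\psi =\tanh \) so that \(\psi ^{-1}=\tanh ^{-1}\) is odd and monotonically increasing, with an arbitrary \(u_m\)), the integral can be evaluated by the trapezoidal rule for both positive and negative \(u_i\):

```python
import numpy as np

def psi_integral(u_i, u_m=1.5, n=2001):
    """Evaluate int_0^{u_i} psi^{-1}(nu/u_m) d nu with psi = tanh."""
    nu = np.linspace(0.0, u_i, n)        # oriented grid; works for u_i < 0 too
    vals = np.arctanh(nu / u_m)          # psi^{-1}(nu * theta), theta = 1/u_m
    # Trapezoidal rule; the signed steps preserve the integral's orientation.
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(nu)))
```

Because the integrand \(\psi ^{-1}(\nu \theta )\) has the same sign as \(\nu \), the oriented integral is nonnegative whichever sign \(u_i\) takes, which is exactly the claim of the lemma.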
Lemma 3
The following equality holds true:
\(-u_m\tanh {(\tau _{1}(z)+\varepsilon _{u^*})}=-u_m\tanh {(\tau _{1}(z))}+\varepsilon _{u},\)
where \(\varepsilon _{u^*}=(1/2u_m)R^{-1}{\hat{G}}^T\nabla {\varepsilon }(z)= [\varepsilon _{{u^*}_{1}},\varepsilon _{{u^*}_{2}},\ldots ,\varepsilon _{{u^*}_{m}}]^T \in {\mathbb {R}}^m\), \(\tau _{1}(z)=(1/2u_m)R^{-1}{\hat{G}}^T \nabla {\vartheta }^TW=[\tau _{11},\ldots ,\tau _{1m}]^T \in {\mathbb {R}}^m\), and \(\varepsilon _{u}=-(1/2)\left( I_m-\mathrm {diag}(\tanh ^2{(q)})\right) R^{-1}{\hat{G}}^T\nabla {\varepsilon }\), with \(q \in {\mathbb {R}}^m\) and each \(q_i \in {\mathbb {R}}\) lying between \(\tau _{1i}\) and \(\tau _{1i}+\varepsilon _{u^*_i}\), where \(\varepsilon _{u^*_i}\) is the \(i\)th element of \(\varepsilon _{u^*}\).
Proof
Using the mean value theorem,
\(\tanh {(\tau _1+\varepsilon _{u^*})}=\tanh {(\tau _1)}+\left( I_m-\mathrm {diag}(\tanh ^2{(q)})\right) \varepsilon _{u^*},\)
where \(q \in {\mathbb {R}}^m\) and each \(q_i \in {\mathbb {R}}\) lies between \(\tau _{1i}\) and \(\tau _{1i}+\varepsilon _{u^*_i}\).
Now, using the expression for \(\varepsilon _{u^*}\) in (82), \(\tanh {(\tau _1+\varepsilon _{u^*})}\) can be rewritten as:
Multiplying both sides by \(-u_m\),
Hence proved. \(\square \)
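The componentwise mean value theorem step can be checked numerically (an illustration with arbitrary scalars, not the paper's quantities): for each component, \(\tanh (x+h)-\tanh (x)=(1-\tanh ^2(q))h\) for some \(q\) between \(x\) and \(x+h\), and a grid search locates such a \(q\) to high accuracy:

```python
import numpy as np

def mvt_q_error(x, h, n=100001):
    """Smallest residual |tanh(x+h)-tanh(x) - (1-tanh(q)^2)*h| over a
    fine grid of q in [min(x, x+h), max(x, x+h)]."""
    target = np.tanh(x + h) - np.tanh(x)
    lo, hi = sorted((x, x + h))
    q = np.linspace(lo, hi, n)
    return float(np.min(np.abs((1.0 - np.tanh(q) ** 2) * h - target)))
```

A grid scan is used rather than bisection because \(1-\tanh ^2(q)\) is not monotone on intervals containing the origin, so the residual need not change sign at the endpoints even though a root exists inside.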
Lemma 4
The following vector inequality holds true:
\(\Vert \tanh {(\tau _1(z))}-\tanh {(\tau _2(z))}\Vert \le T_m,\)
where \(T_m=\sqrt{\sum _{i=1}^m\min (|\tau _{1i}-\tau _{2i}|^2,4)}\) and \(\tau _1(z),\tau _2(z)\) both belong to \({\mathbb {R}}^m\); therefore, \(\tanh {(\tau _i(z))} \in {\mathbb {R}}^m,~i=1,2\).
Proof
Since \(\tanh {(\cdot )}\) is 1-Lipschitz, one can write,
Therefore, using the above inequality and the fact that \(-1\le \tanh {(\cdot )} \le 1\),
One can also see, using the absolute upper bound of \(\tanh {(\cdot )}\), that
which implies,
\(\square \)
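The bound can be exercised numerically (illustrative vectors, assuming the Euclidean norm): each component difference satisfies \(|\tanh (\tau _{1i})-\tanh (\tau _{2i})| \le \min (|\tau _{1i}-\tau _{2i}|,2)\), which yields \(T_m\) after taking norms:

```python
import numpy as np

def tanh_diff_bound(tau1, tau2):
    """Return (lhs, T_m) for the inequality ||tanh(tau1)-tanh(tau2)|| <= T_m."""
    lhs = np.linalg.norm(np.tanh(tau1) - np.tanh(tau2))
    # Componentwise: the 1-Lipschitz bound |t1i - t2i| or the range bound 2,
    # whichever is smaller; squared and summed, then square-rooted.
    T_m = np.sqrt(np.sum(np.minimum(np.abs(tau1 - tau2) ** 2, 4.0)))
    return lhs, T_m

# Example with components in both regimes: small gaps (Lipschitz bound
# active) and large gaps (saturation bound of 2 active).
tau1 = np.array([0.2, -0.1, 5.0, -4.0])
tau2 = np.array([0.0, 0.3, -5.0, 4.0])
lhs, T_m = tanh_diff_bound(tau1, tau2)
```

For the widely separated components the cap of 4 inside the sum is what keeps \(T_m\) finite even when \(|\tau _{1i}-\tau _{2i}|\) is large.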
Cite this article
Mishra, A., Ghosh, S. Variable gain gradient descent-based reinforcement learning for robust optimal tracking control of uncertain nonlinear system with input constraints. Nonlinear Dyn 107, 2195–2214 (2022). https://doi.org/10.1007/s11071-021-06908-z