
Variable gain gradient descent-based reinforcement learning for robust optimal tracking control of uncertain nonlinear system with input constraints

  • Original Paper
  • Nonlinear Dynamics

Abstract

In recent times, a variety of reinforcement learning (RL) algorithms have been proposed for the optimal tracking problem of continuous-time nonlinear systems with input constraints. Most of these algorithms are based on the notion of uniform ultimate boundedness (UUB) stability, in which higher learning rates are normally avoided in order to restrict oscillations in the state error to smaller values. However, this comes at the cost of a longer convergence time for the critic neural network weights. This paper addresses that problem by proposing a novel tuning law for the critic neural network based on a variable gain gradient descent, which adjusts the learning rate according to the Hamilton–Jacobi–Bellman (HJB) approximation error. By permitting higher learning rates, the proposed variable gain gradient descent tuning law improves the convergence time of the critic neural network weights. Simultaneously, it results in a tighter residual set, to which the trajectories of the augmented system converge, and hence in smaller oscillations in the state error. A tighter bound for UUB stability of the proposed update mechanism is proved. Numerical studies are then furnished to validate the variable gain gradient descent-based update law presented in this paper on a continuous-time nonlinear system.
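To make the idea concrete, the sketch below illustrates, under simplifying assumptions, one way a gradient-descent critic update with a gain that scales with the HJB residual could be coded. It is not the paper's exact tuning law; the names and choices (`hjb_error`, `base_gain`, `gain_scale`, the normalisation term) are illustrative assumptions only.

```python
# Illustrative sketch (not the paper's exact law): a gradient-descent critic
# update whose learning rate grows with the magnitude of the HJB residual.
# All symbols (base_gain, gain_scale, the normalisation, etc.) are assumptions.
import numpy as np

def critic_update(W, sigma, hjb_error, base_gain=1.0, gain_scale=5.0, dt=1e-3):
    """One Euler step of a variable-gain gradient-descent critic update.

    W         : current critic NN weight vector
    sigma     : regressor, i.e. gradient of the HJB residual w.r.t. W
    hjb_error : scalar HJB approximation (Bellman) residual
    base_gain : nominal learning rate used when the residual is small
    gain_scale: how strongly the gain grows with |hjb_error|
    """
    # Variable gain: a larger HJB residual yields a larger learning rate
    alpha = base_gain * (1.0 + gain_scale * abs(hjb_error))
    # Normalised gradient-descent direction on the squared HJB residual
    W_dot = -alpha * sigma * hjb_error / (1.0 + sigma @ sigma) ** 2
    return W + dt * W_dot

# Toy usage: a 5-weight critic with a random regressor and residual
rng = np.random.default_rng(0)
W = rng.standard_normal(5)
W = critic_update(W, sigma=rng.standard_normal(5), hjb_error=0.8)
print(W)
```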


Data availability

The data used for the simulation in Sect. 5.2 is taken from page 276 of [4].

References

  1. Abad, E.C., Alonso, J.M., García, M.J.G., García-Prada, J.C.: Methodology for the navigation optimization of a terrain-adaptive unmanned ground vehicle. Int. J. Adv. Robot. Syst. 15(1), 1729881417752726 (2018)

  2. Abu-Khalaf, M., Lewis, F.L.: Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 41(5), 779–791 (2005)

  3. Abu-Khalaf, M., Lewis, F.L., Huang, J.: Neurodynamic programming and zero-sum games for constrained control systems. IEEE Trans. Neural Netw. 19(7), 1243–1252 (2008)

  4. Beard, R.W., McLain, T.W.: Small Unmanned Aircraft: Theory and Practice. Princeton University Press, Princeton (2012)

  5. Bhasin, S., Kamalapurkar, R., Johnson, M., Vamvoudakis, K.G., Lewis, F.L., Dixon, W.E.: A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 49(1), 82–92 (2013)

  6. Corral, E., García, M., Castejon, C., Meneses, J., Gismeros, R.: Dynamic modeling of the dissipative contact and friction forces of a passive biped-walking robot. Appl. Sci. 10(7), 2342 (2020)

  7. Dierks, T., Jagannathan, S.: Optimal control of affine nonlinear continuous-time systems. In: Proceedings of the 2010 American Control Conference, pp. 1568–1573. IEEE (2010)

  8. Hendzel, Z.: An adaptive critic neural network for motion control of a wheeled mobile robot. Nonlinear Dyn. 50(4), 849–855 (2007)

  9. Heydari, A., Balakrishnan, S.N.: Fixed-final-time optimal tracking control of input-affine nonlinear systems. Neurocomputing 129, 528–539 (2014)

  10. Hornik, K., Stinchcombe, M., White, H.: Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw. 3(5), 551–560 (1990)

  11. Jiang, Y., Jiang, Z.P.: Robust adaptive dynamic programming and feedback stabilization of nonlinear systems. IEEE Trans. Neural Netw. Learn. Syst. 25(5), 882–893 (2014)

  12. Kiumarsi, B., Lewis, F.L., Modares, H., Karimpour, A., Naghibi-Sistani, M.B.: Reinforcement q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica 50(4), 1167–1175 (2014)

  13. Lewis, F.L., Liu, D.: Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, vol. 17. John Wiley, Hoboken (2013)

  14. Lewis, F.L., Vrabie, D.: Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 9(3), 32–50 (2009)

  15. Lin, W.S.: Optimality and convergence of adaptive optimal control by reinforcement synthesis. Automatica 47(5), 1047–1052 (2011)

  16. Liu, D., Yang, X., Li, H.: Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics. Neural Comput. Appl. 23(7–8), 1843–1850 (2013)

  17. Liu, D., Yang, X., Wang, D., Wei, Q.: Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints. IEEE Trans. Cybern. 45(7), 1372–1385 (2015)

  18. Lyashevskiy, S.: Constrained optimization and control of nonlinear systems: new results in optimal control. In: Proceedings of 35th IEEE Conference on Decision and Control, vol. 1, pp. 541–546. IEEE (1996)

  19. Modares, H., Lewis, F.L.: Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica 50(7), 1780–1792 (2014)

  20. Modares, H., Lewis, F.L., Naghibi-Sistani, M.B.: Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Trans. Neural Netw. Learn. Syst. 24(10), 1513–1525 (2013)

  21. Modares, H., Lewis, F.L., Naghibi-Sistani, M.B.: Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 50(1), 193–202 (2014)

  22. Rudin, W., et al.: Principles of Mathematical Analysis, vol. 3. McGraw-Hill, New York (1964)

  23. Sadeghi, M., Abaspour, A., Sadati, S.H.: A novel integrated guidance and control system design in formation flight. J. Aerosp. Technol. Manag. 7(4), 432–442 (2015)

  24. Vamvoudakis, K.G., Lewis, F.L.: Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46(5), 878–888 (2010)

  25. Vamvoudakis, K.G., Vrabie, D., Lewis, F.L.: Online adaptive algorithm for optimal control with integral reinforcement learning. Int. J. Robust Nonlinear Control 24(17), 2686–2710 (2014)

  26. Vrabie, D., Pastravanu, O., Abu-Khalaf, M., Lewis, F.L.: Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 45(2), 477–484 (2009)

  27. Wang, N., Gao, Y., Zhao, H., Ahn, C.K.: Reinforcement learning-based optimal tracking control of an unknown unmanned surface vehicle. IEEE Trans. Neural Netw. Learn. Syst. 32, 3034–3035 (2020)

  28. Yang, X., Liu, D., Wang, D.: Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints. Int. J. Control 87(3), 553–566 (2014)

  29. Yang, X., Liu, D., Wei, Q.: Online approximate optimal control for affine non-linear systems with unknown internal dynamics using adaptive dynamic programming. IET Control Theory Appl. 8(16), 1676–1688 (2014)

  30. Yang, X., Liu, D., Wei, Q.: Robust tracking control of uncertain nonlinear systems using adaptive dynamic programming. In: International Conference on Neural Information Processing, pp. 9–16. Springer (2015)

  31. Zhang, H., Cui, L., Zhang, X., Luo, Y.: Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method. IEEE Trans. Neural Netw. 22(12), 2226–2236 (2011)

  32. Zhao, B., Jia, L., Xia, H., Li, Y.: Adaptive dynamic programming-based stabilization of nonlinear systems with unknown actuator saturation. Nonlinear Dyn. 93(4), 2089–2103 (2018)

  33. Zhao, D., Zhu, Y.: MEC-A near-optimal online reinforcement learning algorithm for continuous deterministic systems. IEEE Trans. Neural Netw. Learn. Syst. 26(2), 346–356 (2014)


Ethics declarations

Conflict of interest

This study was not funded by any grant, and the authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Lemma 1

The following equality holds:

$$\begin{aligned}&2u_m\int _0^{-u_m\tanh {A(z)}}\tanh ^{-1}(\nu /u_m)^TRd\nu \nonumber \\&\quad =2u_m^2A^T(z)R\tanh {A(z)}+ u_m^2 \nonumber \\&\qquad \sum _{i=1}^{m}R_i\ln [1-\tanh ^2{A_i(z)}] \end{aligned}$$
(70)

Proof

$$\begin{aligned} \int \tanh ^{-1}\Big (\frac{x}{a}\Big )dx&=\frac{1}{2}a\ln {(a^2-x^2)} \nonumber \\&\quad +x\tanh ^{-1}\Big (\frac{x}{a}\Big )+C \end{aligned}$$
(71)

Therefore,

$$\begin{aligned} \int _0^u \tanh ^{-1} \Big (\frac{\nu }{u_m}\Big )d\nu&=\Big [\frac{1}{2}u_m\ln {(u_m^2-\nu ^2)}+\nu \tanh ^{-1}\Big (\frac{\nu }{u_m}\Big )\Big ]_0^u \nonumber \\ 2u_m\int _0^u \tanh ^{-1} \Big (\frac{\nu }{u_m}\Big )d\nu&=\Big [u_m^2\ln {(u_m^2-\nu ^2)}+2u_m\nu \tanh ^{-1}\Big (\frac{\nu }{u_m}\Big )\Big ]_0^u \nonumber \\&=u_m^2\ln \Big (1-\frac{u^2}{u_m^2}\Big )+2u_m^2A(z)\tanh {A(z)} \nonumber \\&=u_m^2\ln {(1-\tanh ^2{A(z)})}+2u_m^2A(z)\tanh {A(z)} \end{aligned}$$
(72)

where \(u=-u_m\tanh {A(z)}\) is a scalar. If \(u\) is instead a vector, then,

$$\begin{aligned}&2u_m\int _0^u \tanh ^{-1} \Big (\frac{\nu }{u_m}\Big )Rd\nu \nonumber \\&=2u_m^2A^T(z)R\tanh {A(z)} \nonumber \\&+ u_m^2\sum _{i=1}^{m}R_i\ln [1-\tanh ^2{A_i(z)}] \end{aligned}$$
(73)

\(\square \)
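The identity can be spot-checked numerically. The sketch below is an illustrative verification only (not part of the paper), assuming a diagonal \(R\) and arbitrary test values for \(A(z)\), \(u_m\) and \(R_i\); it evaluates both sides of (70) with `scipy`.

```python
# Numerical spot-check of the identity in Lemma 1 (Eq. (70)) for a diagonal R.
# A, u_m and R below are arbitrary test values chosen for this demo.
import numpy as np
from scipy.integrate import quad

u_m = 2.0
A = np.array([0.8, -0.3, 1.5])    # A(z), arbitrary test vector
R = np.array([1.0, 2.5, 0.7])     # diagonal entries R_i > 0

# Left-hand side: 2*u_m * sum_i R_i * int_0^{-u_m*tanh(A_i)} atanh(nu/u_m) dnu
lhs = 0.0
for Ai, Ri in zip(A, R):
    upper = -u_m * np.tanh(Ai)
    val, _ = quad(lambda nu: np.arctanh(nu / u_m), 0.0, upper)
    lhs += 2.0 * u_m * Ri * val

# Right-hand side of Eq. (70) with R = diag(R_i)
rhs = (2.0 * u_m**2 * np.sum(A * R * np.tanh(A))
       + u_m**2 * np.sum(R * np.log(1.0 - np.tanh(A)**2)))

print(lhs, rhs)   # the two values agree to numerical precision
```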

Lemma 2

The following inequality holds:

$$\begin{aligned} C(u_i)=2u_m\int _{0}^{u_i}\psi ^{-1}(\frac{\nu }{u_m})R_id\nu \ge 0 \end{aligned}$$
(74)

if \(\psi ^{-1}\) is monotonic, odd and increasing, and \(R_i>0\), where \(u_i \in {\mathbb {R}},~i=1,2,\ldots ,m\).

Proof

If \(\psi ^{-1}\) is monotonic odd and increasing, then,

$$\begin{aligned} \begin{aligned} \Big (\frac{\nu }{u_m}\Big )\psi ^{-1}\Big (\frac{\nu }{u_m}\Big ) \ge 0 \end{aligned} \end{aligned}$$
(75)

or

$$\begin{aligned} \nu \psi ^{-1}\Big (\frac{\nu }{u_m}\Big ) \ge 0 \end{aligned}$$
(76)

where \(\nu \in {\mathbb {R}}\) and \(u_m>0\). Let \(\theta =1/u_m\). In order to prove that \(2u_m\int _{0}^{u_i}\psi ^{-1}(\nu /u_m)R_id\nu \ge 0\), it suffices to show that \(\int _{0}^{u_i}\psi ^{-1}(\nu \theta )d\nu \ge 0\). To this end, introduce a variable \({\mathcal {K}}\in [0,\theta ]\). Therefore,

$$\begin{aligned} \begin{aligned} \int _{0}^{u_i}\psi ^{-1}(\nu \theta )d\nu =\frac{1}{\theta }\int _{0}^{u_i\theta }\psi ^{-1}(l)dl \end{aligned} \end{aligned}$$
(77)

where \(l=\nu \theta \). Similarly,

$$\begin{aligned} \begin{aligned} \frac{1}{\theta }\int _{0}^{u_i\theta }\psi ^{-1}(l)dl=\frac{1}{\theta }\int _{0}^{\theta }\psi ^{-1}(u_i{\mathcal {K}})u_id{\mathcal {K}} \end{aligned} \end{aligned}$$
(78)

where the substitution \(l=u_i{\mathcal {K}}\) has been used. Since \(\psi ^{-1}(u_i{\mathcal {K}})u_i \ge 0\), it follows that

$$\begin{aligned} \frac{1}{\theta }\int _{0}^{\theta }\psi ^{-1}(u_i{\mathcal {K}})u_id{\mathcal {K}}\ge 0 \end{aligned}$$
(79)

\(\square \)
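For the tanh-type input constraint used in this paper, \(\psi ^{-1}=\tanh ^{-1}\) is monotonic, odd and increasing, so Lemma 2 can be checked numerically. The following is an illustrative check only, with arbitrary values of \(u_i\), \(u_m\) and \(R_i\).

```python
# Illustrative check of Lemma 2 with psi^{-1} = arctanh (monotonic, odd, increasing).
# The test values for u_i, u_m and R_i are arbitrary assumptions for this demo.
import numpy as np
from scipy.integrate import quad

u_m, R_i = 2.0, 1.5
for u_i in [-1.8, -0.5, 0.0, 0.7, 1.9]:     # any u_i with |u_i| < u_m
    val, _ = quad(lambda nu: np.arctanh(nu / u_m), 0.0, u_i)
    C = 2.0 * u_m * R_i * val
    print(f"u_i = {u_i:+.1f}  ->  C(u_i) = {C:.4f}")   # always >= 0
```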

Lemma 3

The following equation holds:

$$\begin{aligned} \begin{aligned} u&=-u_m\tanh {\Big (\frac{1}{2u_m}R^{-1}{\hat{G}}^T \nabla {\vartheta }^TW+\varepsilon _{u^*}\Big )}\\&=-u_m\tanh {(\tau _1(z))}\\&\quad +\varepsilon _{u} \end{aligned} \end{aligned}$$
(80)

where \(\varepsilon _{u^*}=(1/2u_m)R^{-1}{\hat{G}}^T\nabla {\varepsilon }(z)= [\varepsilon _{{u^*}_{1}},\varepsilon _{{u^*}_{2}},\ldots ,\varepsilon _{{u^*}_{m}}]^T \in {\mathbb {R}}^m\), \(\tau _{1}(z)=(1/2u_m)R^{-1}{\hat{G}}^T \nabla {\vartheta }^TW=[\tau _{11},\ldots ,\tau _{1m}]^T \in {\mathbb {R}}^m\) and \(\varepsilon _{u}=-(1/2)\big ((I_m-diag(\tanh ^2{(q)}))R^{-1}{\hat{G}}^T\nabla {\varepsilon }\big )\), with \(q \in {\mathbb {R}}^m\) and each \(q_i \in {\mathbb {R}}\) lying between \(\tau _{1i}\) and \(\tau _{1i}+\varepsilon _{u^*_i}\), where \(\varepsilon _{u^*_i}\) denotes the \(i\)th element of \(\varepsilon _{u^*}\).

Proof

$$\begin{aligned} u=-u_m\tanh {(\tau _1+\varepsilon _{u^*})} \end{aligned}$$
(81)

Using mean value theorem,

$$\begin{aligned}&\tanh {(\tau _1+\varepsilon _{u^*})}-\tanh {(\tau _1)}=\tanh ^{'}{(q)}\varepsilon _{u^*} \nonumber \\&=(I_m-diag(\tanh ^2{(q)}))\varepsilon _{u^*} \end{aligned}$$
(82)

where \(q \in {\mathbb {R}}^m\) and each \(q_i \in {\mathbb {R}}\) lies between \(\tau _{1i}\) and \(\tau _{1i}+\varepsilon _{u^*_i}\).

Now, using the expression for \(\varepsilon _{u^*}\) in (82), \(\tanh {(\tau _1+\varepsilon _{u^*})}\) can be rewritten as:

$$\begin{aligned} \tanh {(\tau _1+\varepsilon _{u^*})}&=\tanh {(\tau _1)}+(I_m-diag(\tanh ^2{(q)})) \nonumber \\&\quad \times \left( \frac{1}{2u_m}R^{-1}{\hat{G}}^T\nabla {\varepsilon }(z)\right) \end{aligned}$$
(83)

Multiplying both sides by \(-u_m\),

$$\begin{aligned} -u_m\tanh {(\tau _1+\varepsilon _{u^*})}&=-u_m\tanh {(\tau _1)} \nonumber \\&\quad -\frac{1}{2}(I_m-diag(\tanh ^2{(q)})) \nonumber \\&\quad \times \left( R^{-1}{\hat{G}}^T(z)\nabla {\varepsilon }(z)\right) \end{aligned}$$
(84)

Hence proved. \(\square \)

Lemma 4

The following vector inequality holds:

$$\begin{aligned} \Vert \tanh {(\tau _1(z))}-\tanh {(\tau _2(z))}\Vert \le T_m \le 2\sqrt{m} \end{aligned}$$
(85)

where \(T_m=\sqrt{\sum _{i=1}^m\min (|\tau _{1i}-\tau _{2i}|^2,4)}\), and \(\tau _1(z)\) and \(\tau _2(z)\) both belong to \({\mathbb {R}}^m\); therefore, \(\tanh {(\tau _i(z))} \in {\mathbb {R}}^m,~i=1,2\).

Proof

Since \(\tanh (\cdot )\) is 1-Lipschitz, one can write

$$\begin{aligned} \begin{aligned} |\tanh {(\tau _{1i})}-\tanh {(\tau _{2i})}| \le |\tau _{1i}-\tau _{2i}| \end{aligned} \end{aligned}$$
(86)

Therefore, using the above inequality and the fact that \(-1\le \tanh (\cdot ) \le 1\), one obtains

$$\begin{aligned} \begin{aligned} \Vert \tanh {(\tau _1(z))}-\tanh {(\tau _2(z))}\Vert ^2&= \sum _{i=1}^m|\tanh {\tau _{1i}}-\tanh {\tau _{2i}}|^2\\&\le \sum _{i=1}^mmin(|\tau _{1i}-\tau _{2i}|,2)^2\\&\le \sum _{i=1}^mmin(|\tau _{1i}-\tau _{2i}|^2,4) \end{aligned} \end{aligned}$$
(87)

Moreover, since each term in the sum is at most 4,

$$\begin{aligned} \sum _{i=1}^m\min (|\tau _{1i}-\tau _{2i}|^2,4)\le 4m \end{aligned}$$
(88)

which implies,

$$\begin{aligned} \begin{aligned} \Vert \tanh {(\tau _1(z))}-\tanh {(\tau _2(z))}\Vert \le T_m \le 2\sqrt{m} \end{aligned} \end{aligned}$$
(89)

\(\square \)
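The bound can also be confirmed by random sampling. The following illustrative sketch (arbitrary dimension and sampling range, not part of the paper) checks (85) for many random pairs \(\tau _1,\tau _2\).

```python
# Illustrative Monte-Carlo check of Lemma 4: ||tanh(t1)-tanh(t2)|| <= T_m <= 2*sqrt(m).
# The dimension m and the sampling range are arbitrary assumptions for this demo.
import numpy as np

rng = np.random.default_rng(1)
m = 6
for _ in range(10000):
    t1, t2 = rng.uniform(-10, 10, m), rng.uniform(-10, 10, m)
    lhs = np.linalg.norm(np.tanh(t1) - np.tanh(t2))
    T_m = np.sqrt(np.sum(np.minimum((t1 - t2) ** 2, 4.0)))
    assert lhs <= T_m + 1e-12 and T_m <= 2.0 * np.sqrt(m) + 1e-12
print("bound (85) held for all sampled pairs")
```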


About this article

Cite this article

Mishra, A., Ghosh, S. Variable gain gradient descent-based reinforcement learning for robust optimal tracking control of uncertain nonlinear system with input constraints. Nonlinear Dyn 107, 2195–2214 (2022). https://doi.org/10.1007/s11071-021-06908-z
