Abstract
In recent years, a variety of reinforcement learning (RL) algorithms have been proposed for the optimal tracking problem of continuous-time nonlinear systems with input constraints. Most of these algorithms are based on the notion of uniform ultimate boundedness (UUB) stability, in which high learning rates are typically avoided in order to keep oscillations in the state error small. However, this comes at the cost of slower convergence of the critic neural network weights. This paper addresses that problem by proposing a novel tuning law for the critic neural network that employs a variable gain gradient descent, adjusting the learning rate based on the Hamilton–Jacobi–Bellman (HJB) approximation error. By permitting a high learning rate, the proposed tuning law improves the convergence time of the critic neural network weights. Simultaneously, it yields a tighter residual set, to which the trajectories of the augmented system converge, leading to smaller oscillations in the state error. A tighter UUB stability bound for the proposed update mechanism is proved. Numerical studies on a continuous-time nonlinear system are then furnished to validate the variable gain gradient descent-based update law presented in this paper.
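The core idea, scaling the critic's learning rate with the HJB approximation error, can be sketched in a few lines. This is a minimal illustration with a hypothetical gain schedule and variable names, not the exact tuning law derived in the paper:

```python
import numpy as np

def critic_update(W, sigma, delta_hjb, alpha0=1.0, k=2.0):
    """One variable-gain gradient-descent step for critic weights W.

    W         : critic NN weight vector
    sigma     : critic regressor vector, same shape as W
    delta_hjb : scalar HJB approximation error at the current state
    alpha0    : base learning rate (hypothetical value)
    k         : gain slope; the effective rate grows with |delta_hjb|

    The normalized-gradient factor sigma / (1 + sigma^T sigma)^2 is a
    standard choice in adaptive-critic tuning laws; the variable gain
    alpha0 * (1 + k*|delta_hjb|) is the illustrative part.
    """
    alpha = alpha0 * (1.0 + k * abs(delta_hjb))   # variable gain
    denom = (1.0 + sigma @ sigma) ** 2            # gradient normalization
    return W - alpha * delta_hjb * sigma / denom

# A larger HJB error yields a proportionally larger weight step.
W = np.zeros(3)
sigma = np.array([0.5, -0.2, 0.1])
step_small = critic_update(W, sigma, delta_hjb=0.1) - W
step_large = critic_update(W, sigma, delta_hjb=1.0) - W
```

With a fixed-gain law the two steps would differ only linearly in the error; the variable gain amplifies large-error steps further, which is what shortens the critic's convergence time.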
Ethics declarations
Conflict of interest
This study was not funded by any grant, and the authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Lemma 1
The following equality holds true:
Proof
Therefore,
where \(u=-u_m\tanh {(A(z))}\) is a scalar. Now, if \(u\) is a vector, then,
\(\square \)
Lemma 2
The following inequality holds true:
\(2u_m\int _{0}^{u_i}\psi ^{-1}(\nu /u_m)R_i\,d\nu \ge 0,\)
if \(\psi ^{-1}\) is odd and monotonically increasing and \(R_i>0\), where \(u_i \in {\mathbb {R}},\ i=1,2,\ldots ,m\).
Proof
If \(\psi ^{-1}\) is odd and monotonically increasing, then,
or
where \(\nu \in {\mathbb {R}}\) and \(u_m>0\). Let \(\theta =1/u_m\). To prove that \(2u_m\int _{0}^{u_i}\psi ^{-1}(\nu /u_m)R_i\,d\nu \ge 0\), it is enough to show that \(\int _{0}^{u_i}\psi ^{-1}(\nu \theta )\,d\nu \ge 0\). To this end, a variable \({\mathcal {K}}\in [0,\theta ]\) is introduced. Therefore,
where \(l=\nu \theta \). Similarly,
by utilizing \(l=u_i{\mathcal {K}}\)
Since \(\psi ^{-1}(u_i{\mathcal {K}})u_i \ge 0\), it follows that
\(\square \)
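As a numerical sanity check (illustrative only, taking \(\psi =\tanh \) so that \(\psi ^{-1}=\tanh ^{-1}\) is odd and monotonically increasing, with an arbitrary \(u_m\)), the integral can be evaluated by the trapezoidal rule for both positive and negative \(u_i\):

```python
import numpy as np

def psi_integral(u_i, u_m=1.5, n=2001):
    """Evaluate int_0^{u_i} psi^{-1}(nu/u_m) d nu with psi = tanh."""
    nu = np.linspace(0.0, u_i, n)        # oriented grid; works for u_i < 0 too
    vals = np.arctanh(nu / u_m)          # psi^{-1}(nu * theta), theta = 1/u_m
    # Trapezoidal rule; the signed steps preserve the integral's orientation.
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(nu)))
```

Because the integrand \(\psi ^{-1}(\nu \theta )\) has the same sign as \(\nu \), the oriented integral is nonnegative whichever sign \(u_i\) takes, which is exactly the claim of the lemma.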
Lemma 3
The following equality holds true:
\(-u_m\tanh {(\tau _{1}(z)+\varepsilon _{u^*})}=-u_m\tanh {(\tau _{1}(z))}+\varepsilon _{u},\)
where \(\varepsilon _{u^*}=(1/2u_m)R^{-1}{\hat{G}}^T\nabla {\varepsilon }(z)= [\varepsilon _{{u^*}_{1}},\varepsilon _{{u^*}_{2}},\ldots ,\varepsilon _{{u^*}_{m}}]^T \in {\mathbb {R}}^m\), \(\tau _{1}(z)=(1/2u_m)R^{-1}{\hat{G}}^T \nabla {\vartheta }^TW=[\tau _{11},\ldots ,\tau _{1m}]^T \in {\mathbb {R}}^m\), and \(\varepsilon _{u}=-(1/2)\left( I_m-\mathrm {diag}(\tanh ^2{(q)})\right) R^{-1}{\hat{G}}^T\nabla {\varepsilon }\), with \(q \in {\mathbb {R}}^m\) and each \(q_i \in {\mathbb {R}}\) lying between \(\tau _{1i}\) and \(\tau _{1i}+\varepsilon _{u^*_i}\), where \(\varepsilon _{u^*_i}\) is the \(i\)th element of \(\varepsilon _{u^*}\).
Proof
Using the mean value theorem,
\(\tanh {(\tau _1+\varepsilon _{u^*})}=\tanh {(\tau _1)}+\left( I_m-\mathrm {diag}(\tanh ^2{(q)})\right) \varepsilon _{u^*},\)
where \(q \in {\mathbb {R}}^m\) and each \(q_i \in {\mathbb {R}}\) lies between \(\tau _{1i}\) and \(\tau _{1i}+\varepsilon _{u^*_i}\).
Now, using the expression for \(\varepsilon _{u^*}\) in (82), \(\tanh {(\tau _1+\varepsilon _{u^*})}\) can be rewritten as:
Multiplying both sides by \(-u_m\),
Hence proved. \(\square \)
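The componentwise mean value theorem step can be checked numerically (an illustration with arbitrary scalars, not the paper's quantities): for each component, \(\tanh (x+h)-\tanh (x)=(1-\tanh ^2(q))h\) for some \(q\) between \(x\) and \(x+h\), and a grid search locates such a \(q\) to high accuracy:

```python
import numpy as np

def mvt_q_error(x, h, n=100001):
    """Smallest residual |tanh(x+h)-tanh(x) - (1-tanh(q)^2)*h| over a
    fine grid of q in [min(x, x+h), max(x, x+h)]."""
    target = np.tanh(x + h) - np.tanh(x)
    lo, hi = sorted((x, x + h))
    q = np.linspace(lo, hi, n)
    return float(np.min(np.abs((1.0 - np.tanh(q) ** 2) * h - target)))
```

A grid scan is used rather than bisection because \(1-\tanh ^2(q)\) is not monotone on intervals containing the origin, so the residual need not change sign at the endpoints even though a root exists inside.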
Lemma 4
The following vector inequality holds true:
\(\Vert \tanh {(\tau _1(z))}-\tanh {(\tau _2(z))}\Vert \le T_m,\)
where \(T_m=\sqrt{\sum _{i=1}^m\min (|\tau _{1i}-\tau _{2i}|^2,4)}\) and \(\tau _1(z),\tau _2(z)\) both belong to \({\mathbb {R}}^m\); therefore, \(\tanh {(\tau _i(z))} \in {\mathbb {R}}^m,~i=1,2\).
Proof
Since \(\tanh {(\cdot )}\) is 1-Lipschitz, one can write,
Therefore, using the above inequality and the fact that \(-1\le \tanh {(\cdot )} \le 1\),
One can also see, using the absolute upper bound of \(\tanh {(\cdot )}\), that
which implies,
\(\square \)
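The bound can be exercised numerically (illustrative vectors, assuming the Euclidean norm): each component difference satisfies \(|\tanh (\tau _{1i})-\tanh (\tau _{2i})| \le \min (|\tau _{1i}-\tau _{2i}|,2)\), which yields \(T_m\) after taking norms:

```python
import numpy as np

def tanh_diff_bound(tau1, tau2):
    """Return (lhs, T_m) for the inequality ||tanh(tau1)-tanh(tau2)|| <= T_m."""
    lhs = np.linalg.norm(np.tanh(tau1) - np.tanh(tau2))
    # Componentwise: the 1-Lipschitz bound |t1i - t2i| or the range bound 2,
    # whichever is smaller; squared and summed, then square-rooted.
    T_m = np.sqrt(np.sum(np.minimum(np.abs(tau1 - tau2) ** 2, 4.0)))
    return lhs, T_m

# Example with components in both regimes: small gaps (Lipschitz bound
# active) and large gaps (saturation bound of 2 active).
tau1 = np.array([0.2, -0.1, 5.0, -4.0])
tau2 = np.array([0.0, 0.3, -5.0, 4.0])
lhs, T_m = tanh_diff_bound(tau1, tau2)
```

For the widely separated components the cap of 4 inside the sum is what keeps \(T_m\) finite even when \(|\tau _{1i}-\tau _{2i}|\) is large.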
Cite this article
Mishra, A., Ghosh, S. Variable gain gradient descent-based reinforcement learning for robust optimal tracking control of uncertain nonlinear system with input constraints. Nonlinear Dyn 107, 2195–2214 (2022). https://doi.org/10.1007/s11071-021-06908-z