Reference modification for trajectory tracking using hybrid offline and online neural networks learning

In this paper, we propose a hybrid offline/online neural networks learning method, which combines complementary advantages of two types of neural networks (NNs): deep NN (DNN) and single-layer radial basis function NN (RBFNN). Firstly, after analyzing the mechatronic system’s model, we select reasonable features as the input of the DNN to learn the inverse dynamic characteristics of the closed-loop system offline, so as to establish the mapping between the desired trajectory and the reference trajectory of the system. The trained DNN is used to generate a new reference trajectory and compensate for the tracking error in advance, which can speed up the convergence of online learning control based on RBFNN. This reference trajectory is further modified iteratively when the tracking task is repeated. For this purpose, a single-layer RBFNN model is established, and an online learning algorithm is developed to update the RBFNN parameters. The proposed hybrid offline/online NN method can improve the tracking performance of mechatronic systems by modifying the reference trajectory on top of the baseline controller without affecting the system stability. To verify the effectiveness of this method, we conduct experiments on a piezoelectric drive platform.


Introduction
Trajectory tracking is a fundamental problem in the control of mechatronic systems, e.g., robot manipulators and piezoelectric actuators (PEAs). In most cases, these systems have nonlinear dynamics and operate under uncertainties and external disturbances, such as friction, sensor noise and payload variations. A series of control methods have been proposed to address these issues, such as adaptive control [1], sliding mode control [2], learning control [3,4], and neural network control [5,6]. Moreover, the control of mechatronic systems with imperfections is a universal problem. In [7], a control strategy to ensure the optimal working conditions was proposed, which focused on the effects of using chaotic vibrational signals to excite the hidden dynamics of the imperfect system. In [8], the authors focused on a paradigmatic example of an imperfect electromechanical structure and developed a control method to ensure coil rotation based on the excitation of the hidden dynamics induced by imperfections, characterizing its influence on the characteristics of the control signal and the power provided to the structure. Imperfections also play an important role in the realization of robust chaos generators based on simple circuits. In [9], a strategy for estimating hidden dynamics parameters was designed and synchronization of imperfect chaotic circuits was achieved. Compared to the active research on advanced control approaches in academia, classical linear controllers such as PID control still play a crucial part in industry because they are simple to implement. However, it is well known that PID controllers behave poorly on complex trajectories and on systems with nonlinear dynamics [10]. Therefore, it is interesting to develop a control approach that is built on top of off-the-shelf linear controllers but improves the tracking performance. Such a control approach has two significantly favorable features.
First, most of the controllers provided by manufacturers do not allow users to modify the low-level position controller but provide access to tunable parameters and a reference trajectory. In a position control task, the desired trajectory is the trajectory predefined for the mechatronic system to actually track. The reference trajectory can be obtained by using the trajectory generator module to modify the desired trajectory and is used as the input signal of the closed-loop mechatronic system. Second, without modifying the available control architecture, the system stability can in general be ensured. In this regard, many state-of-the-art controllers that design the control input, such as [11][12][13][14][15][16][17][18][19], are not applicable.
To cope with disturbances and imperfections of mechatronic systems, much research effort has been devoted to modifying the reference trajectory to improve the tracking performance on top of an available feedback control system. A large group of these works is iterative learning control (ILC), which improves the performance of trajectory tracking by repeating the same task and using knowledge from previous iterations [20][21][22]. Although learning convergence can be rigorously proved, the information about the system learned by ILC cannot be transferred to another task, similar to adaptive control [23].
As mechatronic systems generally have complicated dynamics, which are influenced by uncertainties [24], there is ample motivation to investigate the effectiveness of machine learning in the control of mechatronic systems [25]. Some researchers were attracted by the excellent capabilities of deep neural networks (DNNs) in function approximation and thus revisited the idea of constructing an NN model for mechatronic systems [26], especially the inverse compensation control based on NN model [27][28][29]. In [30], a polynomial fitting model based on NN was proposed to describe the inverse dynamics of hysteresis in PEA. As a feedforward compensation module, the model is combined with a single neurogenic adaptive proportional integral differential controller to reduce the trajectory tracking error caused by hysteresis in piezoelectric drive mechatronic system. Different from the traditional control framework based on NN inverse model to approximate the open-loop dynamics and modify the control signal of the plant, the offline learning control framework proposed in this paper uses DNN to approximate the inverse dynamic characteristics of the closed-loop mechatronic system and uses the trained DNN as a trajectory generator to modify the reference trajectory, so that the tracking error can be compensated for in advance without changing the structure and stability of the baseline controller. Although the offline learning method can approximate the inverse dynamics of the closed-loop mechatronic system, the DNN model is still subject to modeling error, and the tracking control accuracy needs to be further improved based on online learning.
The online learning control framework based on iterative learning in this paper is suitable for repetitive tasks and can suppress unknown uncertainties. Compared to DNN, single-hidden-layer radial basis function neural networks (RBFNNs) have the advantages of simple structure and high computational efficiency. RBFNN is simple to implement in real time, and its learning convergence and the resultant closed-loop system stability can be strictly analyzed [31,32]. The control schemes based on RBFNN in the closed-loop control system mainly include supervisory control, model reference adaptive control, self-tuning control, etc. In [33], for a class of nonlinear systems with unknown parameters and bounded disturbances, RBFNN combined with single-parameter direct adaptive control was designed to overcome the problems caused by unknown dynamics and external disturbances in nonlinear systems. The traditional control methods based on RBFNN modify the control signal of the controlled plant, and the parameters of the RBFNN need to be updated all the time [34,35]. Different from these works, the online learning control framework based on RBFNN proposed in this paper uses iterative method to update the parameters of the RBFNN and modify the reference trajectory until the tracking error is reduced below a target threshold. The advantages of the method proposed in this paper are as follows: (1) For the repetitive trajectory, the repetitive interference and error in the system can be suppressed. (2) It does not change the structure of the baseline controller and will not affect the stability of the closed-loop system. Thus, it can be easily applied to commercial control systems.
Based on the above discussions, this paper investigates reference trajectory modification for mechatronic systems by integrating a DNN for offline learning and a single-layer RBFNN for online learning. First, the DNN is trained offline to approximate the inverse dynamics model of the closed-loop mechatronic system, and the trained DNN is used to obtain the modified reference trajectory, which is either used directly as the input of the closed-loop mechatronic system or further modified by online learning of the RBFNN. Then, we propose the RBFNN-based online learning control framework, and, using a Lyapunov function, we design the learning law of the RBFNN and prove the stability of the system. The offline NN learning method learns the inverse dynamics of the closed-loop system, which compensates for the tracking error in advance and speeds up the online learning. The online NN learning method can deal with uncertainties and disturbances and thus achieve precise trajectory tracking control.
The main contribution of this paper is the hybrid offline/online learning control framework, which combines complementary advantages of a DNN and a single-layer RBFNN. On the one hand, we propose the offline learning control framework with the DNN as a reference trajectory generator, which is transferable and can be used to conduct a new tracking task. Offline learning can provide an initial reference trajectory for online learning and speed up the convergence of the RBFNN parameters. On the other hand, we propose the online learning control framework with the RBFNN to iteratively modify the reference trajectory generated by the DNN as the input signal of the closed-loop mechatronic system, and prove its convergence.
The remainder of this paper is organized as follows: Section 2 presents the system dynamics, formulates the control problem mathematically and introduces the proposed tracking control method based on hybrid offline/online NN. Sections 3 and 4 elaborate the processes of offline and online learning, respectively. Section 5 presents the results of the experiments. Section 6 concludes this work.
System description and control strategy

System description
According to [36][37][38], the schematic model of the piezoelectric actuator illustrates a reversible transformation from electrical charge to mechanical energy, as shown in Fig. 1, where $H$, $C$, $T_{em}$ and $x$ denote the hysteresis effect, the capacitance, the electromechanical transducer and the output displacement of the piezoelectric actuator, respectively.
The dynamic equation of the piezoelectric actuator can be expressed as Eq. (1), where $u_{in}$ denotes the input voltage, $u_h$ the voltage due to the hysteresis, and $m_z$, $b_z$, and $k_z$ are the mass, damping, and stiffness of the ceramic, respectively. In practice, external disturbances act on the piezoelectric drive system in addition to the nonlinear hysteresis. To account for these effects, the piezoelectric drive system can be described as Eq. (2), where $v_n$ and $v_d$ represent all the nonlinear effects and external disturbances, respectively. According to Eq. (2), the dynamic model of a mechatronic system (piezoelectric actuator) can be generalized to a second-order system,

$$M(x)\ddot{x} + C(x,\dot{x})\dot{x} + G(x) + D(x,\dot{x}) = u \quad (3)$$

where $u$ denotes the control input and $x$ denotes the position. Design the control input $u$ as a linear state feedback controller, which is commonly used in a motion controller provided by the manufacturer, i.e.,

$$u = -K\left[\dot{x} - \dot{x}_d + \alpha (x - x_r)\right] \quad (4)$$

where $\alpha > 0$, and $K$, $x_d$, and $x_r$ denote the control gain, desired trajectory, and reference trajectory, respectively. When $x_r = x_d$, $u$ is a conventional PD controller that can be rewritten as

$$u = -K(\dot{e} + \alpha e) \quad (5)$$

where $e = x - x_d$ denotes the trajectory tracking error. Suppose that $x_d$ is a constant and ignore the disturbance vector $D(x,\dot{x})$ and gravity vector $G(x)$; then, we can obtain the formula below by considering Eqs. (5) and (3):

$$M\ddot{x} + C\dot{x} + K\dot{x} + \alpha K (x - x_d) = 0 \quad (6)$$

By checking the above equation, it is straightforward to confirm the system stability with $x \to x_d$ when $t \to \infty$. Nevertheless, the PD controller cannot achieve $x \to x_d$ if the disturbance $D(x,\dot{x})$ and gravity $G(x)$ have significant effects on the dynamics of the system or if $x_d$ is a time-varying trajectory. Therefore, we will design $x_r$ so that it accounts for the uncertainties of the dynamics. In particular, the closed-loop dynamics can be written as below with Eq. (4):

$$M\ddot{x} + C\dot{x} + G + D = -K\left[\dot{x} - \dot{x}_d + \alpha (x - x_r)\right] \quad (7)$$
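To make the role of the reference trajectory concrete, the following minimal sketch simulates a scalar version of the closed loop under the state feedback law $u = -K[\dot{x} - \dot{x}_d + \alpha(x - x_r)]$ with a constant desired position. The scalar plant, the constant load `g` standing in for gravity and disturbance, and all numerical values are illustrative assumptions, not parameters of the piezoelectric platform.

```python
def simulate(x_d, x_r, m=1.0, c=0.5, g=1.6, K=4.0, alpha=2.0, dt=1e-3, steps=20000):
    """Euler simulation of m*x'' + c*x' + g = u for a constant desired position.

    u = -K * (x' - x_d' + alpha * (x - x_r)) with x_d' = 0.
    Returns the final tracking error x - x_d.
    """
    x, v = 0.0, 0.0
    for _ in range(steps):
        u = -K * (v + alpha * (x - x_r))   # state feedback on the reference x_r
        a = (u - c * v - g) / m            # plant acceleration (assumed scalar model)
        x += dt * v
        v += dt * a
    return x - x_d


K, alpha, g = 4.0, 2.0, 1.6

# Plain PD control: the reference equals the desired position.
err_pd = simulate(x_d=10.0, x_r=10.0)

# Modified reference: shift x_r by g / (K * alpha) to cancel the constant load.
err_mod = simulate(x_d=10.0, x_r=10.0 + g / (K * alpha))

print(err_pd, err_mod)
```

In this toy setting the PD loop settles with a steady-state offset of $-g/(K\alpha)$, while the shifted reference removes it without touching the feedback gains, which is the idea behind designing $x_r$ rather than $u$.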
The design of the reference trajectory $x_r$ includes two learning processes: offline learning (generating $x_{r,off}$) and online learning (generating $x_{r,on}$), the details of which are discussed in the following.

Control strategy
The hybrid offline and online learning trajectory tracking control framework proposed in this paper is shown in Fig. 2.
Offline learning refers to learning the inverse dynamic model with a DNN, which modifies the desired trajectory into a new reference trajectory $x_{r,off}$. The online learning part takes the output of the offline learning as the initial reference trajectory. Then, it uses single-layer RBFNNs to obtain the reference trajectory $x_{r,on}$, which locally compensates for the unmodeled dynamics and uncertainties to further improve the tracking performance.
In this control framework, the control system can be divided into three parts. The first part is a closed-loop mechatronic system, which contains an inaccessible controller. Its input is the reference trajectory $x_r$ modified by the DNN and RBFNNs, and its output is the actual trajectory $x$. The controller in the closed-loop mechatronic system ensures that the system can achieve a certain precision of trajectory tracking and has good feedback properties, that is, good robustness, repeatability and anti-interference properties. The second part is the DNN trajectory generation module based on offline learning. Its input is the desired trajectory $x_d$, and its output is the modified reference trajectory $x_{r,off}$. The DNN, with its powerful approximation capability, can be used to learn the inverse dynamics of the closed-loop mechatronic system, so that the mapping from the desired trajectory to the actual trajectory of the whole system approaches the identity mapping. The trained DNN is used as an additional trajectory generation module to modify the reference trajectory and thus reduce the tracking error of the system. The third part is the RBFNN trajectory modification module based on online learning. Its input is the reference trajectory $x_{r,off}$ and its output is the reference trajectory $x_{r,on}$. The RBFNN trajectory modification module is intended for repetitive trajectories, iteratively modifying the reference trajectory $x_{r,on}$ to approximate an ideal reference trajectory $x^*_{r,on}$. Thus, the repetitive disturbance is compensated for in advance. These two learning processes are elaborated in detail in the following two sections, respectively.

Control strategy
The offline NN learning part in the hybrid offline/online control framework is shown in Fig. 3.
First, the DNN trajectory generation module needs to be trained offline using training data. The offline learning control framework is divided into training phase and testing phase, and the information of actual trajectory x is used as the training input of DNN and the desired trajectory x d is the training output.
The transfer function of a closed-loop mechatronic system can be defined as

$$G(s) = \frac{X(s)}{X_r(s)}$$

where $X_r(s)$ and $X(s)$ denote $x_r$ and $x$ in the $s$-domain, respectively. If the reference trajectory generator has the transfer function $G^{-1}(s) = X_r(s)/X(s)$, so that $X_r(s) = G^{-1}(s) X_d(s)$, then we obtain $X(s) = X_d(s)$, i.e., perfect trajectory tracking. By selecting reasonable features as input, we can train a DNN model to approximate $G^{-1}(s)$. This DNN can then be used to generate a reference trajectory $x_{r,off}$ from a new desired trajectory $x_d$.
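The effect of placing an inverse model in front of the closed loop can be illustrated with a small discrete-time sketch. The first-order lag below is a hypothetical stand-in for the closed-loop system $G$, and `inverse_reference` plays the role that the trained DNN plays in the paper; neither is the authors' model.

```python
import math


def closed_loop(x_ref, a, x0):
    """Hypothetical closed loop: x[k+1] = a * x[k] + (1 - a) * x_ref[k]."""
    x = [x0]
    for r in x_ref:
        x.append(a * x[-1] + (1.0 - a) * r)
    return x


def inverse_reference(x_des, a):
    """Exact inverse of the lag: the reference at step k needs x_des[k+1]."""
    return [(x_des[k + 1] - a * x_des[k]) / (1.0 - a) for k in range(len(x_des) - 1)]


a = 0.9
x_des = [30.0 + 20.0 * math.sin(0.05 * k) for k in range(200)]

# Feeding the desired trajectory directly: the output lags behind.
x_naive = closed_loop(x_des[:-1], a, x_des[0])
err_naive = max(abs(x - d) for x, d in zip(x_naive, x_des))

# Feeding the inverse-model reference: the output follows the desired trajectory.
x_inv = closed_loop(inverse_reference(x_des, a), a, x_des[0])
err_inv = max(abs(x - d) for x, d in zip(x_inv, x_des))

print(err_naive, err_inv)
```

Note that the exact inverse of this lag uses the desired value one step ahead, which hints at why future information is useful as an input feature for a delayed system.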

Feature selection
Providing more states to the DNN during training allows it to better approximate the characteristics of the system. Nevertheless, more states lead to a larger input dimension, which requires a very large amount of training data as a result of the curse of dimensionality [39]. Therefore, we should choose the relevant features carefully to minimize the dimension of the input.
In the framework of offline learning, training the DNN aims to learn the inverse dynamics characteristics of the closed-loop mechatronic system and establish the mapping relationship between the desired trajectory and the reference trajectory, i.e., approximating $G^{-1}(s) = X_r(s)/X(s)$, to achieve zero tracking error, i.e., $x = x_d$. By analyzing the dynamics of the closed-loop system in Eq. (7), the reference trajectory $x_r$ is related to $(x, \dot{x}, \ddot{x})$ and $\dot{x}_d$, i.e., $\dot{x}$ under zero tracking error. Hence, we should select the triple $(x, \dot{x}, \ddot{x})$ as the DNN training input. Due to the delay of the mechatronic system, the current $x_r$ will affect the future $(x, \dot{x}, \ddot{x})$, so the future information $(x, \dot{x}, \ddot{x})$ can also be added to the training input to improve the performance [40].
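As a rough illustration of this feature selection, the sketch below assembles training pairs from a recorded run: each input row stacks $(x, \dot{x}, \ddot{x})$ at the current step and at the next `horizon` steps, and the target is the reference applied at the current step. The finite-difference derivatives and the function names are assumptions made for the example; the paper does not state how the derivatives are obtained.

```python
def finite_difference(x, dt):
    """Central differences in the interior, one-sided differences at the ends."""
    n = len(x)
    out = []
    for k in range(n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        out.append((x[hi] - x[lo]) / ((hi - lo) * dt))
    return out


def build_dataset(x_actual, x_ref, dt, horizon):
    """Return (features, targets) for learning the inverse closed-loop map."""
    vel = finite_difference(x_actual, dt)
    acc = finite_difference(vel, dt)
    features, targets = [], []
    for k in range(len(x_actual) - horizon):
        row = []
        for j in range(horizon + 1):          # current step plus future steps
            row.extend([x_actual[k + j], vel[k + j], acc[k + j]])
        features.append(row)
        targets.append(x_ref[k])              # reference that produced this motion
    return features, targets


dt = 0.01
x_actual = [0.5 * (k * dt) ** 2 for k in range(20)]   # toy recorded trajectory
x_ref = [float(k) for k in range(20)]                 # toy applied reference
features, targets = build_dataset(x_actual, x_ref, dt, horizon=8)
print(len(features), len(features[0]))
```

With a horizon of 8 future steps, each input row has 3 x 9 = 27 entries, so the horizon directly trades input dimension against how much of the system delay is covered.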

Training and testing data
From the structure of Eq. (7), we know there are mass/inertia, Coriolis and centrifugal terms, gravity and disturbance in the dynamic model of the system, which could also include significant coupling among different axes. Hence, one of the key factors that determine the effectiveness of DNN training is whether the training data fully represent the properties of the mechatronic system. For this purpose, random non-uniform rational B-spline (NURBS) curves can be used.
We generate the random NURBS trajectory by using random control points [41], which are composed of an independent variable vector and a dependent variable vector, and the independent variable vector $t$ is as follows. We set $\Delta t_i = 0.01 + \mathrm{rand}(0.02, 0.04)$, where $\mathrm{rand}(0.02, 0.04)$ represents a random number between 0.02 and 0.04, and we set the dependent variable observation vector $x$ to follow a normal distribution with mean and standard deviation of 25 and 10, respectively.
According to the independent variable vector $t$ and the dependent variable vector $x$, the random control point sequence $y(x, t)$ is obtained as the training or testing trajectory. In the experiments, according to the movement range of the closed-loop piezoelectric drive system, the movement trajectory is mapped to 0 to 60 µm, and zero-phase filtering is performed to remove velocity peaks that exceed the limit. Finally, the training and testing trajectories are obtained.
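The control point generation described above can be sketched as follows. This is a simplified illustration: piecewise-linear resampling replaces the NURBS fitting, the zero-phase filtering step is omitted, and the function names and the sampling period are hypothetical.

```python
import random


def random_control_points(n, seed=0):
    """Random times with dt_i = 0.01 + rand(0.02, 0.04) and values ~ N(25, 10)."""
    rng = random.Random(seed)
    t, x = [0.0], [rng.gauss(25.0, 10.0)]
    for _ in range(n - 1):
        t.append(t[-1] + 0.01 + rng.uniform(0.02, 0.04))
        x.append(rng.gauss(25.0, 10.0))
    return t, x


def rescale(x, lo=0.0, hi=60.0):
    """Map the values linearly onto the travel range [lo, hi] (here in micrometres)."""
    x_min, x_max = min(x), max(x)
    return [lo + (hi - lo) * (v - x_min) / (x_max - x_min) for v in x]


def resample(t, x, dt):
    """Piecewise-linear resampling on a uniform grid (stand-in for NURBS fitting)."""
    out, i, tk = [], 0, t[0]
    while tk <= t[-1]:
        while t[i + 1] < tk:
            i += 1
        w = (tk - t[i]) / (t[i + 1] - t[i])
        out.append(x[i] + w * (x[i + 1] - x[i]))
        tk += dt
    return out


t_cp, x_cp = random_control_points(50, seed=1)
trajectory = resample(t_cp, rescale(x_cp), dt=0.001)
print(len(trajectory), min(trajectory), max(trajectory))
```

Different seeds give different trajectories, which is how separate training and testing sets could be drawn from the same generator.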

Training of DNN
The above offline learning belongs to supervised learning, which builds a mapping between the input and output of the system with the knowledge of the desired output and given input [42]. The NN is usually trained with the backpropagation (BP) algorithm whose effect is to minimize the error between the desired and actual outputs of the NN.
Denote $W^l_{i,j}$ as the weight which connects the $j$-th neuron of the $(l-1)$-th layer and the $i$-th neuron of the $l$-th layer, and $b^l_i$ as the bias of the $i$-th neuron in the $l$-th layer. Then, the input of the $i$-th neuron in the $l$-th layer can be described as below, where $s_l$ denotes the number of neurons in the $l$-th layer and $f$ represents an activation function.
We define an error function as below, where $m$ and $n$ denote the number of training data groups and the number of outputs, respectively, and $y_k$ and $y^*_k$ denote the actual and desired outputs. After that, we use the gradient descent method to update the weights as below.
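For readers who want the training step written out, one standard form consistent with the definitions above is given here. The symbol $z^l_i$ for the neuron input, the sample index $p$, the $1/(2m)$ normalization and the learning rate $\eta$ are assumptions of this sketch and may differ from the authors' notation.

```latex
% Input of the i-th neuron in layer l (z is an assumed symbol)
z_i^{l} = \sum_{j=1}^{s_{l-1}} W_{i,j}^{l}\, f\!\left(z_j^{l-1}\right) + b_i^{l}

% Mean squared error over m training groups and n outputs (normalization assumed)
E = \frac{1}{2m} \sum_{p=1}^{m} \sum_{k=1}^{n} \left( y_k^{(p)} - y_k^{*(p)} \right)^{2}

% Gradient descent update with an assumed learning rate \eta
W_{i,j}^{l} \leftarrow W_{i,j}^{l} - \eta\, \frac{\partial E}{\partial W_{i,j}^{l}},
\qquad
b_i^{l} \leftarrow b_i^{l} - \eta\, \frac{\partial E}{\partial b_i^{l}}
```

The partial derivatives are obtained layer by layer with the chain rule, which is what the BP algorithm computes.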

Control strategy
While offline learning in the previous section establishes an inverse dynamics model that can generate a reference trajectory to improve the tracking performance, unmodeled dynamics or uncertainties may exist, so the reference trajectory should be further modified online. For this purpose, in this section we derive an online learning algorithm using single-layer RBFNNs. The online learning RBFNN control framework proposed in this paper is shown in Fig. 4. In theory, an RBFNN can fit a continuous function with arbitrary accuracy. When only one RBFNN is used, however, it tends to learn the mean value of the noise and disturbance over the whole trajectory and cannot further reduce the error at each point. In this paper, for repetitive trajectories, a large number of small RBFNNs are used to fit the local dynamics model at each trajectory point. The control framework uses the idea of iteration: it updates the weights of the RBFNNs according to the error between the desired value $x_d(k)$ and the actual value $x(k)$ obtained from the last run at each trajectory point, and uses the updated RBFNNs to generate the reference trajectory point for the next run, so that the generated reference trajectory $x_{r,on}$ progressively approaches the ideal reference trajectory $x^*_{r,on}$.
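The run-to-run idea can be illustrated with a deliberately simplified sketch. Here each trajectory point carries a single scalar weight (a degenerate one-node network) instead of a small RBFNN, the closed loop is replaced by a hypothetical static gain with a repetitive disturbance, and the learning gain `q` is an assumed value; the update is a plain proportional correction, not the learning law derived in this paper.

```python
import math


def run_task(x_ref, gain, disturbance):
    """Hypothetical repeatable closed loop: static gain plus repetitive disturbance."""
    return [gain * r + d for r, d in zip(x_ref, disturbance)]


def learn_reference(x_des, gain, disturbance, q=0.8, iterations=10):
    """Update one weight per trajectory point from the error of the last run."""
    weights = list(x_des)                 # first run uses the desired trajectory
    rms_history = []
    for _ in range(iterations):
        x = run_task(weights, gain, disturbance)
        errors = [d - a for d, a in zip(x_des, x)]
        rms_history.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
        weights = [w + q * e for w, e in zip(weights, errors)]
    return weights, rms_history


x_des = [30.0 + 20.0 * math.sin(0.05 * k) for k in range(200)]
disturbance = [0.5 * math.sin(0.15 * k) for k in range(200)]
x_ref_learned, rms_history = learn_reference(x_des, gain=0.9, disturbance=disturbance)
print(rms_history)
```

In this toy case the error at every point contracts by the factor $|1 - q \cdot \text{gain}|$ per run, so the repetitive disturbance is progressively absorbed into the reference.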

RBF neural networks
An RBFNN has an input layer, a hidden layer and an output layer [43]. In the input layer, the input signals $z = [z_1, z_2, \dots, z_n]$ are passed directly to the next layer. The hidden layer consists of an array of computing units, which are referred to as hidden nodes. Each neuron in the hidden layer is activated by a radial basis function. The output of the hidden layer is computed as follows:

$$s_j(z) = \exp\left(-\frac{\|z - c_j\|^2}{2 b_j^2}\right), \quad j = 1, \dots, m$$

where $m$ is the number of hidden nodes, $c_j = [c_{j1}, \dots, c_{jn}]$ is the center vector, $b_j$ denotes the standard deviation of the $j$-th radial basis function, and $s_j$ is the Gaussian function. In the output layer, the output signal is a linearly weighted combination as follows:

$$y(z) = \sum_{j=1}^{m} w_j s_j(z)$$

where $w_j$ is the weight for the $j$-th node. In [44], it is shown that for any continuous function $f(z): \Omega_z \to \mathbb{R}$, where $\Omega_z \subset \mathbb{R}^q$ is a compact set, when the node number $m$ is sufficiently large, there exists an ideal constant weight $W$ such that for each $\epsilon^* > 0$:

$$f(z) = W^T S(z) + \epsilon(z)$$

where $\epsilon(z)$ with $|\epsilon(z)| < \epsilon^*$ is the approximation error.
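The forward pass of such a network is short enough to write out. The sketch below assumes the common Gaussian form $\exp(-\|z - c_j\|^2 / (2 b_j^2))$ with $b_j$ as the standard deviation; the function names and the toy parameters are illustrative only.

```python
import math


def rbf_hidden(z, centers, widths):
    """Gaussian activations s_j(z) for every hidden node."""
    activations = []
    for c, b in zip(centers, widths):
        dist_sq = sum((zi - ci) ** 2 for zi, ci in zip(z, c))
        activations.append(math.exp(-dist_sq / (2.0 * b * b)))
    return activations


def rbf_output(z, centers, widths, weights):
    """Linear combination of the hidden activations."""
    return sum(w * s for w, s in zip(weights, rbf_hidden(z, centers, widths)))


centers = [[0.0, 0.0], [1.0, 1.0]]   # two hidden nodes in a 2-D input space
widths = [1.0, 1.0]
weights = [2.0, 3.0]
y = rbf_output([0.0, 0.0], centers, widths, weights)
print(y)
```

A node responds with 1 when the input sits on its center and decays with distance, which is why a set of small networks can each specialize in a local region of the trajectory.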

Design of learning law
In this subsection, we explain how to develop an online learning algorithm to update the weights of the RBFNNs. Let us consider a desired controller with the knowledge of the system dynamics:

$$u^* = M\ddot{x}_e + C\dot{x}_e + G + D - K\varepsilon \quad (16)$$

where $\ddot{x}_e = \ddot{x}_d - \alpha\dot{e}$ and $\dot{x}_e = \dot{x}_d - \alpha e$. Note that the arguments of $M$, $C$, $G$, and $D$ are omitted where no confusion is caused. By defining the sliding error

$$\varepsilon = \dot{e} + \alpha e \quad (17)$$

and substituting Eq. (16) into Eq. (3), we obtain

$$M\dot{\varepsilon} + C\varepsilon + K\varepsilon = 0 \quad (18)$$

It is easy to see from the above equation that $\varepsilon \to 0$ and thus $e \to 0$ when $t \to \infty$, indicating that trajectory tracking is achieved. Therefore, we design the controller (4) the same as in Eq. (16), i.e., $u = u^*$, which leads to

$$x^*_{r,on} = x_d + \frac{1}{\alpha} K^{-1}\left(M\ddot{x}_e + C\dot{x}_e + G + D\right) \quad (19)$$

The above equation indicates that the ideal reference trajectory $x^*_{r,on}$ can achieve trajectory tracking without error under the PD controller. However, we note that the dynamics parameters $M$, $C$, $G$ and the disturbance $D$ are unknown. This motivates us to use a single-hidden-layer NN to approximate $x^*_{r,on}$, i.e.,

$$x^*_{r,on} = W^T S(Z) + \epsilon \quad (20)$$

where $W$ denotes the unknown ideal weight, $S$ denotes an activation function, $Z$ is the NN input and $\epsilon$ is the approximation error. Therefore, the reference trajectory can be written as

$$x_{r,on} = \hat{W}^T S(Z) \quad (21)$$

where $\hat{W}$ is the actual weight that needs to be updated. Based on Lyapunov theory, which is elaborated in the following subsection, we design an update law for $\hat{W}$ as in Eq. (22), where $Q$ is a positive-definite matrix and $\Delta(\cdot) = (\cdot)(t) - (\cdot)(t - T)$, where $T$ is the time duration of a task and $(\cdot) = 0$ when its argument is smaller than 0.

Fig. 4 Control framework of the proposed online neural networks learning
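The step from the sliding-error dynamics to $e \to 0$ can be spelled out with a short Lyapunov argument. The sketch assumes that $M$ is symmetric positive definite, that $K$ is positive definite, and that $\dot{M} - 2C$ is skew-symmetric, which are the usual properties of such second-order mechanical models; the function $V$ is introduced here only for illustration.

```latex
% Candidate function and its derivative along M\dot{\varepsilon} + C\varepsilon + K\varepsilon = 0
V = \tfrac{1}{2}\, \varepsilon^{T} M \varepsilon

\dot{V} = \varepsilon^{T} M \dot{\varepsilon} + \tfrac{1}{2}\, \varepsilon^{T} \dot{M} \varepsilon
        = -\varepsilon^{T} K \varepsilon + \tfrac{1}{2}\, \varepsilon^{T} (\dot{M} - 2C)\, \varepsilon
        = -\varepsilon^{T} K \varepsilon \le 0

% The tracking error is the output of a stable first-order filter driven by \varepsilon
\dot{e} = -\alpha e + \varepsilon
\;\Rightarrow\;
e(t) = e^{-\alpha t} e(0) + \int_{0}^{t} e^{-\alpha (t - \tau)}\, \varepsilon(\tau)\, d\tau
```

Since $\alpha > 0$, the filter is stable, so $\varepsilon \to 0$ implies $e \to 0$.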

Online learning convergence
In this subsection, we show that the proposed learning algorithm guarantees convergence. Substituting Eq. (21) into Eq. (7) and defining $\tilde{W} = \hat{W} - W$, we obtain the error dynamics in Eq. (23). Let us choose a Lyapunov function candidate $J$ as in Eq. (24), where $\mathrm{vec}(\cdot)$ is the vectorization operation. Considering the first term in $J$, we have Eq. (25). Considering the skew-symmetric property, i.e., $\varepsilon^T(\dot{M} - 2C)\varepsilon = 0$ (26), and Eq. (23), we have Eq. (27). By taking the integral of Eq. (27) from $t - T$ to $t$, we have Eq. (28). Now, we consider the second term in $J$ and have Eqs. (29) and (30). Combining Eqs. (28) and (30), we have Ineq. (31). By Ineq. (31), we have Ineq. (32), where $n$ is the number of iterations. By setting $t = nT$, we have Ineq. (33), which leads to

$$\int_0^{nT} \varepsilon^T K(\varepsilon - \alpha\epsilon)\, ds \le J(0) - J(nT) \le J(0) \quad (34)$$

By the definition of $J$ in Eq. (24), we know that $J(0)$ is bounded, so the left-hand side of the above inequality is also bounded. When $n \to \infty$, we have $\varepsilon^T K(\varepsilon - \alpha\epsilon) \to 0$. As $\epsilon$ can be made arbitrarily small with a large number of RBFNN nodes, $\varepsilon$ becomes arbitrarily small, which indicates almost perfect trajectory tracking.

Parameter initialization of RBFNN
Parameters $w_j$, $c_j$, and $b_j$ in Eq. (15) should lie in an effective mapping range; otherwise, the RBFNN will not work properly. However, it is laborious and impractical to choose the best parameters manually. To solve this problem, the gradient descent method is used to initialize the parameters.
Since the reference trajectory should be close to the desired trajectory, we use the RBFNN to fit the desired trajectory so as to initialize the parameters for online learning. In particular, the desired trajectory is approximated by the RBFNN as below: The predicted output is presented as: The error function is defined as Then, using the gradient descent method, we have $\Delta b_j(t) = -\gamma \ldots$
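A minimal sketch of such an initialization is given below for a one-dimensional input. It assumes the squared error $E = \frac{1}{2}(y_d - \hat{y})^2$, sample-by-sample gradient descent, separate step sizes for the weights and for the centers and widths, and a lower bound on the widths; all of these choices, the normalized sine target and the function names are assumptions for illustration rather than the authors' settings.

```python
import math


def rbf(t, w, c, b):
    """Return the network output and the Gaussian activations at input t."""
    s = [math.exp(-(t - cj) ** 2 / (2.0 * bj * bj)) for cj, bj in zip(c, b)]
    return sum(wj * sj for wj, sj in zip(w, s)), s


def mse(ts, ys, w, c, b):
    return sum((y - rbf(t, w, c, b)[0]) ** 2 for t, y in zip(ts, ys)) / len(ts)


def init_by_gradient_descent(ts, ys, m=10, epochs=200, lr_w=0.05, lr_cb=5e-4):
    span = ts[-1] - ts[0]
    c = [ts[0] + span * j / (m - 1) for j in range(m)]   # evenly spaced centers
    b = [span / m] * m                                    # initial widths
    w = [0.0] * m
    for _ in range(epochs):
        for t, y in zip(ts, ys):
            y_hat, s = rbf(t, w, c, b)
            e = y - y_hat
            for j in range(m):
                d = t - c[j]
                grad_w = -e * s[j]
                grad_c = -e * w[j] * s[j] * d / b[j] ** 2
                grad_b = -e * w[j] * s[j] * d * d / b[j] ** 3
                w[j] -= lr_w * grad_w
                c[j] -= lr_cb * grad_c
                b[j] = max(b[j] - lr_cb * grad_b, 1e-2)   # keep widths positive
    return w, c, b


ts = [k / 49.0 for k in range(50)]
ys = [math.sin(2.0 * math.pi * t) for t in ts]            # normalized toy trajectory
mse_before = mse(ts, ys, [0.0] * 10, [0.0] * 10, [1.0] * 10)
w, c, b = init_by_gradient_descent(ts, ys)
mse_after = mse(ts, ys, w, c, b)
print(mse_before, mse_after)
```

Starting the online stage from parameters that already reproduce the desired trajectory means the first reference is sensible before any tracking error has been observed.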

Experimental platform
To validate the effectiveness of the proposed hybrid offline and online learning trajectory tracking control method, a piezoelectric drive platform is used in experiments, as shown in Fig. 5. The piezoelectric drive platform is mainly composed of four parts: a piezoelectric controller, a PEA, a real-time simulation controller, and a computer. The PEA used in the platform is a cylindrical low-voltage PEA PSt150/10/60VS15 of the Harbin Core Tomorrow Science and Technology Co., Ltd., and its physical parameters are given in Table 1. The piezoelectric controller is the E53.B servo controller of the Harbin Core Tomorrow Science and Technology Co., Ltd., equipped with an SGS displacement sensor with a sensitivity of 6 µm/V and a measurement accuracy of 0.05 µm. The real-time simulation controller is the DS1103 PPC Controller Board of dSPACE GmbH. In order to evaluate the performance of the trajectory tracking control method proposed in this paper, the tracking error $e$ is defined, and the control objective is to reduce $e$. For the tracking error $e$, the average relative error, maximum absolute error and root mean square error are defined as follows: In the experiment, the training and testing trajectories are generated by Eq. (9), and part of the trajectory is shown in Fig. 6. The proposed control framework and the corresponding experimental implementation are shown in Fig. 7.
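Two of these metrics have unambiguous standard definitions and can be computed as in the sketch below. The average relative error is left out because its normalization is not recoverable from the text, and the function names are illustrative.

```python
import math


def max_absolute_error(errors):
    """Largest magnitude of the tracking error over the trajectory."""
    return max(abs(e) for e in errors)


def root_mean_square_error(errors):
    """Square root of the mean squared tracking error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Toy error samples in micrometres (not experimental data).
errors = [0.3, -0.4]
e_max = max_absolute_error(errors)
e_rms = root_mean_square_error(errors)
print(e_max, e_rms)
```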

Offline learning experiment
According to [40], since the nonlinear autoregressive network with exogenous inputs (NARX) has good fitting ability for time series and allows efficient calculation, we use NARX as the DNN trajectory generator. To test the offline learning control framework in Fig. 3, the experiment is performed as follows. i) The desired trajectory is input into the closed-loop PEA, and the corresponding actual trajectory is recorded. ii) According to the analysis of the feature selection and the experimental test, the actual state $(x, \dot{x}, \ddot{x})$ at the current time step and the next 8 time steps is selected as the input of the neural network, and the desired trajectory $x_d$ at the current time step is used as the output to train the DNN to approximate $G^{-1}(s) = X_r(s)/X(s)$. iii) The trained DNN module is connected in series in front of the reference trajectory storage of the closed-loop PEA, and the desired trajectory is input to the DNN module to generate a reference trajectory.
The structure of NARX is shown in Fig. 8, and the training parameters of the neural networks are shown in Table 2.
Under the offline learning framework as shown in Fig. 3, the desired trajectory, reference trajectory generated by DNN and actual trajectory of the closed-loop PEA are shown in Fig. 9. The DNN trajectory generator modifies the desired trajectory to obtain the reference trajectory, so that the actual trajectory of the system can track the desired trajectory.
Based on the trained DNN, the following four trajectory tracking control methods are compared: PID, linear active disturbance rejection control (LADRC) [45], feedforward compensation control based on a DNN inverse model (Forward-DNN) [27], and the DNN trajectory generation control method based on offline learning (Offline-DNN). The parameters of the PID controller are $K_p = 0.3$, $K_i = 1$, $K_d = 0.05$, and the parameters of LADRC are $b_0 = 0.3$, $\omega_0 = 25$, $\omega_c = 70$. The feedforward inverse compensation control uses a DNN to establish the inverse model of the open-loop PEA as the feedforward compensator, combined with a PID feedback controller. Here, we use the NURBS trajectory generated by Eq. (9) as the testing desired trajectory. The tracking errors between the desired trajectory and the actual trajectory under the four control methods are shown in Fig. 10 and Table 3. As shown in Table 3, after adding the offline-trained DNN trajectory generator to the closed-loop PEA, the maximum tracking error is reduced from 1.4668 µm to 0.3718 µm, which is a reduction of 74.6%. The root mean square error is reduced from 0.4867 µm to 0.1255 µm, a reduction of 74.2%. It can also be seen from Table 3 that the Offline-DNN has a lower trajectory tracking error than PID, LADRC and Forward-DNN.

Online learning experiment
To test the effectiveness of the online learning method based on RBFNN as shown in Fig. 4, we use the same desired trajectory as in the offline learning experiment, and the experiment is performed as follows. The experimental results of the RBFNN-based online learning are shown in Fig. 11. As shown in Fig. 11b, the root mean square error decreases as the iteration number increases, and its value is 0.0067 µm in the 10th iteration. After the tracking error reaches the target threshold, the weights of the RBFNNs no longer need to be updated online, and the subsequent reference trajectory $x_{r,on}$ will not change.

Hybrid learning experiment
In the hybrid offline/online learning control method as shown in Fig. 2, we use the reference trajectory $x_{r,off}$ generated by the offline-trained DNN as the initial reference trajectory in online learning control, which is referred to as hybrid learning. We compare the tracking error convergence of the following three trajectory tracking control methods: iterative learning control (ILC) [46], which directly alters the control signal to the plant, online learning and hybrid learning. The ILC uses a P-type learning law. The results are shown in Fig. 12 and Table 4. As shown in Fig. 12 and Table 4, in the hybrid learning control method, the offline learning speeds up the learning process, giving a smaller tracking error in the first iteration. After several iterations, the tracking errors in both the online learning and hybrid learning conditions converge, with a slight difference. Moreover, the trajectory tracking accuracy of offline learning is further improved by online learning. After 10 iterations, the maximum tracking error is reduced from 0.3718 µm to 0.0171 µm, which is a reduction of 95.4%, and the root mean square error is reduced from 0.1255 µm to 0.0050 µm, a reduction of 96.0%. It can also be seen from the results that the hybrid learning has faster convergence and lower tracking error than ILC.
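The reported reductions can be checked with a few lines of arithmetic. The sketch below only reuses the error values quoted in the text (in µm); the helper name and the dictionary labels are illustrative.

```python
def percent_reduction(before, after):
    """Relative reduction of an error measure, in percent."""
    return 100.0 * (before - after) / before


# Error values quoted in the text, in micrometres.
reported = {
    "offline, max error": (1.4668, 0.3718),
    "offline, RMS error": (0.4867, 0.1255),
    "hybrid, max error": (0.3718, 0.0171),
    "hybrid, RMS error": (0.1255, 0.0050),
}

reductions = {name: percent_reduction(b, a) for name, (b, a) in reported.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}%")
```

The computed values agree with the quoted percentages to within rounding of the last digit.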

Conclusion
In this paper, we proposed two types of reference trajectory modification methods based on neural networks for a mechatronic system, which were used to learn inverse dynamics and thus to achieve precise trajectory tracking. They were combined to formulate a hybrid offline and online learning paradigm, which has the following features: (i) it is applicable to various mechatronic systems (a piezoelectric drive platform in this paper); (ii) it works with an embedded feedback controller to which access is not required; (iii) the offline-trained DNN model has generalization capability, which means it can be applied to different tasks with new trajectories; (iv) the use of small RBFNNs makes online learning efficient and robust to system uncertainties; and finally, (v) the offline learning can benefit online learning by providing a "good" initial reference trajectory. Note that offline learning and online learning can be used separately for different cases, e.g., online learning can be used when a task is repetitive and offline learning can be used when a system is not affected significantly by unknown disturbances. Our future work includes testing the proposed approach on a robot manipulator with high nonlinearity.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.