1 Introduction

Trajectory tracking is a fundamental problem in the control of mechatronic systems, e.g., robot manipulators and piezoelectric actuators (PEAs). In most cases, uncertainties and external disturbances, such as friction, sensor noise, and payload variations, affect the operation of these systems with nonlinear dynamics. A range of control methods has been proposed to address these issues, such as adaptive control [1], sliding mode control [2], learning control [3, 4], and neural network control [5, 6]. Moreover, controlling mechatronic systems with imperfections is a universal problem. In [7], a control strategy to ensure optimal working conditions was proposed, which focused on the effects of using chaotic vibrational signals to excite the hidden dynamics of the imperfect system. In [8], the authors studied a paradigmatic example of an imperfect electromechanical structure and developed a control method to ensure coil rotation based on the excitation of the hidden dynamics induced by imperfections, characterizing its influence on the characteristics of the control signal and the power provided to the structure. Imperfections also play an important role in the realization of robust chaos generators based on simple circuits. In [9], a strategy for estimating hidden dynamics parameters was designed and synchronization of imperfect chaotic circuits was achieved. Compared to the active research on advanced control approaches in academia, classical linear controllers such as PID controllers still play a crucial role in industry because of their implementation simplicity. However, it is well known that PID controllers perform poorly on complex trajectories and on systems with nonlinear dynamics [10]. Therefore, it is attractive to develop a control approach that is built on top of off-the-shelf linear controllers but improves the tracking performance. Such a control approach has two significantly favorable features. First, most controllers provided by manufacturers do not allow users to modify the low-level position controller but provide access to tunable parameters and a reference trajectory. In a position control task, the desired trajectory is the trajectory predefined for the mechatronic system to actually track, while the reference trajectory, obtained by passing the desired trajectory through a trajectory generator module, is used as the input signal of the closed-loop mechatronic system. Second, without modifying the available control architecture, system stability can in general be ensured. In this regard, many state-of-the-art controllers that design the control input, such as [11,12,13,14,15,16,17,18,19], are not applicable.

To cope with disturbances and imperfections of mechatronic systems, considerable research effort has been devoted to modifying the reference trajectory to improve the tracking performance on top of an available feedback control system. A large group of these works is iterative learning control (ILC), which improves the trajectory tracking performance through repetition of the same task and the use of knowledge from previous iterations [20,21,22]. Although learning convergence can be rigorously proved, the information about the system learned by ILC cannot be transferred to another task, similar to adaptive control [23].

As mechatronic systems generally have complicated dynamics that are influenced by uncertainties [24], there is ample motivation to investigate the effectiveness of machine learning in the control of mechatronic systems [25]. Some researchers were attracted by the excellent function approximation capabilities of deep neural networks (DNNs) and thus revisited the idea of constructing an NN model for mechatronic systems [26], especially inverse compensation control based on an NN model [27,28,29]. In [30], a polynomial fitting model based on an NN was proposed to describe the inverse dynamics of hysteresis in a PEA. As a feedforward compensation module, the model is combined with a single-neuron adaptive proportional–integral–derivative controller to reduce the trajectory tracking error caused by hysteresis in a piezoelectric drive mechatronic system. Different from the traditional NN-inverse-model control framework, which approximates the open-loop dynamics and modifies the control signal of the plant, the offline learning control framework proposed in this paper uses a DNN to approximate the inverse dynamics of the closed-loop mechatronic system and uses the trained DNN as a trajectory generator to modify the reference trajectory, so that the tracking error can be compensated for in advance without changing the structure and stability of the baseline controller. Although the offline learning method can approximate the inverse dynamics of the closed-loop mechatronic system, the DNN model is still subject to modeling error, and the tracking accuracy needs to be further improved through online learning.

The online learning control framework based on iterative learning in this paper is suitable for repetitive tasks and can suppress unknown uncertainties. Compared to DNNs, single-hidden-layer radial basis function neural networks (RBFNNs) have the advantages of a simple structure and high computational efficiency. An RBFNN is simple to implement in real time, and its learning convergence and the resultant closed-loop system stability can be rigorously analyzed [31, 32]. Control schemes based on RBFNNs in closed-loop control systems mainly include supervisory control, model reference adaptive control, self-tuning control, etc. In [33], for a class of nonlinear systems with unknown parameters and bounded disturbances, an RBFNN combined with single-parameter direct adaptive control was designed to overcome the problems caused by unknown dynamics and external disturbances. Traditional RBFNN-based control methods modify the control signal of the controlled plant, and the parameters of the RBFNN need to be updated continuously [34, 35]. Different from these works, the online learning control framework based on an RBFNN proposed in this paper uses an iterative method to update the parameters of the RBFNN and modify the reference trajectory until the tracking error is reduced below a target threshold. The advantages of the proposed method are as follows: (1) For a repetitive trajectory, repetitive disturbances and errors in the system can be suppressed. (2) It does not change the structure of the baseline controller and does not affect the stability of the closed-loop system. Thus, it can be easily applied to commercial control systems.

Based on the above discussion, this paper investigates reference trajectory modification for mechatronic systems by integrating a DNN for offline learning and a single-layer RBFNN for online learning. First, the DNN is trained offline to approximate the inverse dynamics model of the mechatronic system, and the trained DNN is used to obtain the modified reference trajectory, which serves as the input of the closed-loop mechatronic system or is further modified by online learning with the RBFNN. Then, we propose the RBFNN-based online learning control framework and, based on a Lyapunov function, design the learning law of the RBFNN and prove the stability of the system. The offline NN learning method learns the inverse dynamics of the closed-loop system and speeds up the online learning, compensating for the tracking error in advance. The online NN learning method can deal with uncertainties and disturbances and thus achieves precise trajectory tracking control.

The main contribution of this paper is the hybrid offline/online learning control framework, which combines the complementary advantages of a DNN and a single-layer RBFNN. On the one hand, we propose the offline learning control framework with the DNN as a reference trajectory generator, which is transferable and can be used to conduct a new tracking task. Offline learning provides an initial reference trajectory for online learning and speeds up the convergence of the RBFNN parameters. On the other hand, we propose the online learning control framework with the RBFNN to iteratively modify the reference trajectory generated by the DNN as the input signal of the closed-loop mechatronic system, and prove its convergence.

The remainder of this paper is structured as follows: Section 2 presents the system dynamics, formulates the control problem mathematically, and introduces the proposed tracking control method based on hybrid offline/online NNs. Sections 3 and 4 elaborate the processes of offline and online learning, respectively. Section 5 presents the experimental results. Section 6 concludes this work.

2 System description and control strategy

2.1 System description

According to [36,37,38], the schematic model of a piezoelectric actuator illustrates a reversible transformation from electrical to mechanical energy, as shown in Fig. 1, where H, C, \(T_{em}\) and x denote the hysteresis effect, the capacitance, the electromechanical transducer and the output displacement of the piezoelectric actuator, respectively.

Fig. 1 Schematic model of piezoelectric actuator

The dynamic equation of piezoelectric actuator can be expressed as:

$$\begin{aligned} m_z \ddot{x} + b_z {\dot{x}} + k_zx = T_{em}(u_{in} - u_h) \end{aligned}$$
(1)

where \(u_{in}\) denotes the input voltage, \(u_h\) the voltage due to the hysteresis, and \(m_z\), \(b_z\), and \(k_z\) are the mass, damping, and stiffness of the ceramic, respectively.

In practice, external disturbances act on the piezoelectric drive system in addition to the nonlinear hysteresis. In order to account for these effects, the piezoelectric drive system can be described as

$$\begin{aligned} m \ddot{x} + b {\dot{x}} + kx + v_n + v_d = u_{in} \end{aligned}$$
(2)

where \(v_n\) and \(v_d\) represent the nonlinear effects and the external disturbances, respectively, \(m = m_z/T_{em}\), \(b = b_z/T_{em}\), and \(k = k_z/T_{em}\).

From Eq. (2), the dynamic model of a mechatronic system (piezoelectric actuator) can be generalized to a second-order system,

$$\begin{aligned} M(x) \ddot{x} + C(x,\dot{x}) {\dot{x}} + G(x) + D(x,\dot{x}) = u \end{aligned}$$
(3)

where u denotes the control input and x denotes the position. The control input u is designed as a linear state feedback controller, which is commonly adopted in motion controllers provided by manufacturers, i.e.,

$$\begin{aligned} u=-K[(\dot{x}-\dot{x}_d)+\alpha (x-x_r)] \end{aligned}$$
(4)

where \(\alpha >0\), and K, \(x_d\), and \(x_r\) denote control gain, desired trajectory, and reference trajectory, respectively. When \(x_r=x_d\), u is a conventional PD controller that can be rewritten as

$$\begin{aligned} u=-K(\dot{e}+\alpha e),~e=x-x_d \end{aligned}$$
(5)

where e denotes the trajectory tracking error. Suppose that \(x_d\) is constant, and ignore the disturbance vector \(D(x,\dot{x})\) and the gravity vector G(x); then, combining Eqs. (5) and (3) yields

$$\begin{aligned} M(x) \ddot{x} + [C(x,\dot{x})+K] {\dot{x}} + K\alpha e = 0 \end{aligned}$$
(6)

By inspecting the above equation, it is straightforward to confirm the stability of the system, with \(x\rightarrow x_d\) as \(t\rightarrow \infty\). Nevertheless, the PD controller cannot achieve \(x\rightarrow x_d\) if the disturbance \(D(x,\dot{x})\) and gravity G(x) have significant effects on the dynamics of the system or if \(x_d\) is a time-varying trajectory. Therefore, we will design \(x_r\) to account for the uncertainties of the dynamics. In particular, with Eq. (4), the closed-loop dynamics can be written as

$$\begin{aligned} M(x) \ddot{x} + [C(x,\dot{x})+K] {\dot{x}} + K\alpha x + G(x) + D(x,\dot{x})= K\dot{x}_d+K\alpha x_r \end{aligned}$$
(7)

The design of reference trajectory \(x_r\) includes two learning processes: offline learning (generating \({x_{r\_off}}\)) and online learning (generating \({x_{r\_on}}\)), for which the details are discussed in the following.
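
To make the structure of Eqs. (3), (4), and (7) concrete, the following minimal sketch simulates a scalar second-order plant under the linear state feedback controller (4) with \(x_r=x_d\). All numerical values (inertia, damping, gains, disturbance, desired trajectory) are illustrative assumptions rather than identified PEA parameters.

```python
import numpy as np

# Illustrative (assumed) scalar plant: M*xdd + C*xd + G + D = u, cf. Eq. (3)
M, C, G = 1.0, 0.5, 0.0
K, alpha = 20.0, 10.0                  # assumed controller gains of Eq. (4)
dt, T_end = 1e-3, 2.0

def disturbance(x, t):
    # assumed smooth disturbance D(x, xdot); stands in for hysteresis, friction, etc.
    return 0.3 * np.sin(2 * np.pi * t) + 0.1 * x

t_vec = np.arange(0.0, T_end, dt)
x_des = 15.0 * (1.0 - np.cos(2.0 * np.pi * t_vec))       # assumed desired trajectory x_d
xd_des = np.gradient(x_des, dt)

x, x_dot, err_log = 0.0, 0.0, []
for k, t in enumerate(t_vec):
    x_r = x_des[k]                                        # plain PD case: x_r = x_d
    u = -K * ((x_dot - xd_des[k]) + alpha * (x - x_r))    # Eq. (4)
    x_ddot = (u - C * x_dot - G - disturbance(x, t)) / M  # solve Eq. (3) for the acceleration
    x_dot += x_ddot * dt                                  # forward-Euler integration
    x += x_dot * dt
    err_log.append(x - x_des[k])                          # tracking error e

print("max |e| =", np.max(np.abs(err_log)))
```

Replacing `x_r = x_des[k]` with a learned reference trajectory is exactly the degree of freedom exploited in the remainder of the paper.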

2.2 Control strategy

The hybrid offline and online learning trajectory tracking control framework proposed in this paper is shown in Fig. 2.

Fig. 2 Control framework of the proposed hybrid offline and online neural networks learning

Offline learning refers to learning the inverse dynamics model with a DNN, which modifies the desired trajectory into a new reference trajectory \({x_{r\_off}}\). The online learning part takes the output of the offline learning as the initial reference trajectory. Then, it uses single-layer RBFNNs to obtain the reference trajectory \({x_{r\_on}}\), locally compensating for the unmodeled dynamics and uncertainties to further improve the tracking performance.

In this control framework, the control system can be divided into three parts. The first part is the closed-loop mechatronic system, which contains an inaccessible controller. Its input is the reference trajectory \({x_{r}}\) modified by the DNN and RBFNNs, and its output is the actual trajectory x. The controller in the closed-loop mechatronic system ensures that the system achieves a certain trajectory tracking precision and has good feedback properties, namely robustness, repeatability, and disturbance rejection. The second part is the DNN trajectory generation module based on offline learning. Its input is the desired trajectory \({x_d}\), and its output is the modified reference trajectory \({x_{r\_off}}\). The DNN, with its powerful approximation capability, is used to learn the inverse dynamics of the closed-loop mechatronic system, so that the mapping from the desired trajectory to the actual trajectory of the whole system approaches the identity mapping. The trained DNN is used as an additional trajectory generation module to modify the reference trajectory and reduce the tracking error of the system. The third part is the RBFNN trajectory modification module based on online learning. Its input is the reference trajectory \({x_{r\_off}}\), and its output is the reference trajectory \({x_{r\_on}}\). The RBFNN trajectory modification module is designed for repetitive trajectories, iteratively modifying the reference trajectory \({x_{r\_on}}\) to approximate an ideal reference trajectory \(x_{r\_on}^*\). Thus, repetitive disturbances are compensated for in advance. These two learning processes are elaborated in the following two sections, respectively.

3 Offline learning

3.1 Control strategy

The offline NN learning part in the hybrid offline/online control framework is shown in Fig. 3.

Fig. 3 Control framework of the proposed offline neural networks learning

First, the DNN trajectory generation module needs to be trained offline using training data. The offline learning control framework is divided into a training phase and a testing phase; the information of the actual trajectory x is used as the training input of the DNN, and the desired trajectory \({x_d}\) is the training output.

The transfer function of a closed-loop mechatronic system can be defined as

$$\begin{aligned} G(s)=\frac{X(s)}{X_r(s)} \end{aligned}$$
(8)

where \(X_r(s)\) and X(s) denote \(x_r\) and x in the \(s\)-domain, respectively. If the reference trajectory generator implements the inverse transfer function \(G^{-1}(s)=\frac{X_r(s)}{X(s)}\), so that \(X_r(s)=G^{-1}(s)X_d(s)\), then \(X(s)=G(s)G^{-1}(s)X_d(s)=X_d(s)\), i.e., perfect trajectory tracking. By selecting reasonable features as input, we can train a DNN model to approximate \(G^{-1}(s)\). This DNN can then be used to generate a reference trajectory \({x_{r\_off}}\) from a new desired trajectory \(x_d\).

3.2 Feature selection

The DNN requires more states for training to better approximate the characteristics of the system. Nevertheless, increasing the number of states enlarges the input dimension, which requires an excessive amount of training data as a result of the curse of dimensionality [39]. Therefore, the relevant features should be chosen judiciously to minimize the input dimension.

In the framework of offline learning, training the DNN amounts to learning the inverse dynamics of the closed-loop mechatronic system and establishing the mapping from the desired trajectory to the reference trajectory, i.e., approximating \(G^{-1}(s)=\frac{X_r(s)}{X(s)}\), so as to achieve zero tracking error, i.e., \({x_d} = x\). By analyzing the dynamics of the closed-loop system in Eq. (7), it can be noticed that \({x_r}\) is related to \((x,{\dot{x}},\ddot{x})\) and \(\dot{x}_d\) (i.e., \(\dot{x}\) under perfect tracking). Hence, we select the triple \((x,{\dot{x}},\ddot{x})\) as the DNN training input. Due to the delay of the mechatronic system, the current \({x_r}\) will affect the future \((x,{\dot{x}},\ddot{x})\), so the future information \((x,{\dot{x}},\ddot{x})\) can also be added to the training input to improve the performance [40].
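
As a concrete illustration of this feature selection, the sketch below assembles the DNN training input from a recorded actual trajectory x by numerical differentiation and appends the states at a few future samples. The finite-difference derivatives and the horizon length are assumptions of the sketch (the experiments in Section 5.2 use the current sample plus the next 8 samples).

```python
import numpy as np

def build_features(x, dt, horizon=8):
    """Stack (x, xdot, xddot) at the current sample and the next `horizon` samples."""
    x = np.asarray(x, dtype=float)
    xd = np.gradient(x, dt)                  # numerical first derivative
    xdd = np.gradient(xd, dt)                # numerical second derivative
    states = np.stack([x, xd, xdd], axis=1)                  # shape (N_total, 3)
    N = len(x) - horizon
    # feature row k: states at samples k, k+1, ..., k+horizon -> 3*(horizon+1) columns
    return np.hstack([states[k0:k0 + N] for k0 in range(horizon + 1)])

# usage sketch: inputs = build_features(x_actual, dt=1e-3)
#               targets = x_desired[: len(inputs)]           # x_d at the current sample
```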

3.3 Training and testing data

From the structure of Eq. (7), we know that the dynamic model of the system contains mass/inertial, Coriolis and centrifugal terms, gravity, and disturbances, and may also include significant coupling among different axes. Hence, one of the key factors determining the effectiveness of DNN training is whether the training data can fully represent the properties of the mechatronic system. In this sense, random nonuniform rational B-spline (NURBS) curves can be used.

We generate random NURBS trajectories using random control points [41], which are composed of an independent variable vector and a dependent variable vector. The independent variable vector \(\mathbf {t}\) is given by

$$\begin{aligned} \mathbf {t} = [{t_0},{t_0} + \Delta {t_1}, \cdots ,{t_0} + \sum \limits _{i = 1}^n {\Delta {t_i}}] \end{aligned}$$
(9)

We set \(\Delta {t_i} = 0.01 + rand(0.02, 0.04)\), where rand(0.02, 0.04) represents a random number between 0.02 and 0.04, and we set the dependent variable observation vector \(\mathbf {x}\) to follow a normal distribution with mean 25 and standard deviation 10.

From the independent variable vector \(\mathbf {t}\) and the dependent variable vector \(\mathbf {x}\), the random control point sequence \(y(\mathbf {x},\mathbf {t})\) is obtained to form the training or testing trajectories. In the experiments, according to the movement range of the closed-loop piezoelectric drive system, the trajectory is mapped to \(0\sim 60\) um, and zero-phase filtering is performed to remove peaks whose velocity exceeds the limit, finally yielding the training and testing trajectories.
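
The sketch below illustrates this data generation: random control points are drawn exactly as described above, and a smooth curve through them is sampled as a training or testing trajectory. The cubic spline is used here only as a stand-in for the NURBS construction of [41], and the zero-phase velocity filtering step is omitted; both are assumptions of the sketch.

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(0)

def random_control_points(n=50, t0=0.0):
    # Eq. (9): t_i = t_0 + sum of increments, with Delta t_i = 0.01 + rand(0.02, 0.04)
    dt = 0.01 + rng.uniform(0.02, 0.04, size=n)
    t = t0 + np.concatenate(([0.0], np.cumsum(dt)))
    x = rng.normal(loc=25.0, scale=10.0, size=n + 1)   # dependent variable ~ N(25, 10^2)
    return t, x

def random_trajectory(sample_dt=1e-3):
    t, x = random_control_points()
    curve = CubicSpline(t, x)               # stand-in for the NURBS curve through the points
    ts = np.arange(t[0], t[-1], sample_dt)
    traj = curve(ts)
    # map into the 0-60 um stroke of the closed-loop piezoelectric drive system
    traj = (traj - traj.min()) / (traj.max() - traj.min()) * 60.0
    return ts, traj
```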

3.4 Training of DNN

The above offline learning is a form of supervised learning, which builds a mapping between the input and output of the system given knowledge of the desired output for each input [42]. The NN is usually trained with the backpropagation (BP) algorithm, which minimizes the error between the desired and actual outputs of the NN.

Denote \(W_{ij}^{l}\) as the weight connecting the j-th neuron of the \((l-1)\)-th layer and the i-th neuron of the l-th layer, and \(b_i^{l}\) as the bias of the i-th neuron in the l-th layer. Then, the net input and output of the i-th neuron in the l-th layer can be described as

$$\begin{aligned} net_i^{l}=\sum _{j=1}^{s_{l-1}} W_{ij}^{l} h_j^{l-1}+b_i^{l}, ~h_i^{l}=f(net_i^{l}) \end{aligned}$$
(10)

where \(s_l\) denotes the neuron number in the l-th layer and f represents an activation function.

We define an error function as below

$$\begin{aligned} E=\frac{1}{m} \sum _{i=1}^{m} E(i), ~ E(i)=\frac{1}{2} \sum _{k=1}^{n} (y_k(i)-y^*_k(i))^2 \end{aligned}$$
(11)

where m and n denote the numbers of training data groups and outputs, respectively, and \(y_k\) and \(y^*_k\) denote the actual and desired outputs. Then, the gradient descent method is used to update the weights and biases as below

$$\begin{aligned} W_{ij}^{l}=W_{ij}^{l}-\beta \frac{\partial E}{\partial W_{ij}^{l}},~ b_i^{l}=b_i^{l}-\beta \frac{\partial E}{\partial b_i^{l}} \end{aligned}$$
(12)

where \(\beta >0\) is the learning rate.
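
For illustration, a plain numpy sketch of one batch gradient step following Eqs. (10)-(12) for a single-hidden-layer network is given below. The tanh activation, layer sizes, and initialization are assumptions; the networks actually used in Section 5 are NARX networks trained with standard toolbox routines.

```python
import numpy as np

rng = np.random.default_rng(1)

class OneHiddenLayerNet:
    """Single-hidden-layer network trained by batch gradient descent, Eqs. (10)-(12)."""

    def __init__(self, n_in, n_hidden, n_out, beta=1e-2):
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))
        self.b2 = np.zeros(n_out)
        self.beta = beta                               # learning rate beta of Eq. (12)

    def forward(self, X):
        self.net1 = X @ self.W1.T + self.b1            # Eq. (10): net = W h + b
        self.h1 = np.tanh(self.net1)                   # activation f (assumed tanh)
        return self.h1 @ self.W2.T + self.b2           # linear output layer

    def step(self, X, Y):
        m = len(X)
        err = self.forward(X) - Y                      # dE/dy per sample, from Eq. (11)
        dW2, db2 = err.T @ self.h1 / m, err.mean(axis=0)
        dh1 = (err @ self.W2) * (1.0 - self.h1 ** 2)   # backpropagate through tanh
        dW1, db1 = dh1.T @ X / m, dh1.mean(axis=0)
        for p, g in ((self.W2, dW2), (self.b2, db2), (self.W1, dW1), (self.b1, db1)):
            p -= self.beta * g                         # Eq. (12)
        return 0.5 * np.mean(np.sum(err ** 2, axis=1)) # error function E of Eq. (11)
```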

4 Online learning

4.1 Control strategy

While offline learning in the previous section establishes an inverse dynamics model that can generate a reference trajectory to improve the tracking performance, unmodeled dynamics or uncertainties may still exist, so the reference trajectory should be further modified online. For this purpose, in this section we derive an online learning algorithm using single-layer RBFNNs.

Fig. 4 Control framework of the proposed online neural networks learning

The online learning RBFNN control framework proposed in this paper is shown in Fig. 4. In theory, an RBFNN can approximate a continuous function with arbitrary accuracy. However, when only one RBFNN is used for the whole trajectory, it tends to learn the mean value of the noise and disturbances over the trajectory and cannot further reduce the error at each point. In this paper, for repetitive trajectories, a large number of small RBFNNs are used to fit the local dynamics model at each trajectory point. The control framework uses the idea of iteration to update the weights of the RBFNNs according to the error between the desired value \({x_d}(k)\) and the actual value x(k) obtained from the last run at each trajectory point, and the updated RBFNNs are used to generate the reference trajectory point for the next run, so that the generated reference trajectory \({x_{r\_on}}\) progressively approaches the ideal reference trajectory \(x_{r\_on}^*\).

4.2 RBF neural networks

An RBFNN has an input layer, a hidden layer, and an output layer [43]. In the input layer, the input signals \(z=[z_1,z_2,...,z_n]\) are passed directly to the next layer. The hidden layer consists of an array of computing units, referred to as hidden nodes. Each neuron in the hidden layer is activated by a radial basis function. The output of the hidden layer is computed as follows:

$$\begin{aligned} s_j(z)=exp\left(-\frac{(||z-c_j||)^2}{2 b_j^2}\right) ,~ j=1,...,m \end{aligned}$$
(13)

where m is the number of hidden nodes, \(c_j=[c_{j1},...,c_{jn}]\) is the center vector, \(b_j\) denotes the standard deviation of the j-th radial basis function, and \(s_j\) is the Gaussian function. In the output layer, the output signal is a linearly weighted combination as follows:

$$\begin{aligned} y(z)=\sum _{j=1}^{m} w_{j} s_j(z) \end{aligned}$$
(14)

where \(w_{j}\) is the weight for the j-th node.
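
Eqs. (13) and (14) translate directly into a few lines of code; the sketch below evaluates the RBFNN output for a batch of inputs. The center grid, common width, and zero initial weights are illustrative assumptions.

```python
import numpy as np

def rbf_output(z, centers, widths, weights):
    """Eqs. (13)-(14): Gaussian hidden layer followed by a linear output layer."""
    z = np.atleast_2d(z)                                                  # (batch, n)
    d2 = np.sum((z[:, None, :] - centers[None, :, :]) ** 2, axis=2)       # ||z - c_j||^2
    S = np.exp(-d2 / (2.0 * widths ** 2))                                 # s_j(z), (batch, m)
    return S @ weights, S                                                 # y(z) = sum_j w_j s_j(z)

# illustrative setup: m = 11 centers spread over the 0-60 um stroke, common width b_j = 6
centers = np.linspace(0.0, 60.0, 11).reshape(-1, 1)
widths = np.full(11, 6.0)
weights = np.zeros(11)
y, S = rbf_output(np.array([[25.0]]), centers, widths, weights)
```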

In [44], it is shown that for any continuous function \(f(z):\Omega _z \rightarrow R\), where \(\Omega _z \subset R^q\) is a compact set, and for any \(\epsilon ^*>0\), there exist a sufficiently large node number m and an ideal constant weight vector W such that:

$$\begin{aligned} f(z)=\sum _{j=1}^{m} w_j s_j (z)=W^T S(z)+\epsilon (z), \forall z \in \Omega _z \end{aligned}$$
(15)

where \(\vert \epsilon (z)\vert < {\epsilon ^*}\) is the approximation error.

4.3 Design of learning law

In this subsection, we explain how to develop an online learning algorithm to update the weights of the RBFNNs. Let us consider a desired controller with the knowledge of the system dynamics:

$$\begin{aligned} u^*= -K ({\dot{e}}+\alpha e)+M \ddot{x_e}+C \dot{x_e}+G+D \end{aligned}$$
(16)

where \(\ddot{x}_e=\ddot{x}_d-\alpha {\dot{e}}\) and \({\dot{x}}_e={\dot{x}}_d-\alpha e\). Note that the arguments of M, C, G, and D are omitted, where no confusion is caused. By defining the sliding error

$$\begin{aligned} \varepsilon ={\dot{e}}+\alpha e \end{aligned}$$
(17)

and substituting Eq. (16) into Eq. (3), we obtain the desired closed-loop dynamics

$$\begin{aligned} M {\dot{\varepsilon }}+(C+K) \varepsilon =0 \end{aligned}$$
(18)

It is easy to see from the above equation that \(\varepsilon \rightarrow 0\) and thus \(e\rightarrow 0\) when \(t\rightarrow \infty\), indicating that trajectory tracking is achieved. Therefore, we design the controller (4) to equal the desired controller in Eq. (16), i.e., \(u=u^*\), which leads to

$$\begin{aligned} x_{r\_on}^*=\frac{1}{\alpha K } (M \ddot{x_d}+C \dot{x_d}+G+D)+x_d \end{aligned}$$
(19)

The above equation indicates that the ideal reference trajectory \(x_{r\_on}^*\) can achieve error-free trajectory tracking under the PD controller. However, the dynamics parameters M, C, G and the disturbance D are unknown. This motivates us to use a single-hidden-layer NN to approximate \(x_{r\_on}^*\), i.e.,

$$\begin{aligned} x_{r\_on}^*=W^T S(Z)+\epsilon \end{aligned}$$
(20)

where W denotes the unknown ideal weight vector, S denotes the vector of activation functions, Z is the NN input, and \(\epsilon\) is the approximation error. Therefore, the reference trajectory can be written as

$$\begin{aligned} x_{r\_on}={\hat{W}}^T S(Z) \end{aligned}$$
(21)

where \({{\hat{W}}}\) is the actual weight vector that needs to be updated. Based on Lyapunov theory, which will be elaborated in the following subsection, we design an update law for \({{\hat{W}}}\) as below:

$$\begin{aligned} \triangle {\hat{W}}=- \alpha K Q \varepsilon ^T S(Z) \end{aligned}$$
(22)

where Q is a positive-definite matrix, \(\triangle (\cdot )=(\cdot )(t)-(\cdot )(t-T)\), T is the time duration of one task execution, and \((\cdot )(t)=0\) when its time argument is smaller than 0.
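
A minimal sketch of one run-to-run application of Eqs. (21)-(22) is given below for the scalar-output case, treating Q as a scalar gain; both simplifications, and the attachment of one small RBFNN to each trajectory point, are assumptions of the sketch. The default gains mirror the values used in the experiments of Section 5.3.

```python
import numpy as np

def online_iteration(W_prev, Z, e, e_dot, S_fn, alpha=110.0, K=0.04, Q=1e-4):
    """One run-to-run update of the RBFNN weights and reference, Eqs. (21)-(22).

    W_prev   : (N, m) weights of the small RBFNN attached to each of the N trajectory points
    Z        : (N, d) NN inputs along the trajectory (e.g. desired states)
    e, e_dot : tracking error and its derivative recorded during the previous run
    S_fn     : callable returning the hidden-layer vector S(Z_k) of length m
    """
    eps = e_dot + alpha * e                                   # sliding error, Eq. (17)
    W_new = np.empty_like(W_prev)
    x_r_on = np.empty(len(Z))
    for k in range(len(Z)):
        S = S_fn(Z[k])                                        # (m,)
        W_new[k] = W_prev[k] - alpha * K * Q * eps[k] * S     # Eq. (22)
        x_r_on[k] = W_new[k] @ S                              # Eq. (21): next reference point
    return W_new, x_r_on
```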

4.4 Online learning convergence

In this subsection, we show that the proposed learning algorithm guarantees convergence. Substituting Eq. (21) into Eq. (7) and defining \({\tilde{W}}={\hat{W}}-W\), we obtain

$$\begin{aligned} M {\dot{\varepsilon }}+(C+K) \varepsilon =K \alpha ({\tilde{W}}^T S-\epsilon ) \end{aligned}$$
(23)

Let us choose a Lyapunov function candidate

$$\begin{aligned} \begin{aligned} J =J_\varepsilon +J_W =\frac{1}{2} \varepsilon ^T M \varepsilon +\frac{1}{2}\int _{t-T}^{t} \text{ vec}^T({\tilde{W}}) Q^{-1} \text{ vec }({\tilde{W}}) d \tau \end{aligned} \end{aligned}$$
(24)

where vec\((\cdot )\) is the vectorization operation. Considering the first term in J, we have

$$\begin{aligned} \dot{J_\varepsilon }=\varepsilon ^T M {\dot{\varepsilon }}+\frac{1}{2} \varepsilon ^T {\dot{M}} \varepsilon \end{aligned}$$
(25)

Considering the skew-symmetric property, i.e.,

$$\begin{aligned} z^T {\dot{M}} z =2 z^T C z, ~\forall z \end{aligned}$$
(26)

and Eq. (23), we have

$$\begin{aligned} \begin{aligned} \dot{J_\varepsilon }=\varepsilon ^T (M {\dot{\varepsilon }}+C \varepsilon ) =\varepsilon ^T [-K \varepsilon +K \alpha ({\tilde{W}}^T S-\epsilon )] \end{aligned} \end{aligned}$$
(27)

By taking the integral of the above equation from \(t-T\) to t, we have

$$\begin{aligned} \triangle J_\varepsilon = \int _{t-T}^{t} \varepsilon ^T[-K \varepsilon +K \alpha ({\tilde{W}}^T S-\epsilon )] d \tau \end{aligned}$$
(28)

Now, we consider the second term in J and have

$$\begin{aligned} \begin{aligned} \triangle J_W&=J_W(t)-J_W(t-T) \\&=\int _{t-T}^{t} (\text{ vec}^T({\tilde{W}})(t) Q^{-1} \text{ vec }({\tilde{W}})(t) \\&\quad - \text{ vec}^T({\tilde{W}})(t) Q^{-1} \text{ vec }({\tilde{W}})(t-T) \\&\quad +\text{ vec}^T{\tilde{W}}(t) Q^{-1} \text{ vec }({\tilde{W}})(t-T) \\&\quad - \text{ vec}^T({\tilde{W}})(t-T) Q^{-1} \text{ vec }({\tilde{W}})(t-T)) d \tau \\&=\frac{1}{2} \int _{t-T}^{t} (2 \text{ vec}^T({\tilde{W}})(t) - \text{ vec}^T(\triangle {\tilde{W}})) Q^{-1} \text{ vec }(\triangle {\tilde{W}}) d \tau \\&\le \int _{t-T}^{t} \text{ vec}^T({\tilde{W}})(t) Q^{-1} \text{ vec }(\triangle {\tilde{W}}) d \tau \end{aligned} \end{aligned}$$
(29)

where \(\triangle {\tilde{W}}={\tilde{W}}(t)-{\tilde{W}}(t-T)\). Substituting the update law in Eq. (22) into the above inequality, we obtain

$$\begin{aligned} \begin{aligned} \triangle J_W \le \alpha \int _{t-T}^{t}\varepsilon ^T K{\tilde{W}}^T(t) S d \tau \end{aligned} \end{aligned}$$
(30)

Combining Eqs. (28) and (30), we have

$$\begin{aligned} \triangle J=\triangle J_\varepsilon +\triangle J_W \le \int _{t-T}^{t} -\varepsilon ^T K (\varepsilon -\alpha \epsilon )d \tau \end{aligned}$$
(31)

Applying Ineq. (31) over n consecutive periods, we have

$$\begin{aligned} J(t)-J(t-nT)\le \int _{t-nT}^{t} -\varepsilon ^T K (\varepsilon -\alpha \epsilon )d \tau \end{aligned}$$
(32)

where n is the number of iterations. By setting \(t=nT\), we have

$$\begin{aligned} J(nT)-J(0)\le \int _{0}^{nT} -\varepsilon ^T K (\varepsilon -\alpha \epsilon )d \tau \end{aligned}$$
(33)

which leads to

$$\begin{aligned} \int _{0}^{nT} \varepsilon ^T K (\varepsilon -\alpha \epsilon )d \tau \le J(0)-J(nT)\le J(0) \end{aligned}$$
(34)

By the definition of J in Eq. (24), we know that J(0) is bounded, so the left-hand side of the above inequality is also bounded. When \(n\rightarrow \infty\), we have \(\varepsilon ^T K (\varepsilon -\alpha \epsilon )\rightarrow 0\). As \(\epsilon\) can be made arbitrarily small with a large number of RBFNN nodes, \(\varepsilon\) becomes arbitrarily small, which indicates almost perfect trajectory tracking.

4.5 Parameter initialization of RBFNN

Parameters \(w_j\), \(c_j\), and \(b_j\) in Eq. (15) should lie in an effective mapping range; otherwise, the RBFNN will not work properly. However, it is laborious and impractical to choose the best parameters manually. To solve this problem, the gradient descent method is used to initialize the parameters.

Since the reference trajectory should be close to the desired trajectory, we use RBFNN to fit the desired trajectory so as to initialize the parameters for online learning. In particular, the desired trajectory is approximated by the RBFNN as below:

$$\begin{aligned} x_d=W^T S(x)+\epsilon \end{aligned}$$
(35)

The predicted output is presented as:

$$\begin{aligned} {\hat{x}}_d={{\hat{W}}}^T S(x) \end{aligned}$$
(36)

The error function is defined as

$$\begin{aligned} E(t)=\frac{1}{2} (x_d-{\hat{x}}_d)^2 \end{aligned}$$
(37)

Then, using the gradient descent method, we have

$$\begin{aligned} \begin{aligned} \triangle w_j(t)&=- \gamma \frac{\partial E}{\partial w_j}=\gamma (x_d-{\hat{x}}_d) s_j(x)\\ \triangle b_j(t)&=- \gamma \frac{\partial E}{\partial b_j}=\gamma (x_d-{\hat{x}}_d) w_j s_j(x) \frac{||x-c_j||^2}{b_j^3}\\ \triangle c_{ji}(t)&=- \gamma \frac{\partial E}{\partial c_{j}}=\gamma (x_r-\hat{x_r}) w_j s_j(x) \frac{x_j-c_{j}}{b_j^2} \end{aligned} \end{aligned}$$
(38)
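
where \(\gamma >0\) is the learning rate. A compact sketch of this initialization is given below; here both the NN input and the fitting target are taken from the desired trajectory (so that the initial reference reproduces \(x_d\)), and the node number, learning rate, and epoch count are assumptions of the sketch.

```python
import numpy as np

def init_rbf_params(x_d, m=15, gamma=0.01, epochs=100, seed=2):
    """Initialize w_j, b_j, c_j by fitting the desired trajectory, Eqs. (35)-(38)."""
    rng = np.random.default_rng(seed)
    x_d = np.asarray(x_d, dtype=float)
    c = np.linspace(x_d.min(), x_d.max(), m)            # centers spread over the stroke
    b = np.full(m, (x_d.max() - x_d.min()) / m + 1e-6)  # widths ~ one center spacing
    w = rng.normal(scale=0.01, size=m)
    for _ in range(epochs):
        for target in x_d:
            z = target                                   # NN input (assumed equal to x_d sample)
            s = np.exp(-(z - c) ** 2 / (2.0 * b ** 2))   # s_j(z), Eq. (13)
            err = target - w @ s                         # x_d - x_d_hat, Eqs. (36)-(37)
            dw = gamma * err * s                         # Eq. (38)
            db = gamma * err * w * s * (z - c) ** 2 / b ** 3
            dc = gamma * err * w * s * (z - c) / b ** 2
            w, b, c = w + dw, b + db, c + dc
    return w, b, c
```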

5 Experiments

5.1 Experimental platform

To validate the effectiveness of the proposed hybrid offline and online learning trajectory tracking control method, a piezoelectric drive platform is used in the experiments, as shown in Fig. 5. The piezoelectric drive platform is mainly composed of four parts: a piezoelectric controller, a PEA, a real-time simulation controller, and a computer. The PEA used in the platform is a cylindrical low-voltage PEA PSt150/10/60VS15 from Harbin Core Tomorrow Science and Technology Co., Ltd., and its physical parameters are given in Table 1. The piezoelectric controller is the E53.B servo controller from Harbin Core Tomorrow Science and Technology Co., Ltd., equipped with an SGS displacement sensor with a sensitivity of 6um/V and a measurement accuracy of 0.05um. The real-time simulation controller is the DS1103 PPC Controller Board of dSPACE GmbH.

Fig. 5 Closed-loop PEA experiment platform

Table 1 Physical parameters of PEA

In order to evaluate the performance of the trajectory tracking control method proposed in this paper, the tracking error e is defined, and the control objective is to reduce e. The tracking error e, maximum absolute error, root mean square error, and average relative error are defined as follows:

$$\begin{aligned} {e}&= x - {x_d} \end{aligned}$$
(39)
$$\begin{aligned} {e_{\max }}&= \max (\left\| {x - {x_d}} \right\| ) \end{aligned}$$
(40)
$$\begin{aligned} {e_{rms}}&=\sqrt{\frac{1}{n}\sum \limits _{k=1}^n{\left\| x\left( k \right) - x_d\left( k \right) \right\| ^2}} \end{aligned}$$
(41)
$$\begin{aligned} {e_{raver}}&= \frac{1}{n}\sum \limits _{k = 1}^n {\frac{{\left\| {x(k)} - {x_d}(k) \right\| }}{{{x_d}(k)}}} \times 100\% \end{aligned}$$
(42)
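
These error measures can be computed in a few lines; the helper below is a direct transcription of Eqs. (39)-(42) and makes explicit the implicit assumption that \(x_d(k)\ne 0\) at every sample for the average relative error.

```python
import numpy as np

def tracking_errors(x, x_d):
    """Eqs. (39)-(42): pointwise, maximum absolute, RMS, and average relative errors."""
    x, x_d = np.asarray(x, dtype=float), np.asarray(x_d, dtype=float)
    e = x - x_d                                     # Eq. (39)
    e_max = np.max(np.abs(e))                       # Eq. (40)
    e_rms = np.sqrt(np.mean(e ** 2))                # Eq. (41)
    e_raver = np.mean(np.abs(e) / x_d) * 100.0      # Eq. (42), assumes x_d(k) != 0
    return e, e_max, e_rms, e_raver
```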

In the experiment, the training and testing trajectories are generated by Eq. (9), and part of the trajectory is shown in Fig. 6. The proposed control framework and the corresponding experimental implementation are shown in Fig. 7.

Fig. 6 NURBS trajectory

Fig. 7 Experimental block diagram of the proposed control framework

5.2 Offline learning experiment

According to [40], since the nonlinear autoregressive network with exogenous inputs (NARX) has good fitting ability for time series and allows efficient computation, we use NARX as the DNN trajectory generator. To test the offline learning control framework in Fig. 3, the experiment is performed as follows. i) The desired trajectory is input into the closed-loop PEA, and the corresponding actual trajectory is recorded. ii) According to the analysis of the feature selection and the experimental tests, the actual state \((x,{\dot{x}},\ddot{x})\) at the current time step and the next 8 time steps is selected as the input of the neural network, and the desired trajectory \({x_d}\) at the current time step is used as the output to train the DNN to approximate \(G^{-1}(s)=\frac{X_r(s)}{X(s)}\). iii) The trained DNN module is connected in series in front of the reference trajectory storage of the closed-loop PEA, and the desired trajectory is input to the DNN module to generate the reference trajectory.

The structure of NARX is shown in Fig. 8, and the training parameters of the neural networks are shown in Table 2.

Fig. 8 Neural networks structure

Table 2 Training parameters of the neural networks

Under the offline learning framework as shown in Fig. 3, the desired trajectory, reference trajectory generated by DNN and actual trajectory of the closed-loop PEA are shown in Fig. 9. The DNN trajectory generator modifies the desired trajectory to obtain the reference trajectory, so that the actual trajectory of the system can track the desired trajectory.

Fig. 9 Trajectory of closed-loop PEA under offline learning. (a) Whole trajectory. (b) Local zoom trajectory

Based on the trained DNN, the following four trajectory tracking control methods are compared: PID, linear active disturbance rejection control (LADRC) [45], feedforward compensation control based on a DNN inverse model (Forward-DNN) [27], and the DNN trajectory generation control method based on offline learning (Offline-DNN). The parameters of the PID controller are \(Kp=0.3\), \(Ki=1\), \(Kd=0.05\), and the parameters of LADRC are \(b_0=0.3\), \(w_0=25\), \(w_c=70\). The feedforward inverse compensation control uses a DNN to establish the inverse model of the open-loop PEA as the feedforward compensator, combined with a PID feedback controller. Here, we use the NURBS trajectory generated by Eq. (9) as the testing desired trajectory. The tracking errors between the desired trajectory and the actual trajectory under the four control frameworks are shown in Fig. 10 and Table 3.

Fig. 10 Tracking error under different trajectory tracking control methods

Table 3 Tracking error under different control methods

As shown in Table 3, after adding the offline-trained DNN trajectory generator to the closed-loop PEA, the maximum tracking error is reduced from 1.4668um to 0.3718um, which is a reduction of \(74.6\%\). The root mean square error is reduced from 0.4867um to 0.1255um, a reduction of \(74.2\%\). It can also be seen from Table 3 that the Offline-DNN has a lower trajectory tracking error than PID, LADRC and Forward-DNN.

5.3 Online learning experiment

To test the effectiveness of the online learning method based on the RBFNN as shown in Fig. 4, we use the same desired trajectory as in the offline learning experiment, and the experiment is performed as follows. (i) Input the desired trajectory \({x_d}\) as the initial reference trajectory and initialize the weight parameters of the RBFNNs. (ii) Run the closed-loop system with the reference trajectory. (iii) Calculate the error between the desired trajectory and the actual trajectory; if the error is less than the target threshold, stop updating the weights of the RBFNNs and save the current reference trajectory to the reference trajectory storage as the final input signal of the closed-loop PEA; otherwise, update the weights of the RBFNNs according to the error. (iv) Generate the next reference trajectory with the updated RBFNN module. (v) Return to step (ii). The parameters in Eq. (22) are set as \(\alpha =110\), \(K=0.04\), \(Q=0.0001\).
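
Steps (i)-(v) can be summarized in a short loop. In the sketch below, `run_closed_loop(x_r)` is a hypothetical interface that executes one run of the closed-loop PEA with reference `x_r` and returns the measured trajectory, and `update_fn` is assumed to wrap the `online_iteration` sketch of Section 4.3, keeping the RBFNN weights internally and returning the next reference trajectory; both names are introduced only for illustration.

```python
import numpy as np

def run_online_learning(x_d, dt, run_closed_loop, update_fn, rms_target, max_iters=10):
    """Iterate steps (i)-(v) until the RMS tracking error falls below the target."""
    x_d = np.asarray(x_d, dtype=float)
    x_r = x_d.copy()                                    # (i) desired trajectory as initial reference
    for _ in range(max_iters):
        x = run_closed_loop(x_r)                        # (ii) one run of the closed-loop PEA
        e = x - x_d                                     # (iii) evaluate the tracking error
        if np.sqrt(np.mean(e ** 2)) < rms_target:
            break                                       # target reached: keep x_r, stop updating
        x_r = update_fn(e, np.gradient(e, dt))          # (iv) update RBFNN weights, new reference
    return x_r                                          # (v) final reference stored for later runs
```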

The experimental results of the RBFNN-based online learning are shown in Fig. 11. As shown in Fig. 11b, the root mean square error decreases as the iteration number increases, reaching 0.0067um in the 10th iteration. After the tracking error reaches the target threshold, the weights of the RBFNNs no longer need to be updated online, and the subsequent reference trajectory \({x_{r\_on}}\) no longer changes.

Fig. 11 Tracking error of closed-loop PEA under online learning. (a) Tracking error. (b) RMSE tracking error

5.4 Hybrid learning experiment

In the hybrid offline/online learning control method as shown in Fig. 2, we use the reference trajectory \({x_{r\_off}}\) generated by offline-trained DNN as the initial reference trajectory in online learning control, which is referred to as hybrid learning. We compare the tracking error convergence of the following three trajectory tracking control methods: iterative learning control (ILC) [46], which directly alters the control signal to the plant, online learning and hybrid learning. The ILC uses a P-type learning law, and the learning rate is 0.8. The results of tracking error convergence are shown in Fig. 12 and Table 4.

Fig. 12 Comparison of tracking error convergence under different control methods after 10 iterations. (a) MAX tracking error. (b) RMSE tracking error. (c) Raver tracking error

Table 4 Tracking error under different control methods in the 10th iteration

As shown in Fig. 12 and Table 4, in the hybrid learning control method, the offline learning speeds up the learning process with a smaller tracking error in the first iteration. After several iterations, the tracking errors in both online learning and hybrid learning conditions converge, with a slight difference. Moreover, the trajectory tracking accuracy of offline learning is further improved by online learning. After 10 iterations, the maximum tracking error is reduced from 0.3718um to 0.0171um, which is a reduction of \(95.4\%\), and the root mean square error is reduced from 0.1255um to 0.0050um, a reduction of \(96.0\%\). It can also be seen from the results that the hybrid learning has faster convergence and lower tracking error than ILC.

6 Conclusion

In this paper, we proposed two types of reference trajectory modification methods based on neural networks for a mechatronic system, which are used to learn the inverse dynamics and thus achieve precise trajectory tracking. They are combined to form a hybrid offline and online learning paradigm, which has the following features: (i) it is applicable to various mechatronic systems (a piezoelectric drive platform in this paper); (ii) it works with an embedded feedback controller to which access is not required; (iii) the offline-trained DNN model has generalization capability, which means it can be applied to different tasks with new trajectories; (iv) the use of small RBFNNs makes online learning efficient and robust to system uncertainties; and finally, (v) offline learning can benefit online learning by providing a “good” initial reference trajectory. Note that offline learning and online learning can also be used separately in different cases, e.g., online learning can be used when a task is repetitive, and offline learning can be used when a system is not significantly affected by unknown disturbances. Our future work includes testing the proposed approach on a robot manipulator with high nonlinearity.