1 Introduction

Predicting the future behavior of a dynamic system is a complicated task, and many advanced control techniques rely on such predictions. The system can be modeled in different ways; applying the physical laws that govern its behavior is one of the most common [1]. This method requires sufficient knowledge of the system to determine the parameters of the physical equations, which in most cases is difficult to obtain. In general, modeling or identification of a system means finding relationships between its inputs and outputs. These relationships can often be represented as a convolution operation, as in time-invariant linear systems. Time series forecasting methods can be used for system identification, but the inputs and outputs of the system are then merged into one time series and the convolution relation is lost [2].

Neural networks are black-box models, because they only need input and output data. This data-driven modeling method still has problems. There are still no effective methods for determining the hyper-parameters of a neural network, such as the number of hidden layers and the number of nodes in each layer [3]. The universal approximation theorem ensures that a neural model can approximate virtually any continuous system using some number of nodes in one hidden layer. However, increasing the number of nodes in this layer leads to overfitting [4]. As an alternative, instead of increasing the number of nodes in the hidden layer, it is recommended to add more hidden layers; this leads to deep neural networks [5]. In recent years, deep learning models have been applied in many areas, and they achieve better results than existing models both theoretically and practically [6].

One of the most popular deep learning models is the convolutional neural network (CNN) [7], which is used mainly in image processing. CNNs are used for image classification in [8,9,10]. They achieve some of the best performances and have been applied to medical images [11, 12]. There are also applications in other areas, such as time series prediction [13, 14]. Many architectural variations of CNNs have been presented; [15] presents a CNN in the frequency domain, which differs from the classical CNNs of the 90's [3, 16].

A very common problem in system modeling is the presence of large uncertainties, such as missing data, which can amount to up to \(50\%\) of the total data. Data can be lost for many reasons: a fault, a human error, an accidental or deliberate deletion, etc. Data loss is most often caused by physical failure of sensors, followed by human error. Another large uncertainty is external disturbance, such as measurement noise, which can be around \(20\%\) of the output amplitude; this type of noise is generated mainly by malfunctioning sensors and can be periodic, intermittent, or random. Data preprocessing can reduce these effects on the process. We can also reduce these influences through the neural networks themselves; for example, multilayer perceptrons (MLP) can increase their numbers of nodes and layers, but then a greater number of parameters must be trained and overfitting becomes a problem. CNNs can reduce noise directly in the convolutional layers, yet CNNs are used more frequently in other tasks than in nonlinear dynamic system modeling [17].

System identification can be formulated as time series prediction [18, 19]. In [20], the time series is put into a self-regulating model to improve performance. [21] presents a model based on neural networks to estimate COVID-19 cases in the following months. [22] proposes a new LSTM network architecture for modeling dynamic systems. In this same area, very few publications use a CNN for this purpose. In [23], the identification of the nonlinear system is transferred to time series using an ARMAX model. In our previous work [24], a modeling scheme for nonlinear systems is proposed using a CNN with real values. In [25], a CNN is trained to model uncertainties in the dynamic system. These uncertainties are overcome by the CNN thanks to the properties of shared weights and sparse connectivity.

Most deep learning algorithms work with real values, such as multilayer perceptrons (MLP), support vector machines (SVM), long short-term memories (LSTM) and CNNs. Complex valued neural networks have many advantages over classical NNs, for example complex valued MLPs [26], complex valued RBF networks [27] and complex valued convolutional neural networks (CVCNN) [28, 29].

In order to use complex numbers, the usual neural networks have to be modified, both in their structure and in their learning methods [30]. A general training algorithm for complex valued NNs is introduced in [28]. In [31], the complex valued NN is regarded as two real valued NNs, which expands the learning information. Theoretical analysis is more difficult for complex values [32]. For image processing, CVCNNs have been used for image denoising [33] and for image classification [34]. In [35] a CVNN is proposed as a classifier based on a range-beam-Doppler tensor. In [36] classification of human activities is proposed using a CVCNN, where human motion is captured and transformed to the frequency domain. To the best of our knowledge, there are no published results using CVCNN for dynamic system modeling [37].

From our point of view, the main advantage of the CVCNN is that it keeps the same structure as the real valued CNN, but gives better performance in the cases of missing data and large noise. In order to create a CNN-based neural network that can model nonlinear systems with missing data and large uncertainties, in this paper we make the following contributions:

  1. (1)

    A novel architecture of CVCNN for nonlinear dynamic system modeling is proposed.

  2. (2)

    A learning method for the CVCNN is given.

To show the advantages of the CVCNN, comparisons with other state-of-the-art methods are made using several benchmarks. The comparison results show that: (1) the novel model has better modeling performance than the other algorithms in the cases of large uncertainties; (2) the proposed method has a fast convergence speed and can be realized easily.

2 Nonlinear Dynamic System Modeling with Complex Valued CNN

2.1 Deep Neural Networks for Dynamic System Modeling

Consider an unknown discrete-time nonlinear system

$$\begin{aligned} {\bar{x}}(k+1)=f\left[ {\bar{x}}\left( k\right) ,u\left( k\right) \right] ,\quad y(k)=g\left[ {\bar{x}}\left( k\right) \right] \end{aligned}$$
(1)

where \(u\left( k\right) \ \)is the input vector, \({\bar{x}}\left( k\right) \) is the internal state vector, \(y\left( k\right) \) is the output vector. f and g are general nonlinear smooth functions, \(f,g\in C^{\infty }\). Denoting

$$\begin{aligned} \begin{aligned} Y(k)&=\left[ y\left( k\right) ,y\left( k+1\right) ,\cdots y\left( k+n-1\right) \right] \\ U(k)&=\left[ u\left( k\right) ,u\left( k+1\right) ,\cdots u\left( k+n-2\right) \right] \end{aligned} \end{aligned}$$

If \(\frac{\partial Y}{\partial {\bar{x}}}\) is non-singular at \({\bar{x}}=0\) and \(U=0,\) the system leads to the following nonlinear autoregressive exogenous (NARX) model

$$\begin{aligned} y(k)=\varUpsilon \left[ x\left( k\right) \right] \end{aligned}$$
(2)

where \(\varUpsilon \left( \cdot \right) \) represents an unknown nonlinear difference equation corresponding to the plant dynamics,

$$\begin{aligned} x\left( k\right) =[y\left( k-1\right) ,\cdots y\left( k-m_{y}\right) ,u\left( k\right) ,\cdots u\left( k-m_{u}\right) ]^{T} \end{aligned}$$
(3)

\(x\left( k\right) =\left[ x_{1}\cdots x_{l}\right] ^{T},\) \(l=m_{y} +m_{u}+1,\) \(k=1\cdots N,\) where N is the total number of data points for this particular system.
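
As an illustration, the regression vector (3) can be assembled directly from recorded input-output samples. The NumPy sketch below is only illustrative: the orders \(m_{y}\), \(m_{u}\), the toy plant and the data arrays are placeholders, not values used in this paper.

```python
import numpy as np

def narx_regressor(y, u, k, m_y, m_u):
    """Build x(k) = [y(k-1),...,y(k-m_y), u(k),...,u(k-m_u)]^T as in Eq. (3)."""
    past_outputs = [y[k - i] for i in range(1, m_y + 1)]
    past_inputs = [u[k - i] for i in range(0, m_u + 1)]
    return np.array(past_outputs + past_inputs)

# Synthetic data from a toy plant (placeholder, not one of the benchmarks).
N = 100
u = np.random.uniform(-1.0, 1.0, N)
y = np.zeros(N)
for k in range(N - 1):
    y[k + 1] = 0.5 * y[k] + 0.3 * u[k]

x_k = narx_regressor(y, u, k=20, m_y=3, m_u=2)  # length l = m_y + m_u + 1 = 6
```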

To model the nonlinear system (2), two types of models can be used:

  1. (1)

    The first type is

    $$\begin{aligned} \begin{aligned}&{\hat{y}}(k)=N[u\left( k\right) ,\cdots ,u\left( k-n_{u}\right) ]\\&\text {or }{\hat{y}}(k)=N[{\hat{y}}(k-1),\cdots ,{\hat{y}}(k-n_{y}),\\&u\left( k\right) ,\cdots ,u\left( k-n_{u}\right) ] \end{aligned} \end{aligned}$$
    (4)

    where \({\hat{y}}(k)\) is the model output, \(N\left[ \cdot \right] \) is the complex valued CNN, and \(n_{y}\) and \(n_{u}\) are the regression orders. This representation is called the parallel model [38]. Here the input to the CNN does not include the real outputs \(y\left( k-1\right) ,\cdots \)

  2. (2)

    The second type is

    $$\begin{aligned} \begin{aligned}&{\hat{y}}(k)=N[y\left( k-1\right) ,\cdots ,y\left( k-n_{y}\right) ,\\&u\left( k\right) ,\cdots ,u\left( k-n_{u}\right) ] \end{aligned} \end{aligned}$$
    (5)

    This is the series-parallel model [38]. The data regression is the same as in the NARX model (3), except that in general \(n_{u}\ne m_{u}\) and \(n_{y}\ne m_{y}.\)

Because the parallel model (4) does not use the output \(y\left( k-i\right) ,\) missing data and disturbances in the output \(y\left( k-i\right) \) do not affect the neural model. In this paper, we will use the parallel model (4).
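
The practical difference between the two model types lies in how the regressor is filled during simulation: the series-parallel model (5) uses the measured outputs, while the parallel model (4) feeds back its own predictions. A minimal sketch, where `model` is a placeholder for any trained one-step predictor:

```python
import numpy as np

def simulate_series_parallel(model, y, u, n_y, n_u):
    """y_hat(k) = N[y(k-1),...,y(k-n_y), u(k),...,u(k-n_u)]  -- Eq. (5)."""
    start = max(n_y, n_u)
    y_hat = np.zeros(len(y))
    for k in range(start, len(y)):
        reg = np.concatenate([y[k - n_y:k][::-1], u[k - n_u:k + 1][::-1]])
        y_hat[k] = model(reg)           # measured outputs enter the regressor
    return y_hat

def simulate_parallel(model, u, n_y, n_u):
    """y_hat(k) = N[y_hat(k-1),...,y_hat(k-n_y), u(k),...,u(k-n_u)]  -- Eq. (4)."""
    start = max(n_y, n_u)
    y_hat = np.zeros(len(u))
    for k in range(start, len(u)):
        reg = np.concatenate([y_hat[k - n_y:k][::-1], u[k - n_u:k + 1][::-1]])
        y_hat[k] = model(reg)           # only the model's own predictions are fed back
    return y_hat
```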

The biggest difficulty of the parallel model (4) is modeling complex nonlinear systems in the presence of missing data and large disturbances, because knowing only the input of the system is not enough. Without the output data, even when the data vary smoothly, the resulting models tend to diverge [39]. We will use a complex valued CNN to overcome this problem: adding the imaginary part to the network allows it to extract different features of the system, helping the network to understand it better.

2.2 Complex Valued CNN

The architecture of the convolutional neural network (CNN) for system identification is shown in Fig. 1. It is a cascade connection. Each CNN cell includes convolution and sub-sampling operations. The last layer is fully connected with synaptic weights. For the complex valued CNN (CVCNN), all parameters (weights) are complex-valued.

Fig. 1

Three-layer complex valued CNN

CVCNN is an efficient modeling option for nonlinear systems, because the convolution operation in a CNN has the same form as the input-output relationship of a dynamic system [24], and the complex-valued representation of the neural network increases the potential of the algorithm. The advantages of using CVCNN for modeling dynamic systems are:

  1. (1)

    To model a complex nonlinear system, large-scale neural networks are needed. The CNN model uses sparse connectivity and shared weights to reduce the number of model parameters. This property is one of the main advantages of the real valued CNN, and it does not depend on the type of data used, but on the structure of the network.

  2. (2)

    Each convolutional filter scans the entire data set regardless of where the information is located. When the features are acquired in the complex domain, the created feature maps are much richer in system information, even when the input data are real. Each of these feature maps captures some particular system property through the convolution operation with complex values, enlarged by the imaginary part. This can be seen as if more filters were added to the network, corresponding to their imaginary parts, causing more features to be acquired from the data and providing more information for the modeling of the system [40].

  3. (3)

    The multi-level pooling, such as the maximum operation in the sub-sampling layer, is tolerant to noise in the data. This makes the CNN model very robust. The calculations carried out in the convolutional layers of the CVCNN obtain more information from the physical system, because the filters have complex components, which allows different features to be generated and reduces the probability of overfitting, a very frequent problem in these types of methods [41].

  4. (4)

    Common activation functions suffer from the vanishing gradient problem [42], which occurs in deep neural networks. CVCNN also uses a modified rectified linear unit (ReLU) function to reduce this effect. With this activation function, the gradient passes through these layers without alteration, allowing deep models to be more reliable. It also reduces the number of parameters to be updated in each iteration, since it is a conditional function that does not depend on any parameter except its input.

2.3 Complex Valued CNN for Dynamic System Modeling

To model complex nonlinear systems, we design the following operations for the complex valued CNN.

  1. (1)

    Convolutional operation. The parameters of the complex valued CNN, such as the filters and the synaptic weights, are complex values. The convolution operation of the CVCNN is modified as

    $$\begin{aligned} \varphi _{j}^{(\ell )}=\phi _{j}^{(\ell -1)}*\varGamma _{j}^{(\ell )} \end{aligned}$$
    (6)

    where \(\varphi _{j}^{(\ell )}\) is the \(j-\)th feature map in the layer \(\ell \), produced by the convolution of the input \(\phi ^{(\ell -1)}\), \(\varGamma _{j}^{(\ell )}\) is the filter in the current layer, \(\ell =1,2,\cdots ,m\), m is the number of convolutional layers. In each layer, there are n filters with length \(f_{\ell }\). Each convolutional layer is denoted by \(C_{\ell }\).

    In the first layer, \(\phi ^{(\ell -1)}=\phi ^{(0)},\) which is the input to the CVCNN. Because the convolution is a sum of products, it can also be interpreted as a dot product between two vectors. A compact sketch of the four operations of this subsection is given after this list.

  2. (2)

    Activation function. After the feature maps are created by the convolutional layer, an activation function is applied. For the complex valued CNN, the ReLU function is modified into the \({\mathbb {C}}ReLU\) function,

    $$\begin{aligned} \begin{aligned}&{\mathbb {C}}\text {ReLU}(x)=\text {ReLU}(x_{\mathfrak {R}})+i\text {ReLU}(x_{\mathfrak {I}})\\&x=x_{\mathfrak {R}}+ix_{\mathfrak {I}}\in {\mathbb {C}} \end{aligned} \end{aligned}$$
    (7)

    The \({\mathbb {C}}ReLU\) satisfies the Cauchy-Riemann equations when both the real and the imaginary parts are strictly positive or strictly negative. Equivalently, this condition is satisfied when \(\theta _{x}\in \left[ 0,\frac{\pi }{2}\right] \) or \(\theta _{x}\in \left[ \pi ,\frac{3}{2}\pi \right] \), where \(\theta _{x}=\arg (x)\) is the phase or argument of the complex number x.

    Applying the \({\mathbb {C}}ReLU\) activation function to the feature map (6),

    $$\begin{aligned} \phi _{j}^{(\ell )}={\mathbb {C}}ReLU(\varphi _{j}^{\ell }) \end{aligned}$$
    (8)

    where \(\phi _{j}^{(\ell )}\) is the output of the current convolutional layer.

  3. (3)

    Pooling. An additional layer, called the pooling layer, is included in the CVCNN for nonlinear dynamic system modeling. Because a complex number has more properties than a real number, the pooling operations, such as maximum pooling and average pooling, have different meanings. Because of the cyclic nature of the phase, we cannot evaluate complex numbers by their phase alone. The magnitude is a more reliable property for pooling, and is defined by

    $$\begin{aligned} M(x)=|x|\;\;\;M:{\mathbb {C}}\rightarrow \mathfrak {R} \end{aligned}$$
    (9)

    The pooling uses the magnitude of the complex number as

    $$\begin{aligned} \arg \max _{x\in set}M(x) \end{aligned}$$
    (10)

    Equation (10) allows us to keep complex numbers throughout the complex valued CNN, because the output of the max-by-magnitude pooling belongs to the input set.

  4. (4)

    Full connection. The output layer of the CVCNN is a fully connected layer with synaptic weights \(W^{(\ell )}\in {\mathbb {C}}^{n}\). The output of the CVCNN is defined as:

    $$\begin{aligned} {\hat{y}}(k)=W^{(\ell )}\phi ^{(\ell -1)} \end{aligned}$$
    (11)
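
Putting these four operations together, one CVCNN cell followed by the fully connected output can be sketched in a few lines of NumPy. The filter length, the number of filters and the pooling window below are illustrative assumptions; only the operations follow (6)-(11).

```python
import numpy as np

def crelu(x):
    """CReLU of Eq. (7): ReLU applied separately to the real and imaginary parts."""
    return np.maximum(x.real, 0.0) + 1j * np.maximum(x.imag, 0.0)

def complex_conv1d(phi, gamma):
    """Sum-of-products (valid) 1-D convolution with a complex filter, cf. Eq. (6)."""
    f = len(gamma)
    return np.array([np.dot(phi[a:a + f], gamma) for a in range(len(phi) - f + 1)])

def max_by_magnitude_pool(phi, window):
    """Pooling of Eqs. (9)-(10): keep the complex element of largest magnitude per window."""
    out = []
    for a in range(0, len(phi) - window + 1, window):
        seg = phi[a:a + window]
        out.append(seg[np.argmax(np.abs(seg))])
    return np.array(out)

# One forward pass of a single cell plus the fully connected output (11).
rng = np.random.default_rng(0)
phi0 = rng.standard_normal(12)                                  # real regressor x(k)
gamma = rng.standard_normal(3) + 1j * rng.standard_normal(3)    # complex filter
phi1 = max_by_magnitude_pool(crelu(complex_conv1d(phi0, gamma)), window=2)
W = rng.standard_normal(len(phi1)) + 1j * rng.standard_normal(len(phi1))
y_hat = np.dot(W, phi1)                                         # complex model output
```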

3 Complex Valued CNN Training

The usual training algorithms for CNNs cannot be applied directly to the CVCNN, because:

  1. (1)

    The ReLU of the CNN is modified into the \({\mathbb {C}}ReLU\) (7), which uses complex values. The backpropagation operation has to be reformulated in the complex domain.

  2. (2)

    The usual convolution operation can only be applied to the real counterpart. The imaginary part has to be redefined to obtain more information from the data.

  3. (3)

    The CNN uses the gradient descent algorithm to update the parameters. For the CVCNN, the gradient descent algorithm has to be derived in the complex domain.

In this paper, we propose a novel training method of CVCNN for system identification.

3.1 Backpropagation in Complex Domain

The objective of nonlinear dynamic system modeling with the complex valued deep neural network (11) is to update the weights \(W^{(\ell )}\) such that the output \({\hat{y}}(k)\) of the CNN converges to the system output \(y\left( k\right) \) in (2),

$$\begin{aligned} \arg \min _{W^{(\ell )},V}\sum _{k=1}^{N}\left[ {\hat{y}}\left( k\right) -y\left( k\right) \right] ^{2},\quad \ell =1,2,\cdots ,m \end{aligned}$$

where N is the number of training data.

In this paper, we use stochastic gradient descent (SGD). The objective is

$$\begin{aligned} \arg \min _{W^{(\ell )},V}\left[ {\hat{y}}\left( k\right) -y\left( k\right) \right] ^{2} \end{aligned}$$

where \(\ell =1,2,\cdots ,m,\) \(k=1,2,\cdots ,N.\)

The performance index is defined as

$$\begin{aligned} J\left( k\right) =\frac{1}{2}e_{o}\left( k\right) ^{2},\quad e_{o}\left( k\right) ={\hat{y}}\left( k\right) -y\left( k\right) \end{aligned}$$
(12)

where \(e_{o}\left( k\right) \) is the modeling error.

There are m convolutional layers, and each layer has n filters, so there are \(m\times n\) CNN cells. We have to update the weights of each filter \(W^{(\ell )}\in {\mathbb {C}}^{n}\) and the weights of the fully connected layer V, see Fig. 2.

Fig. 2

The hierarchical structure of CNN for system identification

We first discuss the gradient descent algorithm in the complex domain. For real values, the weights in the output layer are updated as

$$\begin{aligned} V\left( k+1\right) =V\left( k\right) -\eta \phi ^{(\ell )}e_{o}\left( k\right) \end{aligned}$$
(13)

where \(\phi _{m}^{(\ell )}\left( k\right) =\left[ \phi _{m,1}\cdots \phi _{m,n}\right] ^{T}\) is the output of the convolution operation of the CNN model (11), \(\eta >0\) is the learning rate.

For complex values, we need to calculate the gradient of a complex function \(f:{\mathbb {C}}\rightarrow {\mathbb {C}}\). The derivative of a complex valued function can be regarded as the derivative of a multivariate function \(f:\mathfrak {R}^{2}\rightarrow \mathfrak {R}^{2}\).

If f(x) and f(x, y) are differentiable at the points \(a\in \mathfrak {R}^{1}\) and \((a,b)\in \mathfrak {R}^{2},\) then

$$\begin{aligned} \begin{aligned}&\lim _{x\rightarrow \alpha }\frac{f(x)-f(\alpha )}{x-\alpha }=\beta \\&\lim _{(x,y)\rightarrow (a,b)}\frac{\Vert f(x,y)-f(a,b)-F(x-a,y-b)\Vert }{\Vert (x-a,y-b)\Vert }=0 \end{aligned} \end{aligned}$$
(14)

where \(x,\alpha ,\beta \in \mathfrak {R}\).

If \(f(x,y)=f\left[ u(x,y),v(x,y)\right] \) and the derivative of f is denoted by Df

$$\begin{aligned} Df= \begin{bmatrix} \frac{\partial u}{\partial x} &{} \frac{\partial u}{\partial y}\\ \frac{\partial v}{\partial x} &{} \frac{\partial v}{\partial y} \end{bmatrix} \end{aligned}$$
(15)

From the Cauchy-Riemann theorem [43], a complex function \(f(x)=r(x)+is(x),\) \(r,s:\mathfrak {R}\rightarrow \mathfrak {R},\) is differentiable if and only if \(f(x_{\mathfrak {R}},x_{\mathfrak {I}})=\left( r(x_{\mathfrak {R}},x_{\mathfrak {I}}),s(x_{\mathfrak {R}},x_{\mathfrak {I}})\right) \) is differentiable as a map \(\mathfrak {R}^{2}\rightarrow \mathfrak {R}^{2}\), and its partial derivatives satisfy

$$\begin{aligned} \frac{\partial r}{\partial x_{\mathfrak {R}}}=\frac{\partial s}{\partial x_{\mathfrak {I}}} ,\quad \frac{\partial s}{\partial x_{\mathfrak {R}}}=-\frac{\partial r}{\partial x_{\mathfrak {I}}} \end{aligned}$$
(16)

At the point \(a+ib\), the Wirtinger derivative is

$$\begin{aligned} \lim _{x_{\mathfrak {R}}+ix_{\mathfrak {I}}\rightarrow a+ib}\frac{f(x_{\mathfrak {R}}+ix_{\mathfrak {I}})-f(a+ib)}{x_{\mathfrak {R}}+ix_{\mathfrak {I}}-(a+ib)}=\varsigma _{\mathfrak {R}}+i\varsigma _{Im} \end{aligned}$$
(17)

where \(x_{\mathfrak {R}}\) and \(x_{\mathfrak {I}}\) are the real and imaginary parts of the variable, and the complex number is \(\varsigma =\varsigma _{\mathfrak {R}}+i\varsigma _{Im}\). Then (17) becomes

$$\begin{aligned} \lim _{\begin{array}{c} x_{\mathfrak {R}}+ix_{\mathfrak {I}}\\ \rightarrow a+ib \end{array}}\frac{\left[ \begin{aligned}&f(x_{\mathfrak {R}}+ix_{\mathfrak {I}})-f(a+ib)\\&-(\varsigma _{\mathfrak {R}}+i\varsigma _{Im})(x_{\mathfrak {R}}+ix_{\mathfrak {I}}-(a+ib)) \end{aligned} \right] }{x_{\mathfrak {R}}+ix_{\mathfrak {I}}-(a+ib)}=0 \end{aligned}$$
(18)

So the complex derivative of function f can be regarded as a linear transformation, which is equivalent to the derivative of the function in \(\mathfrak {R}^{2}\rightarrow \mathfrak {R}^{2}\). The complex derivatives are special linear functions with orthogonal matrices and with positive determinant.

The differential of a complex valued function \(f(x):S\rightarrow {\mathbb {C}}\), \(S\subseteq {\mathbb {C}}\) can be expressed by

$$\begin{aligned} df=\frac{\partial f(x)}{\partial x}dx+i\frac{\partial f(x)}{\partial x^{*} }dx^{*} \end{aligned}$$
(19)

where \(x^{*}\) is the complex conjugate of x. Here the derivative of the corresponding real function is an orthogonal matrix with positive determinant.

In the complex domain,

$$\begin{aligned} \frac{\partial J}{\partial V}=\frac{\partial J}{\partial e}\frac{\partial e}{\partial {\hat{y}}}\frac{\partial {\hat{y}}}{\partial V}=e_{o}\left[ V_{j,\mathfrak {R}}+iV_{j,\mathfrak {I}}\right] \end{aligned}$$
(20)

The Cauchy-Riemann equations are satisfied with \(r={\hat{y}}_{\mathfrak {R}}-y\) and \(s=\hat{y}_{\mathfrak {I}}\):

$$\begin{aligned} \begin{vmatrix} \frac{\partial r}{\partial \hat{y_{\mathfrak {R}}}}=1&\frac{\partial r}{\partial \hat{y_{\mathfrak {I}}}}=0\\ \frac{\partial s}{\partial {\hat{y}}_{\mathfrak {R}}}=0&\frac{\partial s}{\partial \hat{y_{\mathfrak {I}}}}=1 \end{vmatrix} >0,\quad \frac{\partial e_{0}}{\partial {\hat{y}}}=1 \end{aligned}$$
(21)

For the fully connected layer, two gradients are required. Since the Cauchy-Riemann equations hold for \(r={\hat{y}}_{\mathfrak {R}}\) and \(s={\hat{y}}_{\mathfrak {I}}\),

$$\begin{aligned} \begin{vmatrix} \frac{\partial r}{\partial V_{\mathfrak {R}}}=\varPhi _{\mathfrak {R}}^{(m)}&\frac{\partial r}{\partial V_{\mathfrak {I}}}=-\varPhi _{\mathfrak {I}}^{(m)}\\ \frac{\partial s}{\partial V_{\mathfrak {R}}}=\varPhi _{\mathfrak {I}}^{(m)}&\frac{\partial s}{\partial V_{\mathfrak {I}}}=\varPhi _{\mathfrak {R}}^{(m)} \end{vmatrix} >0 \end{aligned}$$
(22)

the partial derivative of \({\hat{y}}\) with respect to V is

$$\begin{aligned} \frac{\partial {\hat{y}}}{\partial V}=\frac{\partial r}{\partial V_{\mathfrak {R}}} +i\frac{\partial s}{\partial V_{\mathfrak {R}}}=\varPhi _{\mathfrak {R}}^{(m)}+i\varPhi _{\mathfrak {I}}^{(m)} =\varPhi ^{(m)} \end{aligned}$$
(23)

We can calculate the gradient of the cost function with respect to V, and with respect to the rest of the parameters in the output layer, using the chain rule

$$\begin{aligned} \frac{\partial J}{\partial V}=\frac{\partial J}{\partial e}\frac{\partial e}{\partial {\hat{y}}}\frac{\partial {\hat{y}}}{\partial V}=(e_{o})(1)(\varPhi ^{(m)})=e_{o}\varPhi ^{(m)} \end{aligned}$$
(24)

The gradient of \({\hat{y}}\) with respect to \(\varPhi ^{(m)}\) is obtained with the aid of the Cauchy-Riemann equations, with \(r={\hat{y}}_{\mathfrak {R}}\) and \(s={\hat{y}}_{\mathfrak {I}}\):

$$\begin{aligned} \begin{vmatrix} \frac{\partial r}{\partial \varPhi _{\mathfrak {R}}^{(m)}}=V_{\mathfrak {R}}&\frac{\partial r}{\partial \varPhi _{\mathfrak {I}}^{(m)}}=-V_{\mathfrak {I}}\\ \frac{\partial s}{\partial \varPhi _{\mathfrak {R}}^{(m)}}=V_{\mathfrak {I}}&\frac{\partial s}{\partial \varPhi _{\mathfrak {I}}^{(m)}}=V_{\mathfrak {R}} \end{vmatrix} >0 \end{aligned}$$
(25)

So

$$\begin{aligned} \frac{\partial {\hat{y}}}{\partial \varPhi ^{(m)}}=\frac{\partial r}{\partial \varPhi _{\mathfrak {R}}^{(m)}}+i\frac{\partial s}{\partial \varPhi _{\mathfrak {R}}^{(m)}}=V_{\mathfrak {R}}+iV_{\mathfrak {I}}=V \end{aligned}$$
(26)

The updating law for the output layer (13) is

$$\begin{aligned} V\left( k+1\right) =V\left( k\right) -\eta \phi ^{(\ell )}\left[ V_{j,\mathfrak {R}}+iV_{j,\mathfrak {I}}\right] e_{o}\left( k\right) \end{aligned}$$
(27)

The updating law for the parameters in the convolution operation is

$$\begin{aligned} W_{ij}(k+1)=W_{ij}(k)-\eta \frac{\partial J}{\partial W_{ij}} \end{aligned}$$

where J is defined in (12) and \(W_{ij}\) is a parameter of the convolution operation.
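
A minimal sketch of one stochastic gradient step is the following, assuming the gradients have already been computed by the complex backpropagation derived above; the output-layer gradient is taken in the form of (24), and all values are random placeholders.

```python
import numpy as np

def sgd_step(params, grads, eta):
    """One SGD update p <- p - eta * dJ/dp; works for real or complex arrays."""
    return [p - eta * g for p, g in zip(params, grads)]

rng = np.random.default_rng(1)
V = rng.standard_normal(4) + 1j * rng.standard_normal(4)        # output-layer weights
Phi_m = rng.standard_normal(4) + 1j * rng.standard_normal(4)    # last feature map
e_o = 0.3 + 0.1j                                                # modeling error
V_new, = sgd_step([V], [e_o * Phi_m], eta=0.01)                 # dJ/dV = e_o * Phi^(m), Eq. (24)
```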

3.2 Backpropagation for Complex Valued Convolution Operation

The error is back-propagated from the layer \(\ell -1\) to the layer \(\ell \) as

$$\begin{aligned} e_{\ell }\left( k\right) =e_{\ell -1}\left( k\right) \frac{\partial \phi ^{(\ell -1)}}{\partial t}W^{(\ell -1)} \end{aligned}$$
(28)

In backward order, the next layer is the activation layer with \({\mathbb {C}} ReLU\). It does not have parameters,

$$\begin{aligned} \frac{\partial J}{\partial \varphi ^{(\ell )}}=\frac{\partial J}{\partial \phi ^{(\ell )}}\varphi ^{(\ell )} \end{aligned}$$
(29)

After the activation layer, the convolutional layer with its filters is updated.

For classical CNN,

$$\begin{aligned} \frac{\partial J}{\partial \gamma _{\jmath }^{(\ell )}}=\sum _{a=0}^{N-f_{\ell } }\varphi _{a}^{(\ell )}\phi _{\jmath +a}^{(\ell -1)} \end{aligned}$$
(30)

where N is the size of \(\phi ^{(\ell -1)}\) and \(\gamma _{\jmath }^{(\ell )}\) is one of the elements of the filter \(\varGamma ^{(\ell )}.\) The training law is

$$\begin{aligned} \begin{aligned} \gamma _{\jmath }^{(\ell )}(k+1)&=\gamma _{\jmath }^{(\ell )}(k)-\eta \frac{\partial J}{\partial \gamma _{\jmath }^{(\ell )}}\\ \frac{\partial J}{\partial \phi _{\jmath }^{(\ell -1)}}&=\sum _{a=0}^{f_{\ell } -1}\frac{\partial J}{\partial \varphi _{\jmath -a}^{(\ell )}}\frac{\partial \varphi _{\jmath -a}^{(\ell )}}{\partial \phi _{\jmath }^{(\ell -1)}}=\sum _{a=0}^{f_{\ell }-1}\frac{\partial J}{\partial \varphi _{\jmath -a}^{(\ell )}}\gamma _{a}^{(\ell )} \end{aligned} \end{aligned}$$
(31)

The training objective of [29] is to improve image classification, while our objective is nonlinear dynamic system modeling under measurement noise and missing data. Unlike the CNN and [29], we apply the convolution operation only as a sum of products of cells. We also use two real-valued equations to train each layer; we do not train each layer with complex values as in [29]. These two differences between the proposed CVCNN and [29] simplify the training and the analysis without affecting the modeling accuracy.
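
For the real-valued counterpart, the filter gradient (30) and the propagated gradient (31) are plain correlations. The sketch below is an illustrative assumption of how they can be computed for one 1-D layer; the names mirror the notation above and do not correspond to the exact implementation used in the experiments.

```python
import numpy as np

def filter_gradient(dJ_dout, phi_in, f_len):
    """dJ/dgamma_j = sum_a dJ/dout[a] * phi_in[j + a], cf. Eq. (30)."""
    return np.array([np.dot(dJ_dout, phi_in[j:j + len(dJ_dout)]) for j in range(f_len)])

def input_gradient(dJ_dout, gamma, n_in):
    """Propagate the gradient to the layer input (transposed correlation), cf. Eq. (31)."""
    dJ_din = np.zeros(n_in)
    for a, g in enumerate(dJ_dout):
        dJ_din[a:a + len(gamma)] += g * gamma
    return dJ_din
```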

3.2.1 Feedforward Operation

The filters and the weights of CVCNN are defined as

$$\begin{aligned} \varGamma ^{(\ell )}= \begin{bmatrix} \gamma _{1,\mathfrak {R}}^{(\ell )}+i\gamma _{1,\mathfrak {I}}^{(\ell )}\\ \vdots \\ \gamma _{p,\mathfrak {R}}^{(\ell )}+i\gamma _{p,\mathfrak {I}}^{(\ell )} \end{bmatrix} \in {\mathbb {C}} \end{aligned}$$
(32)

and

$$\begin{aligned} W^{(\ell )}=\left[ W_{1,\mathfrak {R}}^{(\ell )}+iW_{1,\mathfrak {I}}^{(\ell )}\cdots W_{m,\mathfrak {R}}^{(\ell )}+iW_{m,\mathfrak {I}}^{(\ell )}\right] ^{T}\in {\mathbb {C}}^{m} \end{aligned}$$
(33)

The input of the CVCNN is \(\phi ^{(0)}= \begin{bmatrix} \phi _{1}^{(0)}\\ \phi _{2}^{(0)}\\ \vdots \\ \phi _{l}^{(0)} \end{bmatrix}\). In the first layer, the convolution operation produces the feature map \(\varphi ^{(1)}\),

$$\begin{aligned} \begin{aligned} \varphi _{1}^{(1)}&=\phi ^{(0)}*\varGamma _{1}^{(1)}= \begin{bmatrix} \varPsi _{1,1}^{(1)}\\ \vdots \\ \varPsi _{1,q}^{(1)} \end{bmatrix}\\&= \begin{bmatrix} \phi _{1}^{(0)}\\ \vdots \\ \phi _{q}^{(0)} \end{bmatrix} *\begin{bmatrix} \gamma _{1,\mathfrak {R}}^{(1)}+i\gamma _{1,\mathfrak {I}}^{(1)}\\ \vdots \\ \gamma _{p,\mathfrak {R}}^{(1)}+i\gamma _{p,\mathfrak {I}}^{(1)} \end{bmatrix} \end{aligned} \end{aligned}$$
(34)

where each element \(\varPsi ^{(1)}\in {\mathbb {C}}\) is obtained by the convolution

$$\begin{aligned} \varPsi _{1,j}^{(1)}=\sum _{a=0}^{q}\phi _{a}^{(0)}\gamma _{j-a} \end{aligned}$$
(35)

Then the \({\mathbb {C}}ReLU\) is applied to the feature maps,

$$\begin{aligned} \phi _{1}^{(1)}={\mathbb {C}}\text {ReLU}(\varphi _{1}^{(1)}) \end{aligned}$$
(36)

The elements of this vector are

$$\begin{aligned} \begin{aligned} \phi _{1}^{(1)}&= \begin{bmatrix} \varPhi _{1,1}^{(1)}\\ \vdots \\ \varPhi _{1,q}^{(1)} \end{bmatrix} = \begin{bmatrix} {\mathbb {C}}\text {ReLU}(\varPsi _{1,1}^{(1)})\\ \vdots \\ {\mathbb {C}}\text {ReLU}(\varPsi _{1,q}^{(1)}) \end{bmatrix}\\&= \begin{bmatrix} \text {ReLU}(\varPsi _{1,1,\mathfrak {R}}^{(1)})+i\text {ReLU}(\varPsi _{1,1,\mathfrak {I}}^{(1)})\\ \vdots \\ \text {ReLU}(\varPsi _{1,q,\mathfrak {R}}^{(1)})+i\text {ReLU}(\varPsi _{1,q,\mathfrak {I}}^{(1)}) \end{bmatrix} \end{aligned} \end{aligned}$$
(37)

The last layer of the convolution operation is

$$\begin{aligned} {\hat{y}}=W^{(m)}\phi ^{(m-1)}=\left[ W_{1}^{(m)}\cdots W_{p}^{(m)}\right] \begin{bmatrix} \varPhi _{1}^{(m-1)}\\ \cdots \\ \varPhi _{p}^{(m-1)} \end{bmatrix} \end{aligned}$$
(38)

The output of CVCNN is

$$\begin{aligned} {\hat{y}}={\hat{y}}_{\mathfrak {R}}+i{\hat{y}}_{\mathfrak {I}} \end{aligned}$$

where \({\hat{y}}_{\mathfrak {R}}\) and \({\hat{y}}_{\mathfrak {I}}\) are defined as:

$$\begin{aligned} \begin{matrix} {\hat{y}}_{\mathfrak {R}}=W_{1,\mathfrak {R}}^{(m)}\varPhi _{1,\mathfrak {R}}^{(m-1)}-W_{1,\mathfrak {I}}^{(m)}\varPhi _{1,\mathfrak {I}}^{(m-1)}+\cdots \\ +W_{p,\mathfrak {R}}^{(m)}\varPhi _{p,\mathfrak {R}}^{(m-1)}-W_{p,\mathfrak {I}}^{(m)}\varPhi _{p,\mathfrak {I}}^{(m-1)}\\ \vdots \\ {\hat{y}}_{\mathfrak {I}}=W_{1,\mathfrak {R}}^{(m)}\varPhi _{1,\mathfrak {I}}^{(m-1)}+W_{1,\mathfrak {I}}^{(m)}\varPhi _{1,\mathfrak {R}}^{(m-1)}+\cdots \\ +W_{p,\mathfrak {R}}^{(m)}\varPhi _{p,\mathfrak {I}}^{(m-1)}+W_{p,\mathfrak {I}}^{(m)}\varPhi _{p,\mathfrak {R}}^{(m-1)} \end{matrix} \end{aligned}$$
(39)

where \(\varPhi _{1,\mathfrak {R}}^{(m-1)}\) and \(\varPhi _{1,\mathfrak {I}}^{(m-1)}\) are the real and imaginary parts of the element \(\varPhi _{1}^{(m-1)}\), respectively.
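
The expansion (39) is simply the real and imaginary parts of the complex inner product (38); a quick numerical check with random placeholder values:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal(5) + 1j * rng.standard_normal(5)      # W^(m)
Phi = rng.standard_normal(5) + 1j * rng.standard_normal(5)    # Phi^(m-1)

y_hat = np.dot(W, Phi)                                        # Eq. (38)
y_R = np.sum(W.real * Phi.real - W.imag * Phi.imag)           # Eq. (39), real part
y_I = np.sum(W.real * Phi.imag + W.imag * Phi.real)           # Eq. (39), imaginary part
assert np.allclose(y_hat, y_R + 1j * y_I)
```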

3.2.2 Backpropagation of CVCNN

The backpropagation through the convolution operation uses the chain rule and partial derivatives to calculate the gradient of each complex element. In the activation layer, the gradient is

$$\begin{aligned}&r=\text {ReLU}(\varPsi _{\mathfrak {R}})^{(m)},\quad s=\text {ReLU}(\varPsi _{\mathfrak {I}}^{(m)})\nonumber \\&\begin{vmatrix} \frac{\partial r}{\partial \varPsi _{\mathfrak {R}}^{(m)}}=\varPsi _{\mathfrak {R}}^{(m)}&\frac{\partial r}{\partial \varPsi _{\mathfrak {I}}^{(m)}}=0\\ \frac{\partial s}{\partial \varPsi _{\mathfrak {R}}^{(m)}}=0&\frac{\partial s}{\partial \varPsi _{\mathfrak {I}}^{(m)}}=\varPsi _{\mathfrak {I}}^{(m)} \end{vmatrix} >0 \end{aligned}$$
(40)

This leads to

$$\begin{aligned} \frac{\partial \varPhi _{{}}^{(m)}}{\partial \varPsi _{{}}^{(m)}}=\varPsi _{\mathfrak {R}}^{(m)} \end{aligned}$$
(41)

The gradient through an activation layer is determined by adding \(\varPsi _{\imath j}^{(m)}\) into the corresponding vector \(\varphi _{j}^{(m)},\)

$$\begin{aligned} \frac{\partial J}{\partial \varphi _{\imath }}=\frac{\partial J}{\partial \phi _{\imath }^{(m)}}\varphi _{\imath }^{(m)} \end{aligned}$$
(42)

The convolutional layer is similar to the fully connected layer; the convolution operation can be regarded as a sum of products. Similarly to (31),

$$\begin{aligned} \frac{\partial J}{\partial \gamma _{\jmath }^{(\ell )}}=\sum _{a=0}^{N-f_{\ell } }\varphi _{a}^{(\ell )}\phi _{\jmath +a}^{(\ell -1)} \end{aligned}$$
(43)

and the gradient through a convolutional layer is

$$\begin{aligned} \frac{\partial J}{\partial \phi _{\jmath }^{(\ell -1)}}=\sum _{a=0}^{f_{\ell } -1}\frac{\partial J}{\partial \varphi _{\jmath -a}^{(\ell )}}\frac{\partial \varphi _{\jmath -a}^{(\ell )}}{\partial \phi _{\jmath }^{(\ell -1)}}=\sum _{a=0}^{f_{\ell }-1}\frac{\partial J}{\partial \varphi _{\jmath -a}^{(\ell )}}\gamma _{a}^{(\ell )} \end{aligned}$$
(44)

4 Simulations

In this section, we use three benchmarks to show the effectiveness of the complex valued CNN (CVCNN) compared with the classical CNN, the classical neural network (MLP), and some other recent methods. The architecture of the CNN is the same as that of the CVCNN, but the filters and weights are different: real values for the CNN and complex values for the CVCNN.

The CNN has two convolutional layers, each followed by a ReLU activation function and a max-pooling layer. For each benchmark, the number of filters in the convolutional layers is different; we use random search to find the best possible combination. The MLP has one hidden layer with the activation function \(tanh\left( \cdot \right) \); the number of hidden nodes differs for each benchmark. The initial filters of the CNN and CVCNN are chosen randomly in the range \([-1,1],\) and the weights of the MLP are also chosen in \([-1,1].\)

In order to show the advantages of CVCNN for dynamic system modeling, we use the following two neural network models:

  1. (1)

    Series-parallel model as (5)

    $$\begin{aligned} {\hat{y}}(k)=NN[y\left( k-1\right) ,\cdots u\left( k\right) ,\cdots ] \end{aligned}$$
    (45)

    where both the input \(u\left( k\right) \) and the output \(y\left( k-1\right) \) of the identified system are fed to the neural network \(NN\left[ \cdot \right] .\)

  2. (2)

    Parallel model as (4)

    $$\begin{aligned} {\hat{y}}(k)=NN[{\hat{y}}(k-1),\cdots u\left( k\right) ,\cdots ] \end{aligned}$$
    (46)

    where only the input \(u\left( k\right) \) of the identified system is fed to the neural network \(NN\left[ \cdot \right] .\) We use the output of the neural network, \({\hat{y}}(k-1)\), as the other part of the network input. In the case of noise, (45) becomes

    $$\begin{aligned} {\hat{y}}(k)=NN[y\left( k-1\right) +\rho ,\cdots u\left( k\right) ,\cdots ] \end{aligned}$$
    (47)

    where \(\rho \) is the random noise. (46) becomes

    $$\begin{aligned} {\hat{y}}(k)=NN[{\hat{y}}(k-1)+\rho ,\cdots u\left( k\right) ,\cdots ] \end{aligned}$$

In the case of missing data, (45) becomes

$$\begin{aligned} {\hat{y}}(k)=NN[{\bar{y}}\left( k-1\right) ,\cdots {\bar{u}}\left( k\right) ,\cdots ] \end{aligned}$$
(48)

where \({\bar{y}}\left( k-1\right) \) and \({\bar{u}}\left( k\right) \) are from the data sets \(\left\{ y\left( 1\right) ,\cdots y\left( N\right) \right\} \) and \(\left\{ u\left( 1\right) ,\cdots u\left( N\right) \right\} \). (46) becomes

$$\begin{aligned} {\hat{y}}(k)=NN[{\tilde{y}}\left( k-1\right) ,\cdots {\tilde{u}}\left( k\right) ,\cdots ] \end{aligned}$$

where \({\tilde{y}}\left( k-1\right) \) and \({\tilde{u}}\left( k\right) \) are from the data sets \(\left\{ {\hat{y}}\left( 1\right) ,\cdots {\hat{y}}\left( N\right) \right\} \) and \(\left\{ u\left( 1\right) ,\cdots u\left( N\right) \right\} \).

In this paper, we select \(30\%\) of the data as missing data.
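
How the missing samples are generated is not essential to the method. One simple way, shown here only for illustration (an assumption, not necessarily the exact procedure used in the experiments), is to remove a random \(30\%\) of the recorded output samples:

```python
import numpy as np

def drop_samples(y, fraction=0.3, seed=0):
    """Return a copy of y with `fraction` of its samples marked as missing (NaN)."""
    rng = np.random.default_rng(seed)
    y_missing = y.astype(float).copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_missing[idx] = np.nan           # these entries are never shown to the model
    return y_missing
```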

4.1 Searching Hyper-parameters

There are no optimal methods for defining the neural structure; most approaches use trial and error to find a good structure. We use a random search method, similar to [44], to decide the hyper-parameters of the neural model.

We first randomly select combinations of the hyper-parameters from the total set. Then we search for the best score, i.e., the one minimizing a hyper-parameter response function \(\varUpsilon \). For the CNN, the algorithm is as follows:

  1. 1.

    Choose the number of convolutional layers in the CNN (1, 2, or 3 layers).

  2. 2.

    Define the number of filters in each layer (1–100 filters) and their length (3–6).

  3. 3.

    Initialize the filters and synaptic weights of the output layer (range: −1 to 1).

  4. 4.

    Carry out the simulation and obtain the score of the function \(\varUpsilon \).

  5. 5.

    Repeat the previous steps 15 times with randomly chosen hyper-parameter settings.

  6. 6.

    Choose the structure with the best score.

In case of MLP, the algorithm is as follows:

  1. 1.

    Define a two-layer MLP.

  2. 2.

    Define the number of neurons or nodes in the hidden layer (1–100 nodes).

  3. 3.

    Initialize the synaptic weights (range: −1 to 1).

  4. 4.

    Carry out the simulation and obtain the score of the function \(\varUpsilon \).

  5. 5.

    Repeat the previous steps 10 times with randomly chosen hyper-parameter settings.

  6. 6.

    Choose the structure with the best score.

In both cases, the data set is used without preprocessing. With these algorithms, the hyper-parameters are decided and shown in the following sections; they correspond to the best score of the hyper-parameter response function \(\varUpsilon \) found by the random search algorithm.
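
The random search above can be sketched as follows, where `evaluate` is a placeholder for training a candidate model and returning the score of the response function \(\varUpsilon \) (for example the validation RMSE); the sampling ranges follow the steps listed above.

```python
import random

def random_search_cnn(evaluate, trials=15, seed=0):
    """Randomly sample CNN hyper-parameters and keep the best-scoring structure."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        config = {
            "num_conv_layers": rng.randint(1, 3),      # step 1
            "filters_per_layer": rng.randint(1, 100),  # step 2
            "filter_length": rng.randint(3, 6),        # step 2
        }
        score = evaluate(config)                       # steps 3-4: initialize, train, score
        if best is None or score < best[0]:
            best = (score, config)
    return best                                        # step 6: best (score, structure)
```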

4.2 Gas Furnace Modeling

The gas furnace data set is a benchmark example for nonlinear system modeling [45]. The input signal \(u\left( k\right) \) is the flow rate of the methane gas, and the output signal \(y\left( k\right) \) is the concentration of \(CO_{2}.\) There are 296 samples, taken at a 9-second sampling interval. We use 200 samples for training and the other 96 samples for testing. We compare our CVCNN with the CNN and the MLP.

For the series-parallel models (45), (47) and (48), the input vector to the neural models are

$$\begin{aligned} \left[ y\left( k-1\right) ,\cdots y\left( k-5\right) ,u\left( k\right) ,\cdots u\left( k-4\right) \right] \end{aligned}$$

For the parallel model (46), the input vector to the neural models is \(\left[ {\hat{y}}\left( k-1\right) ,u\left( k\right) ,\cdots \right. \left. u\left( k-10\right) \right] \). Each convolutional layer of the CVCNN and CNN has 3 filters, and the size of each filter is 3. The MLP has 50 nodes in the hidden layer. The amplitude of the noise is about \(10\%\) of the output amplitude. The modeling errors in the testing phase for the gas furnace with the parallel model (46) are shown in Fig. 3.

The performance of the MLP becomes worse if we reduce the percentage of training data from \(60\%\) to \(40\%\), i.e., from 200 to 120 samples. The root mean square error (RMSE) goes from 0.1257 to 0.2435. The same occurs for the CNN, increasing from 0.1466 to 0.1935, and for the CVCNN, whose RMSE also increases, from 0.0829 to 0.1458. Reducing the amount of training data therefore increases the RMSE. Both CNNs have fewer problems with reduced training data than the MLP. The quantity of 200 training samples is chosen because it is commonly used for this benchmark.

Fig. 3

Modeling errors of parallel model for gas furnace

4.3 First Order Nonlinear System

The following discrete-time first-order nonlinear system is another popular benchmark [38],

$$\begin{aligned} y(k+1)=\frac{y(k)}{1+y^{2}(k)}+u^{3}(k) \end{aligned}$$
(49)

The control input u(k) is periodic, \(u(k)=B\sin \left( \frac{\pi k}{50}\right) +C\sin \left( \frac{\pi k}{20}\right) .\) In the training phase, \(B=C=1.\) In the testing phase, \(B=0.9,\) \(C=1.1.\) We use 5000 data points generated by (49) to train the neural models, and 100 data points for testing.
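
For reproducibility, the training and testing data for (49) can be generated directly from the recursion with the periodic input given above; the zero initial condition below is an assumption.

```python
import numpy as np

def generate_first_order_data(N, B=1.0, C=1.0, y0=0.0):
    """Simulate y(k+1) = y(k)/(1 + y(k)^2) + u(k)^3 with the periodic input of the text."""
    k = np.arange(N)
    u = B * np.sin(np.pi * k / 50.0) + C * np.sin(np.pi * k / 20.0)
    y = np.zeros(N)
    y[0] = y0
    for i in range(N - 1):
        y[i + 1] = y[i] / (1.0 + y[i] ** 2) + u[i] ** 3
    return u, y

u_train, y_train = generate_first_order_data(5000)              # training phase
u_test, y_test = generate_first_order_data(100, B=0.9, C=1.1)   # testing phase
```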

For the series-parallel models (45), (47) and (48), the input vector to the neural models is

$$\begin{aligned} \left[ y\left( k-1\right) ,\cdots y\left( k-10\right) ,u\left( k\right) ,\cdots u\left( k-13\right) \right] \end{aligned}$$

For the parallel model (46), the input vector to the neural models is \(\left[ {\hat{y}}\left( k-1\right) ,u\left( k\right) ,\cdots \right. \left. u\left( k-12\right) \right] \). The CVCNN and CNN have 8 filters in each convolutional layer. The MLP has 35 nodes in the hidden layer. The amplitude of the noise is around \(10\%\) of the output amplitude.

The testing results of the identification of the first-order nonlinear system with the parallel model (46) are shown in Fig. 4.

Fig. 4

The modeling errors of parallel model for first-order system

4.4 Wiener-Hammerstein System

The Wiener-Hammerstein system is the series connection of a linear system, a static nonlinearity and another linear system. It is an electrical circuit consisting of three cascaded blocks [46]. This benchmark has 14,000 samples. We use 1,000 for testing; the rest are used for training. For the series-parallel models (45), (47) and (48), the input vector to the neural models is \(\left[ y\left( k-1\right) ,\cdots y\left( k-4\right) ,u\left( k\right) ,\cdots u\left( k-5\right) \right] \), with a noise amplitude around \(10\%\) of the output amplitude. For the parallel model (46), the input vector to the neural models is \(\left[ {\hat{y}}\left( k-1\right) ,\cdots \hat{y}\left( k-3\right) ,u\left( k\right) ,\cdots u\left( k-80\right) \right] \). For the CVCNN and CNN, each convolutional layer has 15 filters of size 6. The MLP is the same as the model of the gas furnace, but with 80 nodes in the hidden layer. The modeling errors of the series-parallel model with noisy data (47) are shown in Fig. 5. The modeling errors of the parallel model (46) are shown in Fig. 6. We can see that the parallel model with the CVCNN is a good alternative for system modeling, due to its better performance compared with the other methods.

Fig. 5

Modeling errors of series-parallel model with noisy data for the Wiener-Hammerstein system

Fig. 6

Modeling errors of parallel model for the Wiener-Hammerstein system

4.5 Discussion

The main metric to evaluate performance is the mean square error (MSE) defined by

$$\begin{aligned} \frac{1}{N}\varSigma _{k=1}^{N}\left[ y(k)-{\hat{y}}(k)\right] ^{2} \end{aligned}$$

We also use the following other metrics:

$$\begin{aligned} \begin{aligned}&\text {R}^{2}:1-\frac{\frac{1}{N}\varSigma _{k=1}^{N}\left[ y(k)-\hat{y}(k)\right] ^{2}}{\frac{1}{N}\varSigma _{k=1}^{N}\left[ y(k)-\bar{y}(k)\right] ^{2}}\\&\text {Mean absolute error(MAE): }\frac{1}{N}\varSigma _{k=1}^{N}\left| y(k)-{\hat{y}}(k)\right| \\&\text {Root mean squared error (RMSE): }\sqrt{\frac{1}{N}\varSigma _{k=1} ^{N}\left[ y(k)-{\hat{y}}(k)\right] ^{2}} \end{aligned} \end{aligned}$$
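
These metrics can be computed directly from the measured and modeled outputs; a minimal sketch:

```python
import numpy as np

def metrics(y, y_hat):
    """MSE, R^2, MAE and RMSE between the measured output y and the model output y_hat."""
    err = y - y_hat
    mse = np.mean(err ** 2)
    r2 = 1.0 - mse / np.mean((y - np.mean(y)) ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    return {"MSE": mse, "R2": r2, "MAE": mae, "RMSE": rmse}
```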

For the above three benchmarks, we use these metrics to compare our CVCNN with the CNN and MLP. The MLP has one hidden layer; the CNN and CVCNN have the same structures as above.

Tables 1, 2, 3, 4 show the comparison results of the three models. We can see that in most cases the proposed CVCNN performs better than the others.

Table 1 Performance metrics for gas furnace benchmark using series parallel model
Table 2 Performance metrics for gas furnace benchmark using parallel model
Table 3 Performance metrics for gas furnace benchmark using series parallel model with noise data
Table 4 Performance metrics for gas furnace benchmark using series parallel model with missing data

Table 5 shows the modeling errors of different recent methods. For a fair comparison, we use the same models as the ARIMA model and PEC-WNN in [47]. We cast the problems of prediction, modeling and identification in the same sense: we use the trained model to estimate the next value in the time series, and the objectives are the same. We can see that our CVCNN has better performance than the other methods in the missing-data case. PEC-WNN is better than ours, but it requires the complete data set; none of the other methods considers missing data.

Table 5 Performances of the other recent methods for the gas furnace modeling

For this benchmark we can conclude that:

  • In the cases of large noise and missing data, the CVCNN gives better modeling accuracy than the CNN and MLP.

  • If the parallel model is used, both CVCNN and CNN work well.

  • If the series-parallel model can be used, both CNN and MLP work well.

Tables 6, 7, 8, 9 show the comparison results for the Wiener-Hammerstein system. Similarly to the gas furnace modeling, our method gives better results for the Wiener-Hammerstein system. It should be noted that our method performs better in the cases of missing data and large disturbances. The CVCNN, CNN and MLP give good modeling accuracy with the series-parallel model. For the cases of noise and missing data, the CVCNN and CNN work well, but the MLP cannot model correctly.

Table 6 Performance metrics for Wiener-Hammerstein benchmark using series parallel model
Table 7 Performance metrics for Wiener-Hammerstein benchmark using parallel model
Table 8 Performance metrics for Wiener-Hammerstein benchmark using series parallel model with noisy data
Table 9 Performance metrics for Wiener-Hammerstein benchmark using series parallel model with missing data

Table 10 presents the results of four other methods for dynamic system modeling: LSTM and SVM are popular statistical learning methods, BLA uses Spearman correlation for model optimization, and PNLSS uses a polynomial nonlinear state-space model. All of these methods use the trained model to estimate the next value in the time series, and the modeling objectives are the same. We can see that our CVCNN has very similar results to the others for this benchmark, but for the cases of large noise and missing data, they did not report results.

Table 10 Performance of Wiener-Hammerstein system modeling using the other methods

Tables 11, 12, 13, 14 give the modeling results of the first-order nonlinear system. Clearly, the CVCNN performs better than the other methods. For the first-order nonlinear system, the CVCNN, CNN and MLP give good modeling accuracy with the series-parallel model. For the cases of noise and missing data, the CVCNN and CNN work well, but the MLP cannot.

Table 11 Performance metrics for nonlinear system benchmark using series parallel model
Table 12 Performance metrics for nonlinear system benchmark using parallel model
Table 13 Performance metrics for nonlinear system benchmark using series parallel model with noise data
Table 14 Performance metrics for nonlinear system benchmark using series parallel model with missing data

5 Conclusion

Measurement noise and missing data are important disturbances in system identification, which can directly affect modeling accuracy. The CVCNN is more powerful and gives better modeling accuracy than classical modeling methods in these cases, although its mathematical analysis is more difficult. Therefore the CVCNN is an efficient and robust model compared with classical ones for nonlinear dynamic system modeling with large uncertainties. Our future work will be on CVCNN-based robust control.