1 Introduction

Predicting the future behavior of a dynamic system is a complicated task, and many advanced control techniques rely on such predictions. The system can be modeled in different ways; applying the physical laws that govern its behavior is one of the most common [1]. This method requires sufficient knowledge of the system to determine the parameters of the physical equations, which in most cases is difficult to obtain. In general, modeling or identification of a system means finding relationships between its inputs and outputs. These relationships can often be represented as a convolution operation, as in time-invariant linear systems. Time series forecasting methods can be used for system identification, but the inputs and outputs of the system are then merged into one time series and the convolution relation is lost [2].

Neural networks are black-box models, because they only need input and output data. This data-driven modeling method still has problems. There are still no effective methods for determining the hyper-parameters of a neural network, such as the number of hidden layers and the number of nodes in each layer [3]. The universal approximation theorem ensures that a neural model can approximate virtually any continuous system using some number of nodes in one hidden layer. However, increasing the number of nodes in this layer leads to overfitting [4]. As an alternative, instead of increasing the number of nodes in the hidden layer, it is recommended to add more hidden layers; this leads to deep neural networks [5]. In recent years, deep learning models have been applied in many areas, and they achieve better results than existing models both theoretically and practically [6].

One of the most popular deep learning models is the convolutional neural network (CNN) [7], which is used mainly in image processing. CNNs are used for image classification in [8,9,10]. They achieve some of the best performances and have been applied to medical images [11, 12]. There are also applications in other areas, such as time series prediction [13, 14]. Many architectural variations of CNNs have been presented; [15] presents a CNN in the frequency domain, which differs from the classical CNNs of the 90's [3, 16].

A very common problem in system modeling is the presence of large uncertainties, such as missing data, which can amount to up to \(50\%\) of the total data. Data can be lost for many reasons: a fault, a human error, an accidental or deliberate deletion, etc. Data loss is most often caused by physical failure of sensors, followed by human error. Another large uncertainty is external disturbance, such as measurement noise, which can be around \(20\%\) of the output amplitude; this type of noise is generated mainly by malfunctioning sensors and can be periodic, intermittent, or random. Data preprocessing can reduce these effects on the process. We can also reduce these influences through the neural networks themselves; for example, multilayer perceptrons (MLP) can increase their numbers of nodes and layers, but then a greater number of parameters must be trained and overfitting becomes a problem. CNNs can reduce noise directly in the convolutional layers, yet CNNs are used more frequently in other tasks than in nonlinear dynamic system modeling [17].

System identification can be formulated as time series prediction [18, 19]. In [20], the time series is put into a self-regulating model to improve performance. [21] presents a model based on neural networks to estimate COVID-19 cases in the following months. [22] proposes a new LSTM network architecture for modeling dynamic systems. In this same area, very few publications use a CNN for this purpose. In [23], the identification of the nonlinear system is transferred to time series using an ARMAX model. In our previous work [24], a modeling scheme for nonlinear systems is proposed using a CNN with real values. In [25], a CNN is trained to model uncertainties in the dynamic system. These uncertainties are overcome by the CNN thanks to the properties of shared weights and sparse connectivity.

Most deep learning algorithms work with real values, such as multilayer perceptrons (MLP), support vector machines (SVM), long short-term memories (LSTM) and CNNs. Complex valued neural networks have many advantages over classical NNs, for example complex valued MLPs [26], complex valued RBF networks [27] and complex valued convolutional neural networks (CVCNN) [28, 29].

In order to use complex numbers, the usual neural networks have to be modified, both in their structure and in their learning methods [30]. A general training algorithm for complex valued NNs is introduced in [28]. In [31], the complex valued NN is regarded as two real valued NNs, which expands the learning information. Theoretical analysis is more difficult for complex values [32]. For image processing, CVCNNs have been used for image denoising [33] and for image classification [34]. In [35] a CVNN is proposed as a classifier based on a range-beam-Doppler tensor. In [36] classification of human activities is proposed using a CVCNN, where human motion is captured and transformed to the frequency domain. To the best of our knowledge, there are no published results using CVCNN for dynamic system modeling [37].

From our point of view, the main advantage of the CVCNN is that it keeps the same structure as the real valued CNN, but gives better performance in the cases of missing data and large noise. In order to create a CNN-based neural network that can model nonlinear systems with missing data and large uncertainties, in this paper we make the following contributions:

  1. (1)

    A novel architecture of CVCNN for nonlinear dynamic system modeling is proposed.

  2. (2)

    A learning method for the CVCNN is given.

To show the advantages of the CVCNN, comparisons with other state-of-the-art methods are made using several benchmarks. The comparison results show that: (1) the novel model has better modeling performance than the other algorithms in the cases of large uncertainties; (2) the proposed method has a fast convergence speed and can be realized easily.

2 Nonlinear Dynamic System Modeling with Complex Valued CNN

2.1 Deep Neural Networks for Dynamic System Modeling

Consider an unknown discrete-time nonlinear system

$$\begin{aligned} {\bar{x}}(k+1)=f\left[ {\bar{x}}\left( k\right) ,u\left( k\right) \right] ,\quad y(k)=g\left[ {\bar{x}}\left( k\right) \right] \end{aligned}$$
(1)

where \(u\left( k\right) \ \)is the input vector, \({\bar{x}}\left( k\right) \) is the internal state vector, \(y\left( k\right) \) is the output vector. f and g are general nonlinear smooth functions, \(f,g\in C^{\infty }\). Denoting

$$\begin{aligned} \begin{aligned} Y(k)&=\left[ y\left( k\right) ,y\left( k+1\right) ,\cdots y\left( k+n-1\right) \right] \\ U(k)&=\left[ u\left( k\right) ,u\left( k+1\right) ,\cdots u\left( k+n-2\right) \right] \end{aligned} \end{aligned}$$

If \(\frac{\partial Y}{\partial {\bar{x}}}\) is non-singular at \({\bar{x}}=0\) and \(U=0,\) the system leads to the following nonlinear autoregressive exogenous (NARX) model

$$\begin{aligned} y(k)=\varUpsilon \left[ x\left( k\right) \right] \end{aligned}$$
(2)

where \(\varUpsilon \left( \cdot \right) \) represents an unknown nonlinear difference equation corresponding to the plant dynamics,

$$\begin{aligned} x\left( k\right) =[y\left( k-1\right) ,\cdots y\left( k-m_{y}\right) ,u\left( k\right) ,\cdots u\left( k-m_{u}\right) ]^{T} \end{aligned}$$
(3)

\(x\left( k\right) =\left[ x_{1}\cdots x_{l}\right] ^{T},\) \(l=m_{y} +m_{u}+1,\) \(k=1\cdots N,\) where N is the total number of data points for this particular system.
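
As an illustration, the regression vector (3) can be assembled directly from recorded input-output samples. The NumPy sketch below is only illustrative: the orders \(m_{y}\), \(m_{u}\), the toy plant and the data arrays are placeholders, not values used in this paper.

```python
import numpy as np

def narx_regressor(y, u, k, m_y, m_u):
    """Build x(k) = [y(k-1),...,y(k-m_y), u(k),...,u(k-m_u)]^T as in Eq. (3)."""
    past_outputs = [y[k - i] for i in range(1, m_y + 1)]
    past_inputs = [u[k - i] for i in range(0, m_u + 1)]
    return np.array(past_outputs + past_inputs)

# Synthetic data from a toy plant (placeholder, not one of the benchmarks).
N = 100
u = np.random.uniform(-1.0, 1.0, N)
y = np.zeros(N)
for k in range(N - 1):
    y[k + 1] = 0.5 * y[k] + 0.3 * u[k]

x_k = narx_regressor(y, u, k=20, m_y=3, m_u=2)  # length l = m_y + m_u + 1 = 6
```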

To model the nonlinear system (2), two types of models can be used:

  1. (1)

    The first type is

    $$\begin{aligned} \begin{aligned}&{\hat{y}}(k)=N[u\left( k\right) ,\cdots ,u\left( k-n_{u}\right) ]\\&\text {or }{\hat{y}}(k)=N[{\hat{y}}(k-1),\cdots ,{\hat{y}}(k-n_{y}),\\&u\left( k\right) ,\cdots ,u\left( k-n_{u}\right) ] \end{aligned} \end{aligned}$$
    (4)

    where \({\hat{y}}(k)\) is the model output, \(N\left[ \cdot \right] \) is the complex valued CNN, and \(n_{y}\) and \(n_{u}\) are the regression orders. This representation is called the parallel model [38]. Here the input to the CNN does not include the real outputs \(y\left( k-1\right) ,\cdots \)

  2. (2)

    The second type is

    $$\begin{aligned} \begin{aligned}&{\hat{y}}(k)=N[y\left( k-1\right) ,\cdots ,y\left( k-n_{y}\right) ,\\&u\left( k\right) ,\cdots ,u\left( k-n_{u}\right) ] \end{aligned} \end{aligned}$$
    (5)

    This is the series-parallel model [38]. The data regression is the same as in the NARX model (3), except that in general \(n_{u}\ne m_{u}\) and \(n_{y}\ne m_{y}.\)

Because the parallel model (4) does not use the output \(y\left( k-i\right) ,\) missing data and disturbances in the output \(y\left( k-i\right) \) do not affect the neural model. In this paper, we will use the parallel model (4).
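
The practical difference between the two model types lies in how the regressor is filled during simulation: the series-parallel model (5) uses the measured outputs, while the parallel model (4) feeds back its own predictions. A minimal sketch, where `model` is a placeholder for any trained one-step predictor:

```python
import numpy as np

def simulate_series_parallel(model, y, u, n_y, n_u):
    """y_hat(k) = N[y(k-1),...,y(k-n_y), u(k),...,u(k-n_u)]  -- Eq. (5)."""
    start = max(n_y, n_u)
    y_hat = np.zeros(len(y))
    for k in range(start, len(y)):
        reg = np.concatenate([y[k - n_y:k][::-1], u[k - n_u:k + 1][::-1]])
        y_hat[k] = model(reg)           # measured outputs enter the regressor
    return y_hat

def simulate_parallel(model, u, n_y, n_u):
    """y_hat(k) = N[y_hat(k-1),...,y_hat(k-n_y), u(k),...,u(k-n_u)]  -- Eq. (4)."""
    start = max(n_y, n_u)
    y_hat = np.zeros(len(u))
    for k in range(start, len(u)):
        reg = np.concatenate([y_hat[k - n_y:k][::-1], u[k - n_u:k + 1][::-1]])
        y_hat[k] = model(reg)           # only the model's own predictions are fed back
    return y_hat
```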

The biggest difficulty of the parallel model (4) is modeling complex nonlinear systems in the presence of missing data and large disturbances, because knowing only the input of the system is not enough. Without the output data, even when the data vary smoothly, the resulting models tend to diverge [39]. We will use a complex valued CNN to overcome this problem: adding the imaginary part to the network allows it to extract different features of the system, helping the network to understand it better.

2.2 Complex Valued CNN

The architecture of the convolutional neural network (CNN) for system identification is shown in Fig. 1. It is a cascade connection. Each CNN cell includes convolution and sub-sampling operations. The last layer is fully connected with synaptic weights. For the complex valued CNN (CVCNN), all parameters (weights) are complex-valued.

Fig. 1

Three-layer complex valued CNN

CVCNN is an efficient modeling option for nonlinear systems, because the convolution operation in a CNN has the same form as the input-output relationship of a dynamic system [24], and the complex-valued representation of the neural network increases the potential of the algorithm. The advantages of using CVCNN for modeling dynamic systems are:

  1. (1)

    To model a complex nonlinear system, large-scale neural networks are needed. The CNN model uses sparse connectivity and shared weights to reduce the number of model parameters. This property is one of the main advantages of the real valued CNN, and it does not depend on the type of data used, but on the structure of the network.

  2. (2)

    Each convolutional filter scans the entire data set regardless of where the information is located. When the features are acquired in the complex domain, the created feature maps are much richer in system information, even when the input data are real. Each of these feature maps captures some particular system property through the convolution operation with complex values, enlarged by the imaginary part. This can be seen as if more filters were added to the network, corresponding to their imaginary parts, causing more features to be acquired from the data and providing more information for the modeling of the system [40].

  3. (3)

    The multi-level pooling, such as the maximum operation in the sub-sampling layer, is tolerant to noise in the data. This makes the CNN model very robust. The calculations carried out in the convolutional layers of the CVCNN obtain more information from the physical system, because the filters have complex components, which allows different features to be generated and reduces the probability of overfitting, a very frequent problem in these types of methods [41].

  4. (4)

    Common activation functions suffer from the vanishing gradient problem [42], which occurs in deep neural networks. CVCNN also uses a modified rectified linear unit (ReLU) function to reduce this effect. With this activation function, the gradient passes through these layers without alteration, allowing deep models to be more reliable. It also reduces the number of parameters to be updated in each iteration, since it is a conditional function that does not depend on any parameter except its input.

2.3 Complex Valued CNN for Dynamic System Modeling

To model complex nonlinear systems, we design the following operations for the complex valued CNN.

  1. (1)

    Convolutional operation. The parameters of the complex valued CNN, such as the filters and the synaptic weights, are complex values. The convolution operation of the CVCNN is modified as

    $$\begin{aligned} \varphi _{j}^{(\ell )}=\phi _{j}^{(\ell -1)}*\varGamma _{j}^{(\ell )} \end{aligned}$$
    (6)

    where \(\varphi _{j}^{(\ell )}\) is the \(j-\)th feature map in the layer \(\ell \), produced by the convolution of the input \(\phi ^{(\ell -1)}\), \(\varGamma _{j}^{(\ell )}\) is the filter in the current layer, \(\ell =1,2,\cdots ,m\), m is the number of convolutional layers. In each layer, there are n filters with length \(f_{\ell }\). Each convolutional layer is denoted by \(C_{\ell }\).

    In the first layer, \(\phi ^{(\ell -1)}=\phi ^{(0)},\) which is the input to the CVCNN. Because the convolution is a sum of products, it can also be interpreted as a dot product between two vectors. A compact sketch of the four operations of this subsection is given after this list.

  2. (2)

    Activation function. After the feature maps are created by the convolutional layer, an activation function is applied. For the complex valued CNN, the ReLU function is modified into the \({\mathbb {C}}ReLU\) function,

    $$\begin{aligned} \begin{aligned}&{\mathbb {C}}\text {ReLU}(x)=\text {ReLU}(x_{\mathfrak {R}})+i\text {ReLU}(x_{\mathfrak {I}})\\&x=x_{\mathfrak {R}}+ix_{\mathfrak {I}}\in {\mathbb {C}} \end{aligned} \end{aligned}$$
    (7)

    The \({\mathbb {C}}ReLU\) satisfies the Cauchy-Riemann equations when both the real and the imaginary parts are strictly positive or strictly negative. Equivalently, this condition is satisfied when \(\theta _{x}\in \left[ 0,\frac{\pi }{2}\right] \) or \(\theta _{x}\in \left[ \pi ,\frac{3}{2}\pi \right] \), where \(\theta _{x}=\arg (x)\) is the phase or argument of the complex number x.

    Applying the \({\mathbb {C}}ReLU\) activation function to the feature map (6),

    $$\begin{aligned} \phi _{j}^{(\ell )}={\mathbb {C}}ReLU(\varphi _{j}^{\ell }) \end{aligned}$$
    (8)

    where \(\phi _{j}^{(\ell )}\) is the output of the current convolutional layer.

  3. (3)

    Pooling. An additional layer, called the pooling layer, is included in the CVCNN for nonlinear dynamic system modeling. Because a complex number has more properties than a real number, the pooling operations, such as maximum pooling and average pooling, have different meanings. Because of the cyclic nature of the phase, we cannot evaluate complex numbers by their phase alone. The magnitude is a more reliable property for pooling, and is defined by

    $$\begin{aligned} M(x)=|x|\;\;\;M:{\mathbb {C}}\rightarrow \mathfrak {R} \end{aligned}$$
    (9)

    The pooling uses the magnitude of the complex number as

    $$\begin{aligned} \arg \max _{x\in set}M(x) \end{aligned}$$
    (10)

    Equation (10) allows us to keep complex numbers throughout the complex valued CNN, because the output of the max-by-magnitude pooling belongs to the input set.

  4. (4)

    Full connection. The output layer of the CVCNN is a fully connected layer with synaptic weights \(W^{(\ell )}\in {\mathbb {C}}^{n}\). The output of the CVCNN is defined as:

    $$\begin{aligned} {\hat{y}}(k)=W^{(\ell )}\phi ^{(\ell -1)} \end{aligned}$$
    (11)
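
Putting these four operations together, one CVCNN cell followed by the fully connected output can be sketched in a few lines of NumPy. The filter length, the number of filters and the pooling window below are illustrative assumptions; only the operations follow (6)-(11).

```python
import numpy as np

def crelu(x):
    """CReLU of Eq. (7): ReLU applied separately to the real and imaginary parts."""
    return np.maximum(x.real, 0.0) + 1j * np.maximum(x.imag, 0.0)

def complex_conv1d(phi, gamma):
    """Sum-of-products (valid) 1-D convolution with a complex filter, cf. Eq. (6)."""
    f = len(gamma)
    return np.array([np.dot(phi[a:a + f], gamma) for a in range(len(phi) - f + 1)])

def max_by_magnitude_pool(phi, window):
    """Pooling of Eqs. (9)-(10): keep the complex element of largest magnitude per window."""
    out = []
    for a in range(0, len(phi) - window + 1, window):
        seg = phi[a:a + window]
        out.append(seg[np.argmax(np.abs(seg))])
    return np.array(out)

# One forward pass of a single cell plus the fully connected output (11).
rng = np.random.default_rng(0)
phi0 = rng.standard_normal(12)                                  # real regressor x(k)
gamma = rng.standard_normal(3) + 1j * rng.standard_normal(3)    # complex filter
phi1 = max_by_magnitude_pool(crelu(complex_conv1d(phi0, gamma)), window=2)
W = rng.standard_normal(len(phi1)) + 1j * rng.standard_normal(len(phi1))
y_hat = np.dot(W, phi1)                                         # complex model output
```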

3 Complex Valued CNN Training

The usual training algorithms for CNNs cannot be applied directly to the CVCNN, because:

  1. (1)

    The ReLU of the CNN is modified into the \({\mathbb {C}}ReLU\) (7), which uses complex values. The backpropagation operation has to be reformulated in the complex domain.

  2. (2)

    The usual convolution operation can only be applied to the real counterpart. The imaginary part has to be redefined to obtain more information from the data.

  3. (3)

    The CNN uses the gradient descent algorithm to update the parameters. For the CVCNN, the gradient descent algorithm has to be derived in the complex domain.

In this paper, we propose a novel training method of CVCNN for system identification.

3.1 Backpropagation in Complex Domain

The objective of nonlinear dynamic system modeling with the complex valued deep neural network (11) is to update the weights \(W^{(\ell )}\) such that the output \({\hat{y}}(k)\) of the CNN converges to the system output \(y\left( k\right) \) in (2),

$$\begin{aligned} \arg \min _{W^{(\ell )},V}\sum _{k=1}^{N}\left[ {\hat{y}}\left( k\right) -y\left( k\right) \right] ^{2},\quad \ell =1,2,\cdots ,m \end{aligned}$$

where N is the number of training data.

In this paper, we use stochastic gradient descent (SGD). The objective is

$$\begin{aligned} \arg \min _{W^{(\ell )},V}\left[ {\hat{y}}\left( k\right) -y\left( k\right) \right] ^{2} \end{aligned}$$

where \(\ell =1,2,\cdots ,m,\) \(k=1,2,\cdots ,N.\)

The performance index is defined as

$$\begin{aligned} J\left( k\right) =\frac{1}{2}e_{o}\left( k\right) ^{2},\quad e_{o}\left( k\right) ={\hat{y}}\left( k\right) -y\left( k\right) \end{aligned}$$
(12)

where \(e_{o}\left( k\right) \) is the modeling error.

There are m convolutional layers, and each layer has n filters, so there are \(m\times n\) CNN cells. We have to update the weights of each filter \(W^{(\ell )}\in {\mathbb {C}}^{n}\) and the weights of the fully connected layer V, see Fig. 2.

Fig. 2

The hierarchical structure of CNN for system identification

We first discuss the gradient descent algorithm in the complex domain. For real values, the weights in the output layer are updated as

$$\begin{aligned} V\left( k+1\right) =V\left( k\right) -\eta \phi ^{(\ell )}e_{o}\left( k\right) \end{aligned}$$
(13)

where \(\phi _{m}^{(\ell )}\left( k\right) =\left[ \phi _{m,1}\cdots \phi _{m,n}\right] ^{T}\) is the output of the convolution operation of the CNN model (11), \(\eta >0\) is the learning rate.

For complex values, we need to calculate the gradient of a complex function \(f:{\mathbb {C}}\rightarrow {\mathbb {C}}\). The derivative of a complex valued function can be regarded as the derivative of a multivariate function \(f:\mathfrak {R}^{2}\rightarrow \mathfrak {R}^{2}\).

If f(x) and f(x, y) are differentiable at the points \(a\in \mathfrak {R}^{1}\) and \((a,b)\in \mathfrak {R}^{2},\) then

$$\begin{aligned} \begin{aligned}&\lim _{x\rightarrow \alpha }\frac{f(x)-f(\alpha )}{x-\alpha }=\beta \\&\lim _{(x,y)\rightarrow (a,b)}\frac{\Vert f(x,y)-f(a,b)-F(x-a,y-b)\Vert }{\Vert (x-a,y-b)\Vert }=0 \end{aligned} \end{aligned}$$
(14)

where \(x,\alpha ,\beta \in \mathfrak {R}\).

If \(f(x,y)=f\left[ u(x,y),v(x,y)\right] \) and the derivative of f is denoted by Df

$$\begin{aligned} Df= \begin{bmatrix} \frac{\partial u}{\partial x} &{} \frac{\partial u}{\partial y}\\ \frac{\partial v}{\partial x} &{} \frac{\partial v}{\partial y} \end{bmatrix} \end{aligned}$$
(15)

From the Cauchy-Riemann theorem [43], a complex function \(f(x)=r(x)+is(x),\) \(r,s:\mathfrak {R}\rightarrow \mathfrak {R},\) is differentiable if and only if \(f(x_{\mathfrak {R}},x_{\mathfrak {I}})=\left( r(x_{\mathfrak {R}},x_{\mathfrak {I}}),s(x_{\mathfrak {R}},x_{\mathfrak {I}})\right) \) is differentiable as a map \(\mathfrak {R}^{2}\rightarrow \mathfrak {R}^{2}\), and its partial derivatives satisfy

$$\begin{aligned} \frac{\partial r}{\partial x_{\mathfrak {R}}}=\frac{\partial s}{\partial x_{\mathfrak {I}}} ,\quad \frac{\partial s}{\partial x_{\mathfrak {R}}}=-\frac{\partial r}{\partial x_{\mathfrak {I}}} \end{aligned}$$
(16)

At the point \(a+ib\), the Wirtinger derivative is

$$\begin{aligned} \lim _{x_{\mathfrak {R}}+ix_{\mathfrak {I}}\rightarrow a+ib}\frac{f(x_{\mathfrak {R}}+ix_{\mathfrak {I}})-f(a+ib)}{x_{\mathfrak {R}}+ix_{\mathfrak {I}}-(a+ib)}=\varsigma _{\mathfrak {R}}+i\varsigma _{Im} \end{aligned}$$
(17)

where \(x_{\mathfrak {R}}\) and \(x_{\mathfrak {I}}\) are the real and imaginary parts of the variable, and the complex number is \(\varsigma =\varsigma _{\mathfrak {R}}+i\varsigma _{Im}\). Then (17) becomes

$$\begin{aligned} \lim _{\begin{array}{c} x_{\mathfrak {R}}+ix_{\mathfrak {I}}\\ \rightarrow a+ib \end{array}}\frac{\left[ \begin{aligned}&f(x_{\mathfrak {R}}+ix_{\mathfrak {I}})-f(a+ib)\\&-(\varsigma _{\mathfrak {R}}+i\varsigma _{Im})(x_{\mathfrak {R}}+ix_{\mathfrak {I}}-(a+ib)) \end{aligned} \right] }{x_{\mathfrak {R}}+ix_{\mathfrak {I}}-(a+ib)}=0 \end{aligned}$$
(18)

So the complex derivative of function f can be regarded as a linear transformation, which is equivalent to the derivative of the function in \(\mathfrak {R}^{2}\rightarrow \mathfrak {R}^{2}\). The complex derivatives are special linear functions with orthogonal matrices and with positive determinant.

The differential of a complex valued function \(f(x):S\rightarrow {\mathbb {C}}\), \(S\subseteq {\mathbb {C}}\) can be expressed by

$$\begin{aligned} df=\frac{\partial f(x)}{\partial x}dx+i\frac{\partial f(x)}{\partial x^{*} }dx^{*} \end{aligned}$$
(19)

where \(x^{*}\) is the complex conjugate of x. Here the derivative of the corresponding real function is an orthogonal matrix with positive determinant.

In the complex domain,

$$\begin{aligned} \frac{\partial J}{\partial V}=\frac{\partial J}{\partial e}\frac{\partial e}{\partial {\hat{y}}}\frac{\partial {\hat{y}}}{\partial V}=e_{o}\left[ V_{j,\mathfrak {R}}+iV_{j,\mathfrak {I}}\right] \end{aligned}$$
(20)

The Cauchy-Riemann equations are satisfied with \(r={\hat{y}}_{\mathfrak {R}}-y\) and \(s=\hat{y}_{\mathfrak {I}}\):

$$\begin{aligned} \begin{vmatrix} \frac{\partial r}{\partial \hat{y_{\mathfrak {R}}}}=1&\frac{\partial r}{\partial \hat{y_{\mathfrak {I}}}}=0\\ \frac{\partial s}{\partial {\hat{y}}_{\mathfrak {R}}}=0&\frac{\partial s}{\partial \hat{y_{\mathfrak {I}}}}=1 \end{vmatrix} >0,\quad \frac{\partial e_{0}}{\partial {\hat{y}}}=1 \end{aligned}$$
(21)

For the fully connected layer, two gradients are required. Since the Cauchy-Riemann equations hold for \(r={\hat{y}}_{\mathfrak {R}}\) and \(s={\hat{y}}_{\mathfrak {I}}\),

$$\begin{aligned} \begin{vmatrix} \frac{\partial r}{\partial V_{\mathfrak {R}}}=\varPhi _{\mathfrak {R}}^{(m)}&\frac{\partial r}{\partial V_{\mathfrak {I}}}=-\varPhi _{\mathfrak {I}}^{(m)}\\ \frac{\partial s}{\partial V_{\mathfrak {R}}}=\varPhi _{\mathfrak {I}}^{(m)}&\frac{\partial s}{\partial V_{\mathfrak {I}}}=\varPhi _{\mathfrak {R}}^{(m)} \end{vmatrix} >0 \end{aligned}$$
(22)

the partial derivative of \({\hat{y}}\) with respect to V is

$$\begin{aligned} \frac{\partial {\hat{y}}}{\partial V}=\frac{\partial r}{\partial V_{\mathfrak {R}}} +i\frac{\partial s}{\partial V_{\mathfrak {R}}}=\varPhi _{\mathfrak {R}}^{(m)}+i\varPhi _{\mathfrak {I}}^{(m)} =\varPhi ^{(m)} \end{aligned}$$
(23)

We can calculate the gradient of the cost function with respect to V, and with respect to the rest of the parameters in the output layer, using the chain rule

$$\begin{aligned} \frac{\partial J}{\partial V}=\frac{\partial J}{\partial e}\frac{\partial e}{\partial {\hat{y}}}\frac{\partial {\hat{y}}}{\partial V}=(e_{o})(1)(\varPhi ^{(m)})=e_{o}\varPhi ^{(m)} \end{aligned}$$
(24)

The gradient of \({\hat{y}}\) with respect to \(\varPhi ^{(m)}\) is obtained with the aid of the Cauchy-Riemann equations, with \(r={\hat{y}}_{\mathfrak {R}}\) and \(s={\hat{y}}_{\mathfrak {I}}\):

$$\begin{aligned} \begin{vmatrix} \frac{\partial r}{\partial \varPhi _{\mathfrak {R}}^{(m)}}=V_{\mathfrak {R}}&\frac{\partial r}{\partial \varPhi _{\mathfrak {I}}^{(m)}}=-V_{\mathfrak {I}}\\ \frac{\partial s}{\partial \varPhi _{\mathfrak {R}}^{(m)}}=V_{\mathfrak {I}}&\frac{\partial s}{\partial \varPhi _{\mathfrak {I}}^{(m)}}=V_{\mathfrak {R}} \end{vmatrix} >0 \end{aligned}$$
(25)

So

$$\begin{aligned} \frac{\partial {\hat{y}}}{\partial \varPhi ^{(m)}}=\frac{\partial r}{\partial \varPhi _{\mathfrak {R}}^{(m)}}+i\frac{\partial s}{\partial \varPhi _{\mathfrak {R}}^{(m)}}=V_{\mathfrak {R}}+iV_{\mathfrak {I}}=V \end{aligned}$$
(26)

The updating law for the output layer (13) is

$$\begin{aligned} V\left( k+1\right) =V\left( k\right) -\eta \phi ^{(\ell )}\left[ V_{j,\mathfrak {R}}+iV_{j,\mathfrak {I}}\right] e_{o}\left( k\right) \end{aligned}$$
(27)

The updating law for the parameters in the convolution operation is

$$\begin{aligned} W_{ij}(k+1)=W_{ij}(k)-\eta \frac{\partial J}{\partial W_{ij}} \end{aligned}$$

where J is defined in (12) and \(W_{ij}\) is a parameter of the convolution operation.
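
A minimal sketch of one stochastic gradient step is the following, assuming the gradients have already been computed by the complex backpropagation derived above; the output-layer gradient is taken in the form of (24), and all values are random placeholders.

```python
import numpy as np

def sgd_step(params, grads, eta):
    """One SGD update p <- p - eta * dJ/dp; works for real or complex arrays."""
    return [p - eta * g for p, g in zip(params, grads)]

rng = np.random.default_rng(1)
V = rng.standard_normal(4) + 1j * rng.standard_normal(4)        # output-layer weights
Phi_m = rng.standard_normal(4) + 1j * rng.standard_normal(4)    # last feature map
e_o = 0.3 + 0.1j                                                # modeling error
V_new, = sgd_step([V], [e_o * Phi_m], eta=0.01)                 # dJ/dV = e_o * Phi^(m), Eq. (24)
```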

3.2 Backpropagation for Complex Valued Convolution Operation

The error is back-propagated from the layer \(\ell -1\) to the layer \(\ell \) as

$$\begin{aligned} e_{\ell }\left( k\right) =e_{\ell -1}\left( k\right) \frac{\partial \phi ^{(\ell -1)}}{\partial t}W^{(\ell -1)} \end{aligned}$$
(28)

In backward order, the next layer is the activation layer with \({\mathbb {C}} ReLU\). It does not have parameters,

$$\begin{aligned} \frac{\partial J}{\partial \varphi ^{(\ell )}}=\frac{\partial J}{\partial \phi ^{(\ell )}}\varphi ^{(\ell )} \end{aligned}$$
(29)

After the activation layer, the convolutional layer with its filters is updated.

For classical CNN,

$$\begin{aligned} \frac{\partial J}{\partial \gamma _{\jmath }^{(\ell )}}=\sum _{a=0}^{N-f_{\ell } }\varphi _{a}^{(\ell )}\phi _{\jmath +a}^{(\ell -1)} \end{aligned}$$
(30)

where N is the size of \(\phi ^{(\ell -1)}\) and \(\gamma _{\jmath }^{(\ell )}\) is one of the elements of the filter \(\varGamma ^{(\ell )}.\) The training law is

$$\begin{aligned} \begin{aligned} \gamma _{\jmath }^{(\ell )}(k+1)&=\gamma _{\jmath }^{(\ell )}(k)-\eta \frac{\partial J}{\partial \gamma _{\jmath }^{(\ell )}}\\ \frac{\partial J}{\partial \phi _{\jmath }^{(\ell -1)}}&=\sum _{a=0}^{f_{\ell } -1}\frac{\partial J}{\partial \varphi _{\jmath -a}^{(\ell )}}\frac{\partial \varphi _{\jmath -a}^{(\ell )}}{\partial \phi _{\jmath }^{(\ell -1)}}=\sum _{a=0}^{f_{\ell }-1}\frac{\partial J}{\partial \varphi _{\jmath -a}^{(\ell )}}\gamma _{a}^{(\ell )} \end{aligned} \end{aligned}$$
(31)

The training objective of [29] is to improve image classification, while our objective is nonlinear dynamic system modeling under measurement noise and missing data. Unlike the CNN and [29], we apply the convolution operation only as a sum of products of cells. We also use two real-valued equations to train each layer; we do not train each layer with complex values as in [29]. These two differences between the proposed CVCNN and [29] simplify the training and the analysis without affecting the modeling accuracy.
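
For the real-valued counterpart, the filter gradient (30) and the propagated gradient (31) are plain correlations. The sketch below is an illustrative assumption of how they can be computed for one 1-D layer; the names mirror the notation above and do not correspond to the exact implementation used in the experiments.

```python
import numpy as np

def filter_gradient(dJ_dout, phi_in, f_len):
    """dJ/dgamma_j = sum_a dJ/dout[a] * phi_in[j + a], cf. Eq. (30)."""
    return np.array([np.dot(dJ_dout, phi_in[j:j + len(dJ_dout)]) for j in range(f_len)])

def input_gradient(dJ_dout, gamma, n_in):
    """Propagate the gradient to the layer input (transposed correlation), cf. Eq. (31)."""
    dJ_din = np.zeros(n_in)
    for a, g in enumerate(dJ_dout):
        dJ_din[a:a + len(gamma)] += g * gamma
    return dJ_din
```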

3.2.1 Feedforward Operation

The filters and the weights of CVCNN are defined as

$$\begin{aligned} \varGamma ^{(\ell )}= \begin{bmatrix} \gamma _{1,\mathfrak {R}}^{(\ell )}+i\gamma _{1,\mathfrak {I}}^{(\ell )}\\ \vdots \\ \gamma _{p,\mathfrak {R}}^{(\ell )}+i\gamma _{p,\mathfrak {I}}^{(\ell )} \end{bmatrix} \in {\mathbb {C}} \end{aligned}$$
(32)

and

$$\begin{aligned} W^{(\ell )}=\left[ W_{1,\mathfrak {R}}^{(\ell )}+iW_{1,\mathfrak {I}}^{(\ell )}\cdots W_{m,\mathfrak {R}}^{(\ell )}+iW_{m,\mathfrak {I}}^{(\ell )}\right] ^{T}\in {\mathbb {C}}^{m} \end{aligned}$$
(33)

The input of the CVCNN is \(\phi ^{(0)}= \begin{bmatrix} \phi _{1}^{(0)}\\ \phi _{2}^{(0)}\\ \vdots \\ \phi _{l}^{(0)} \end{bmatrix}\). In the first layer, the convolution operation produces the feature map \(\varphi ^{(1)}\),

$$\begin{aligned} \begin{aligned} \varphi _{1}^{(1)}&=\phi ^{(0)}*\varGamma _{1}^{(1)}= \begin{bmatrix} \varPsi _{1,1}^{(1)}\\ \vdots \\ \varPsi _{1,q}^{(1)} \end{bmatrix}\\&= \begin{bmatrix} \phi _{1}^{(0)}\\ \vdots \\ \phi _{q}^{(0)} \end{bmatrix} *\begin{bmatrix} \gamma _{1,\mathfrak {R}}^{(1)}+i\gamma _{1,\mathfrak {I}}^{(1)}\\ \vdots \\ \gamma _{p,\mathfrak {R}}^{(1)}+i\gamma _{p,\mathfrak {I}}^{(1)} \end{bmatrix} \end{aligned} \end{aligned}$$
(34)

where each element \(\varPsi ^{(1)}\in {\mathbb {C}}\) is obtained by the convolution

$$\begin{aligned} \varPsi _{1,j}^{(1)}=\sum _{a=0}^{q}\phi _{a}^{(0)}\gamma _{j-a} \end{aligned}$$
(35)

Then the \({\mathbb {C}}ReLU\) is applied to the feature maps,

$$\begin{aligned} \phi _{1}^{(1)}={\mathbb {C}}\text {ReLU}(\varphi _{1}^{(1)}) \end{aligned}$$
(36)

The elements of this vector are

$$\begin{aligned} \begin{aligned} \phi _{1}^{(1)}&= \begin{bmatrix} \varPhi _{1,1}^{(1)}\\ \vdots \\ \varPhi _{1,q}^{(1)} \end{bmatrix} = \begin{bmatrix} {\mathbb {C}}\text {ReLU}(\varPsi _{1,1}^{(1)})\\ \vdots \\ {\mathbb {C}}\text {ReLU}(\varPsi _{1,q}^{(1)}) \end{bmatrix}\\&= \begin{bmatrix} \text {ReLU}(\varPsi _{1,1,\mathfrak {R}}^{(1)})+i\text {ReLU}(\varPsi _{1,1,\mathfrak {I}}^{(1)})\\ \vdots \\ \text {ReLU}(\varPsi _{1,q,\mathfrak {R}}^{(1)})+i\text {ReLU}(\varPsi _{1,q,\mathfrak {I}}^{(1)}) \end{bmatrix} \end{aligned} \end{aligned}$$
(37)

The last layer of the convolution operation is

$$\begin{aligned} {\hat{y}}=W^{(m)}\phi ^{(m-1)}=\left[ W_{1}^{(m)}\cdots W_{p}^{(m)}\right] \begin{bmatrix} \varPhi _{1}^{(m-1)}\\ \cdots \\ \varPhi _{p}^{(m-1)} \end{bmatrix} \end{aligned}$$
(38)

The output of CVCNN is

$$\begin{aligned} {\hat{y}}={\hat{y}}_{\mathfrak {R}}+i{\hat{y}}_{\mathfrak {I}} \end{aligned}$$

where \({\hat{y}}_{\mathfrak {R}}\) and \({\hat{y}}_{\mathfrak {I}}\) are defined as:

$$\begin{aligned} \begin{matrix} {\hat{y}}_{\mathfrak {R}}=W_{1,\mathfrak {R}}^{(m)}\varPhi _{1,\mathfrak {R}}^{(m-1)}-W_{1,\mathfrak {I}}^{(m)}\varPhi _{1,\mathfrak {I}}^{(m-1)}+\cdots \\ +W_{p,\mathfrak {R}}^{(m)}\varPhi _{p,\mathfrak {R}}^{(m-1)}-W_{p,\mathfrak {I}}^{(m)}\varPhi _{p,\mathfrak {I}}^{(m-1)}\\ \vdots \\ {\hat{y}}_{\mathfrak {I}}=W_{1,\mathfrak {R}}^{(m)}\varPhi _{1,\mathfrak {I}}^{(m-1)}+W_{1,\mathfrak {I}}^{(m)}\varPhi _{1,\mathfrak {R}}^{(m-1)}+\cdots \\ +W_{p,\mathfrak {R}}^{(m)}\varPhi _{p,\mathfrak {I}}^{(m-1)}+W_{p,\mathfrak {I}}^{(m)}\varPhi _{p,\mathfrak {R}}^{(m-1)} \end{matrix} \end{aligned}$$
(39)

where \(\varPhi _{1,\mathfrak {R}}^{(m-1)}\) and \(\varPhi _{1,\mathfrak {I}}^{(m-1)}\) are the real and imaginary parts of the element \(\varPhi _{1}^{(m-1)}\), respectively.
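
The expansion (39) is simply the real and imaginary parts of the complex inner product (38); a quick numerical check with random placeholder values:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal(5) + 1j * rng.standard_normal(5)      # W^(m)
Phi = rng.standard_normal(5) + 1j * rng.standard_normal(5)    # Phi^(m-1)

y_hat = np.dot(W, Phi)                                        # Eq. (38)
y_R = np.sum(W.real * Phi.real - W.imag * Phi.imag)           # Eq. (39), real part
y_I = np.sum(W.real * Phi.imag + W.imag * Phi.real)           # Eq. (39), imaginary part
assert np.allclose(y_hat, y_R + 1j * y_I)
```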

3.2.2 Backpropagation of CVCNN

The backpropagation through the convolution operation uses the chain rule and partial derivatives to calculate the gradient of each complex element. In the activation layer, the gradient is

$$\begin{aligned}&r=\text {ReLU}(\varPsi _{\mathfrak {R}})^{(m)},\quad s=\text {ReLU}(\varPsi _{\mathfrak {I}}^{(m)})\nonumber \\&\begin{vmatrix} \frac{\partial r}{\partial \varPsi _{\mathfrak {R}}^{(m)}}=\varPsi _{\mathfrak {R}}^{(m)}&\frac{\partial r}{\partial \varPsi _{\mathfrak {I}}^{(m)}}=0\\ \frac{\partial s}{\partial \varPsi _{\mathfrak {R}}^{(m)}}=0&\frac{\partial s}{\partial \varPsi _{\mathfrak {I}}^{(m)}}=\varPsi _{\mathfrak {I}}^{(m)} \end{vmatrix} >0 \end{aligned}$$
(40)

This leads to

$$\begin{aligned} \frac{\partial \varPhi _{{}}^{(m)}}{\partial \varPsi _{{}}^{(m)}}=\varPsi _{\mathfrak {R}}^{(m)} \end{aligned}$$
(41)

The gradient through an activation layer is determined by adding \(\varPsi _{\imath j}^{(m)}\) into the corresponding vector \(\varphi _{j}^{(m)},\)

$$\begin{aligned} \frac{\partial J}{\partial \varphi _{\imath }}=\frac{\partial J}{\partial \phi _{\imath }^{(m)}}\varphi _{\imath }^{(m)} \end{aligned}$$
(42)

The convolutional layer is similar to the fully connected layer; the convolution operation can be regarded as a sum of products. Similarly to (31),

$$\begin{aligned} \frac{\partial J}{\partial \gamma _{\jmath }^{(\ell )}}=\sum _{a=0}^{N-f_{\ell } }\varphi _{a}^{(\ell )}\phi _{\jmath +a}^{(\ell -1)} \end{aligned}$$
(43)

and the gradient through a convolutional layer is

$$\begin{aligned} \frac{\partial J}{\partial \phi _{\jmath }^{(\ell -1)}}=\sum _{a=0}^{f_{\ell } -1}\frac{\partial J}{\partial \varphi _{\jmath -a}^{(\ell )}}\frac{\partial \varphi _{\jmath -a}^{(\ell )}}{\partial \phi _{\jmath }^{(\ell -1)}}=\sum _{a=0}^{f_{\ell }-1}\frac{\partial J}{\partial \varphi _{\jmath -a}^{(\ell )}}\gamma _{a}^{(\ell )} \end{aligned}$$
(44)

4 Simulations

In this section, we use three benchmarks to show the effectiveness of the complex valued CNN (CVCNN) compared with the classical CNN, the classical neural network (MLP), and some other recent methods. The architecture of the CNN is the same as that of the CVCNN, but the filters and weights are different: real values for the CNN and complex values for the CVCNN.

The CNN has two convolutional layers, each followed by a ReLU activation function and a max-pooling layer. For each benchmark, the number of filters in the convolutional layers is different; we use random search to find the best possible combination. The MLP has one hidden layer with the activation function \(tanh\left( \cdot \right) \); the number of hidden nodes differs for each benchmark. The initial filters of the CNN and CVCNN are chosen randomly in the range \([-1,1],\) and the weights of the MLP are also chosen in \([-1,1].\)

In order to show the advantages of CVCNN for dynamic system modeling, we use the following two neural network models:

  1. (1)

    Series-parallel model as (5)

    $$\begin{aligned} {\hat{y}}(k)=NN[y\left( k-1\right) ,\cdots u\left( k\right) ,\cdots ] \end{aligned}$$
    (45)

    where both the input \(u\left( k\right) \) and the output \(y\left( k-1\right) \) of the identified system are fed to the neural network \(NN\left[ \cdot \right] .\)

  2. (2)

    Parallel model as (4)

    $$\begin{aligned} {\hat{y}}(k)=NN[{\hat{y}}(k-1),\cdots u\left( k\right) ,\cdots ] \end{aligned}$$
    (46)

    where only the input \(u\left( k\right) \) of the identified system is fed to the neural network \(NN\left[ \cdot \right] .\) We use the output of the neural network, \({\hat{y}}(k-1)\), as the other part of the network input. In the case of noise, (45) becomes

    $$\begin{aligned} {\hat{y}}(k)=NN[y\left( k-1\right) +\rho ,\cdots u\left( k\right) ,\cdots ] \end{aligned}$$
    (47)

    where \(\rho \) is the random noise. (46) becomes

    $$\begin{aligned} {\hat{y}}(k)=NN[{\hat{y}}(k-1)+\rho ,\cdots u\left( k\right) ,\cdots ] \end{aligned}$$

In the case of missing data, (45) becomes

$$\begin{aligned} {\hat{y}}(k)=NN[{\bar{y}}\left( k-1\right) ,\cdots {\bar{u}}\left( k\right) ,\cdots ] \end{aligned}$$
(48)

where \({\bar{y}}\left( k-1\right) \) and \({\bar{u}}\left( k\right) \) are from the data sets \(\left\{ y\left( 1\right) ,\cdots y\left( N\right) \right\} \) and \(\left\{ u\left( 1\right) ,\cdots u\left( N\right) \right\} \). (46) becomes

$$\begin{aligned} {\hat{y}}(k)=NN[{\tilde{y}}\left( k-1\right) ,\cdots {\tilde{u}}\left( k\right) ,\cdots ] \end{aligned}$$

where \({\tilde{y}}\left( k-1\right) \) and \({\tilde{u}}\left( k\right) \) are from the data sets \(\left\{ {\hat{y}}\left( 1\right) ,\cdots {\hat{y}}\left( N\right) \right\} \) and \(\left\{ u\left( 1\right) ,\cdots u\left( N\right) \right\} \).

In this paper, we select \(30\%\) of the data as missing data.
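
How the missing samples are generated is not essential to the method. One simple way, shown here only for illustration (an assumption, not necessarily the exact procedure used in the experiments), is to remove a random \(30\%\) of the recorded output samples:

```python
import numpy as np

def drop_samples(y, fraction=0.3, seed=0):
    """Return a copy of y with `fraction` of its samples marked as missing (NaN)."""
    rng = np.random.default_rng(seed)
    y_missing = y.astype(float).copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_missing[idx] = np.nan           # these entries are never shown to the model
    return y_missing
```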

4.1 Searching Hyper-parameters

There are no optimal methods for defining the neural structure; most approaches use trial and error to find a good structure. We use a random search method, similar to [44], to decide the hyper-parameters of the neural model.

We first randomly select combinations of the hyper-parameters from the total set. Then we search for the best score, i.e., the one minimizing a hyper-parameter response function \(\varUpsilon \). For the CNN, the algorithm is as follows:

  1. 1.

    Choose the number of convolutional layers in the CNN (1, 2, or 3 layers).

  2. 2.

    Define the number of filters in each layer (1–100 filters) and their length (3–6).

  3. 3.

    Initialize the filters and synaptic weights of the output layer (range: −1 to 1).

  4. 4.

    Carry out the simulation and obtain the score of the function \(\varUpsilon \).

  5. 5.

    Repeat the previous steps 15 times with randomly chosen hyper-parameter settings.

  6. 6.

    Choose the structure with the best score.

In case of MLP, the algorithm is as follows:

  1. 1.

    Define a two-layer MLP.

  2. 2.

    Define the number of neurons or nodes in the hidden layer (1–100 nodes).

  3. 3.

    Initialize the synaptic weights (range: −1 to 1).

  4. 4.

    Carry out the simulation and obtain the score of the function \(\varUpsilon \).

  5. 5.

    Repeat the previous steps 10 times with randomly chosen hyper-parameter settings.

  6. 6.

    Choose the structure with the best score.

In both cases, the data set is used without preprocessing. With these algorithms, the hyper-parameters are decided and shown in the following sections; they correspond to the best score of the hyper-parameter response function \(\varUpsilon \) found by the random search algorithm.
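
The random search above can be sketched as follows, where `evaluate` is a placeholder for training a candidate model and returning the score of the response function \(\varUpsilon \) (for example the validation RMSE); the sampling ranges follow the steps listed above.

```python
import random

def random_search_cnn(evaluate, trials=15, seed=0):
    """Randomly sample CNN hyper-parameters and keep the best-scoring structure."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        config = {
            "num_conv_layers": rng.randint(1, 3),      # step 1
            "filters_per_layer": rng.randint(1, 100),  # step 2
            "filter_length": rng.randint(3, 6),        # step 2
        }
        score = evaluate(config)                       # steps 3-4: initialize, train, score
        if best is None or score < best[0]:
            best = (score, config)
    return best                                        # step 6: best (score, structure)
```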

4.2 Gas Furnace Modeling

The gas furnace data set is a benchmark example for nonlinear system modeling [45]. The input signal \(u\left( k\right) \) is the flow rate of the methane gas, and the output signal \(y\left( k\right) \) is the concentration of \(CO_{2}.\) There are 296 samples, taken at a 9-second sampling interval. We use 200 samples for training and the other 96 samples for testing. We compare our CVCNN with the CNN and the MLP.

For the series-parallel models (45), (47) and (48), the input vector to the neural models are

$$\begin{aligned} \left[ y\left( k-1\right) ,\cdots y\left( k-5\right) ,u\left( k\right) ,\cdots u\left( k-4\right) \right] \end{aligned}$$

For the parallel model (46), the input vector to the neural models is \(\left[ {\hat{y}}\left( k-1\right) ,u\left( k\right) ,\cdots \right. \left. u\left( k-10\right) \right] \). Each convolutional layer of the CVCNN and CNN has 3 filters, and the size of each filter is 3. The MLP has 50 nodes in the hidden layer. The amplitude of the noise is about \(10\%\) of the output amplitude. The modeling errors in the testing phase for the gas furnace with the parallel model (46) are shown in Fig. 3.

The performance of the MLP becomes worse if we reduce the percentage of training data from \(60\%\) to \(40\%\), i.e., from 200 to 120 samples. The root mean square error (RMSE) goes from 0.1257 to 0.2435. The same occurs for the CNN, increasing from 0.1466 to 0.1935, and for the CVCNN, whose RMSE also increases, from 0.0829 to 0.1458. Reducing the amount of training data therefore increases the RMSE. Both CNNs have fewer problems with reduced training data than the MLP. The quantity of 200 training samples is chosen because it is commonly used for this benchmark.

Fig. 3

Modeling errors of parallel model for gas furnace

4.3 First Order Nonlinear System

The following discrete-time first-order nonlinear system is another popular benchmark [38],

$$\begin{aligned} y(k+1)=\frac{y(k)}{1+y^{2}(k)}+u^{3}(k) \end{aligned}$$
(49)

The control input u(k) is periodic, \(u(k)=B\sin \left( \frac{\pi k}{50}\right) +C\sin \left( \frac{\pi k}{20}\right) .\) In the training phase, \(B=C=1.\) In the testing phase, \(B=0.9,\) \(C=1.1.\) We use 5000 data points generated by (49) to train the neural models, and 100 data points for testing.
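
For reproducibility, the training and testing data for (49) can be generated directly from the recursion with the periodic input given above; the zero initial condition below is an assumption.

```python
import numpy as np

def generate_first_order_data(N, B=1.0, C=1.0, y0=0.0):
    """Simulate y(k+1) = y(k)/(1 + y(k)^2) + u(k)^3 with the periodic input of the text."""
    k = np.arange(N)
    u = B * np.sin(np.pi * k / 50.0) + C * np.sin(np.pi * k / 20.0)
    y = np.zeros(N)
    y[0] = y0
    for i in range(N - 1):
        y[i + 1] = y[i] / (1.0 + y[i] ** 2) + u[i] ** 3
    return u, y

u_train, y_train = generate_first_order_data(5000)              # training phase
u_test, y_test = generate_first_order_data(100, B=0.9, C=1.1)   # testing phase
```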

For the series-parallel models (45), (47) and (48), the input vector to the neural models is

$$\begin{aligned} \left[ y\left( k-1\right) ,\cdots y\left( k-10\right) ,u\left( k\right) ,\cdots u\left( k-13\right) \right] \end{aligned}$$

For the parallel model (46), the input vector to the neural models is \(\left[ {\hat{y}}\left( k-1\right) ,u\left( k\right) ,\cdots \right. \left. u\left( k-12\right) \right] \). The CVCNN and CNN have 8 filters in each convolutional layer. The MLP has 35 nodes in the hidden layer. The amplitude of the noise is around \(10\%\) of the output amplitude.

The testing results of the identification of the first-order nonlinear system with the parallel model (46) are shown in Fig. 4.

Fig. 4

The modeling errors of parallel model for first-order system

4.4 Wiener-Hammerstein System

The Wiener-Hammerstein system is the series connection of a linear system, a static nonlinearity and another linear system. It is an electrical circuit consisting of three cascaded blocks [46]. This benchmark has 14,000 samples. We use 1,000 for testing; the rest are used for training. For the series-parallel models (45), (47) and (48), the input vector to the neural models is \(\left[ y\left( k-1\right) ,\cdots y\left( k-4\right) ,u\left( k\right) ,\cdots u\left( k-5\right) \right] \), with a noise amplitude around \(10\%\) of the output amplitude. For the parallel model (46), the input vector to the neural models is \(\left[ {\hat{y}}\left( k-1\right) ,\cdots \hat{y}\left( k-3\right) ,u\left( k\right) ,\cdots u\left( k-80\right) \right] \). For the CVCNN and CNN, each convolutional layer has 15 filters of size 6. The MLP is the same as the model of the gas furnace, but with 80 nodes in the hidden layer. The modeling errors of the series-parallel model with noisy data (47) are shown in Fig. 5. The modeling errors of the parallel model (46) are shown in Fig. 6. We can see that the parallel model with the CVCNN is a good alternative for system modeling, due to its better performance compared with the other methods.

Fig. 5

Modeling errors of series-parallel model with noisy data for the Wiener-Hammerstein system

Fig. 6

Modeling errors of parallel model for the Wiener-Hammerstein system

4.5 Discussion

The main metric to evaluate performance is the mean square error (MSE) defined by

$$\begin{aligned} \frac{1}{N}\varSigma _{k=1}^{N}\left[ y(k)-{\hat{y}}(k)\right] ^{2} \end{aligned}$$

We also use the following other metrics:

$$\begin{aligned} \begin{aligned}&\text {R}^{2}:1-\frac{\frac{1}{N}\varSigma _{k=1}^{N}\left[ y(k)-\hat{y}(k)\right] ^{2}}{\frac{1}{N}\varSigma _{k=1}^{N}\left[ y(k)-\bar{y}(k)\right] ^{2}}\\&\text {Mean absolute error(MAE): }\frac{1}{N}\varSigma _{k=1}^{N}\left| y(k)-{\hat{y}}(k)\right| \\&\text {Root mean squared error (RMSE): }\sqrt{\frac{1}{N}\varSigma _{k=1} ^{N}\left[ y(k)-{\hat{y}}(k)\right] ^{2}} \end{aligned} \end{aligned}$$
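
These metrics can be computed directly from the measured and modeled outputs; a minimal sketch:

```python
import numpy as np

def metrics(y, y_hat):
    """MSE, R^2, MAE and RMSE between the measured output y and the model output y_hat."""
    err = y - y_hat
    mse = np.mean(err ** 2)
    r2 = 1.0 - mse / np.mean((y - np.mean(y)) ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    return {"MSE": mse, "R2": r2, "MAE": mae, "RMSE": rmse}
```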

For the above three benchmarks, we use these metrics to compare our CVCNN with the CNN and MLP. The MLP has one hidden layer; the CNN and CVCNN have the same structures as above.

Tables 1, 2, 3, 4 show the comparison results of the three models. We can see that in most cases the proposed CVCNN performs better than the others.

Table 1 Performance metrics for gas furnace benchmark using series parallel model
Table 2 Performance metrics for gas furnace benchmark using parallel model
Table 3 Performance metrics for gas furnace benchmark using series parallel model with noise data
Table 4 Performance metrics for gas furnace benchmark using series parallel model with missing data

Table 5 shows the modeling errors of different recent methods. For a fair comparison, we use the same models as the ARIMA model and PEC-WNN in [47]. We cast the problems of prediction, modeling and identification in the same sense: we use the trained model to estimate the next value in the time series, and the objectives are the same. We can see that our CVCNN has better performance than the other methods in the missing-data case. PEC-WNN is better than ours, but it requires the complete data set; none of the other methods considers missing data.

Table 5 Performances of the other recent methods for the gas furnace modeling

For this benchmark we can conclude that:

  • In the cases of large noise and missing data, the CVCNN gives better modeling accuracy than the CNN and MLP.

  • If the parallel model is used, both CVCNN and CNN work well.

  • If the series-parallel model can be used, both CNN and MLP work well.

Tables 6, 7, 8, 9 show the comparison results for the Wiener-Hammerstein system. Similarly to the gas furnace modeling, our method gives better results for the Wiener-Hammerstein system. It should be noted that our method performs better in the cases of missing data and large disturbances. The CVCNN, CNN and MLP give good modeling accuracy with the series-parallel model. For the cases of noise and missing data, the CVCNN and CNN work well, but the MLP cannot model correctly.

Table 6 Performance metrics for Wiener-Hammerstein benchmark using series parallel model
Table 7 Performance metrics for Wiener-Hammerstein benchmark using parallel model
Table 8 Performance metrics for Wiener-Hammerstein benchmark using series parallel model with noisy data
Table 9 Performance metrics for Wiener-Hammerstein benchmark using series parallel model with missing data

Table 10 presents the results of four other methods for dynamic system modeling: LSTM and SVM are popular statistical learning methods, BLA uses Spearman correlation for model optimization, and PNLSS uses a polynomial nonlinear state-space model. All of these methods use the trained model to estimate the next value in the time series, and the modeling objectives are the same. We can see that our CVCNN has very similar results to the others for this benchmark, but for the cases of large noise and missing data, they did not report results.

Table 10 Performance of Wiener-Hammerstein system modeling using the other methods

Tables 11, 12, 13, 14 give the modeling results of the first-order nonlinear system. Clearly, the CVCNN performs better than the other methods. For the first-order nonlinear system, the CVCNN, CNN and MLP give good modeling accuracy with the series-parallel model. For the cases of noise and missing data, the CVCNN and CNN work well, but the MLP cannot.

Table 11 Performance metrics for nonlinear system benchmark using series parallel model
Table 12 Performance metrics for nonlinear system benchmark using parallel model
Table 13 Performance metrics for nonlinear system benchmark using series parallel model with noise data
Table 14 Performance metrics for nonlinear system benchmark using series parallel model with missing data

5 Conclusion

Measurement noise and missing data are important disturbances in system identification, which can directly affect modeling accuracy. The CVCNN is more powerful and gives better modeling accuracy than classical modeling methods in these cases, although its mathematical analysis is more difficult. Therefore the CVCNN is an efficient and robust model compared with classical ones for nonlinear dynamic system modeling with large uncertainties. Our future work will be on CVCNN-based robust control.