Abstract
The convolutional neural network (CNN) is recognized as a state-of-the-art deep learning algorithm with strong image classification and recognition ability. However, the precision, accuracy and efficiency of CNN still need to be improved to satisfy high-performance requirements. The main work of this paper is as follows. Firstly, the wavelet convolutional neural network (wCNN) is proposed, in which a wavelet transform function is added to the convolutional layers of CNN. Secondly, the wavelet convolutional wavelet neural network (wCwNN) is proposed, in which the fully connected neural network (FCNN) of CNN and wCNN is replaced by a wavelet neural network (wNN). Thirdly, image classification experiments using the CNN, wCNN and wCwNN algorithms are carried out on the MNIST dataset, together with a comparative analysis. The effects of the improved methods are as follows: (1) Both precision and accuracy are improved. (2) The mean square error and the error rate are reduced. (3) The complexity of the improved algorithms is increased.
1 Introduction
The convolutional neural network (CNN) is a typical deep learning method based on feature extraction by convolution [9]. It is widely applied to prediction, classification [14] and other fields. CNN can solve high-dimensional problems that are difficult for traditional machine learning methods [19]. Its ability to minimize the system error between the label and the inference [22] is especially powerful in image processing. The neuron weights [12] of CNN are modified by forward propagation and error back propagation [15]. In recent years, CNN has become more powerful because distributed computing power has greatly improved. Apart from image recognition [3], CNN is also applied in other fields [11] such as text classification [26], control systems [1] and target tracking [21].
The development history of CNN is as follows
The earliest study of CNN can be traced back to Fukushima, who mimicked the visual cortex of an organism and proposed the Neocognitron model [7]. The Time-Delay Neural Network (TDNN) was proposed by Alexander Waibel et al. in 1987 [27]. TDNN showed that more hidden layers give greater feature extraction capability, which became the foundation for further optimization of CNN. After a series of improvements, Kaiming He et al. released ResNet in 2015 [8]; the network skips some neuron nodes to achieve higher performance. In 2017, Gao Huang et al. proposed DenseNet.
Problems of CNN can be summarized as follows
(1) The precision, accuracy and efficiency of CNN are expected to be improved. (2) High-dimensional information contains more details, which are difficult to learn, for example in datasets such as MNIST and CIFAR. Even the human brain tends to ignore high-dimensional information. (3) CNN is more complex than a classical neural network, but a trained CNN model cannot be well explained. It has been reported that a randomly generated CNN can sometimes solve difficult problems better than a carefully designed network. A more intelligent module that can identify more detailed information is expected.
Wavelet transform (WT) is often used in deep learning [5, 16, 24]. Many features can be obtained by the discrete wavelet transform, which has been improved by researchers. Application fields that combine WT and deep learning include image classification [10, 23], computer vision [4, 17], texture classification [6], etc.
The applications based on wavelet neural network (WNN) in deep learning are as follows
In 2019, Pengju Liu et al. proposed the Multi-level Wavelet Convolutional Neural Network (MWCNN) [16], which is shown to enlarge the receptive field by reducing the number of feature maps. The Multi-Path Learnable Wavelet Neural Network for image classification was introduced by De Silva et al. [5]. This model uses a multi-path layout with several levels of wavelet decomposition. In the domain of prediction, a convolutional LSTM network using wavelet decomposition was proposed in 2018 [28]. It uses wavelet decomposition for feature extraction instead of manual feature extraction, an approach also demonstrated by Kiskin et al. in 2017 [13].
The advantages of wavelet analysis are as follows
Wavelet analysis has been widely used in signal processing and analysis. It is often called a mathematical microscope [2, 18] and is considered a powerful tool for zooming in on the details of sound, images, etc. Although the wavelet transform has some complexity [32], its powerful detail extraction ability is helpful and important for solving the above problems of CNN [20].
The motivation of this research is to solve the problems of CNN by exploiting the advantages of WT. The research focuses on improving the neurons of CNN. Rather than relying on a network with deeper layers, it is believed that improving each neuron of CNN can improve the feature identification and learning ability of the whole CNN [30]. Wavelet analysis is adopted [29] to improve the CNN in this study.
The contributions of this study are as follows
(1) The wavelet-based Convolutional Neural Network (wCNN) is proposed, in which the wavelet transformation is adopted as the activation function in the Convolutional Pooling Neural Network (CPNN) of CNN. (2) Based on wCNN, the wavelet-based Convolutional wavelet Neural Network (wCwNN) is proposed, in which the Fully Connected Neural Network (FCNN) of wCNN is replaced by a wavelet Neural Network (wNN). (3) Comparative experiments between CNN, wCNN and wCwNN are implemented on the MNIST dataset.
The following sections are organized as follows
The traditional CNN model is described in Section 2. The wavelet transform is introduced in Section 3. The improved wCNN is proposed in Section 4. The improved wCwNN is proposed in Section 5. The experiments that verify and compare the performance of CNN, wCNN and wCwNN with the MNIST dataset are designed in Section 6, and their results are reported in Section 7. Discussion, conclusions and further research are given in Sections 8 and 9.
2 Model of convolutional neural network (CNN)
2.1 Structure of CNN
The structure of the classical CNN is shown in Fig. 1. There are two parts in CNN: the first part is CPNN, and the second part is FCNN. In CPNN, the first layer is an input layer, and the following layers are several pairs of convolutional and pooling layers. In FCNN, the first layer is an input layer and the second layer is an output layer; both layers are fully connected.
The relations and features of CPNN and FCNN are as follows. (1) The input layer of CPNN is the input layer of CNN; (2) The last layer of CPNN is the input layer of FCNN; (3) The output layer of FCNN is the output layer of CNN; (4) The activation function of the convolutional layers in CPNN and of the output layer in FCNN is the sigmoid function; (5) There is no activation function in the input layer and pooling layers of CPNN or in the input layer of FCNN.
2.2 Algorithm of CNN
The algorithm of CNN can be described as follows: (1) Initializing the weights between layers and the biases of neurons. (2) Forward propagating. (3) Calculating the mean square error (MSE) of all samples according to the loss function. (4) Calculating the back-propagated errors for each layer, which are derived by the chain rule. (5) Applying gradient descent to adjust the weights and biases according to the back-propagated errors. (6) Repeating steps (2) to (5) until the MSE is small enough. (7) Evaluating the accuracy, precision and efficiency.
2.2.1 Forward propagation of CNN
Forward propagation of CNN is the calculation process from the input layer to the output layer, which can be described as follows: (1) The input layer of CNN is filled by a two-dimensional matrix of pixels of an image. (2) Forward propagation is calculated in convolutional and pooling layers (CPNN). (3) Forward propagation is calculated in fully connected layer (FCNN).
Definition 1: \( {net}^l \) and \( {O}^l \) are the input and the output of the neurons in layer l. The output of each neuron is calculated from its input and its activation function. l is the layer number, e.g. l = 1 stands for the first layer, and l = −1 stands for the last layer. i and j are the row number and column number respectively.
According to the above definition, \( {net}^{-1} \), \( {net}^{-2} \) and \( {net}^{-3} \) stand for the input of the last FCNN layer, the input of the first FCNN layer and the input of the layer before FCNN (i.e. the last pooling layer) respectively. The data structures of \( {net}^l \) and \( {O}^l \) in each layer of CPNN are two-dimensional matrices, while \( {net}^l \) and \( {O}^l \) in each layer of FCNN are one-dimensional vectors.
Definition 2: \( {w}_{ij}^l \) and \( {b}_j^l \) are the weights and bias of layer l. \( {w}_{ij}^{-1} \) and \( {b}_j^{-1} \) are the weights and bias of the last layer of FCNN. If layer l is a convolutional layer or a pooling layer, the size of the convolutional kernel or the pooling window is expressed as \( {size}^l\times {size}^l \). If layer l is a fully connected layer, the number of neurons is expressed as \( {size}^l \).
Definition 3: int(x) is the function for getting the integer part of x, e.g., int(5.1) = int(5.7) = 5.
Forward propagation of convolutional layer
The input of the convolutional layer (\( {net}^l \)) is calculated according to Eq. (1). \( {net}_{mn}^l \) stands for each input value of the neurons in layer l. \( convolution\left({O}^{l-1},{w}^l,m,n\right) \) is the function for the convolution calculation. \( {O}^{l-1} \) is the output of the previous layer. \( {w}^l \) is the matrix of weights between the input of layer l (\( {net}^l \)) and the output of the previous layer (\( {O}^{l-1} \)). \( {b}^l \) is the bias of layer l.
An example of the convolution operation is provided. If \( x=\begin{array}{ccc}{x}_{11}& {x}_{12}& {x}_{13}\\ {x}_{21}& {x}_{22}& {x}_{23}\\ {x}_{31}& {x}_{32}& {x}_{33}\end{array} \) and \( y=\begin{array}{cc}{y}_{11}& {y}_{12}\\ {y}_{21}& {y}_{22}\end{array} \), the formula of convolution(x, y) can be expressed as Eq. (2):
The output of the convolutional layer l (\( {O}_{mn}^l \)) can be calculated as Eq. (3), where sigmoid() is the activation function.
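The forward pass of a convolutional layer can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a single feature map, stride 1, no padding and no kernel flip, and the helper names `conv_valid` and `sigmoid` are hypothetical. The exact indexing convention of Eq. (1) to Eq. (3) is not reproduced here.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation used by the convolutional layer of the classical CNN."""
    return 1.0 / (1.0 + np.exp(-x))

def conv_valid(o_prev, w, b=0.0):
    """Slide kernel w over o_prev (stride 1, no padding) and add bias b."""
    k = w.shape[0]
    rows = o_prev.shape[0] - k + 1
    cols = o_prev.shape[1] - k + 1
    net = np.empty((rows, cols))
    for m in range(rows):
        for n in range(cols):
            # Element-wise product of the kernel with the window at (m, n)
            net[m, n] = np.sum(o_prev[m:m + k, n:n + k] * w) + b
    return net

x = np.arange(1.0, 10.0).reshape(3, 3)  # 3 x 3 input, values 1..9
y = np.ones((2, 2))                     # 2 x 2 kernel
net = conv_valid(x, y)                  # 2 x 2 layer input
out = sigmoid(net)                      # 2 x 2 layer output
```

A 3 × 3 input and a 2 × 2 kernel give a 2 × 2 result, matching the size relation used later in the experiment design (28 × 28 input, 5 × 5 kernel, 24 × 24 output).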
Forward propagation of pooling layer
Definition 4: The function pool(x) represents the average pooling of matrix x. The formula of pool(x) can be expressed as Eq. (4). \( {size}^l \) stands for the size of the pooling window.
An example of average pooling is provided: If \( x=\begin{array}{cccc}{x}_{11}& {x}_{12}& {x}_{13}& {x}_{14}\\ {x}_{21}& {x}_{22}& {x}_{23}& {x}_{24}\\ {x}_{31}& {x}_{32}& {x}_{33}& {x}_{34}\\ {x}_{41}& {x}_{42}& {x}_{43}& {x}_{44}\end{array} \), the pooling result is calculated as Eq. (5).
According to Eq. (4), the output of the pooling layer \( {O}^l \) is computed from the output of the previous layer (\( {O}^{l-1} \)). In other words, the input of pooling layer l (\( {net}^l \)) is the same as the output of the previous layer (\( {O}^{l-1} \)).
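Average pooling over non-overlapping windows can be sketched in NumPy as below. The function name `avg_pool` is hypothetical, and the integer division plays the role of int() from Definition 3; rows and columns that do not fill a complete window are dropped, which is an assumption of this sketch.

```python
import numpy as np

def avg_pool(x, size):
    """Average each non-overlapping size x size window of matrix x."""
    rows, cols = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:rows * size, :cols * size]
    # Axes after reshape: (window row, row in window, window col, col in window)
    return trimmed.reshape(rows, size, cols, size).mean(axis=(1, 3))

x = np.arange(1.0, 17.0).reshape(4, 4)  # 4 x 4 input, values 1..16
pooled = avg_pool(x, 2)                 # 2 x 2 output
```

Each output element is the mean of one 2 × 2 block, so a 4 × 4 input is reduced to 2 × 2.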
Forward propagation of fully-connected layer
The total number of neurons in the first layer of FCNN (\( {size}^{-2}\times 1 \)) is equal to the number of neurons in the last layer of CPNN (\( {size}^{-3}\times {size}^{-3} \)), i.e. \( {size}^{-2}\times 1={size}^{-3}\times {size}^{-3} \). The output of the first layer of FCNN (\( {O}_i^{-2} \)) is transformed from the output of the last layer of CPNN (\( {O}_{mn}^{-3} \)). The transform relation between \( {O}_i^{-2} \) and \( {O}_{mn}^{-3} \) can be expressed as Eq. (6).
The result of forward propagation is \( {\hat{y}}_n \), which can be formulated in Eq. (7) to Eq. (9).
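The flattening step and the fully connected output layer can be illustrated with the sketch below. It assumes a single 4 × 4 feature map, row-major flattening, randomly initialized weights and a sigmoid output layer; the ordering used in Eq. (6) and the exact form of Eq. (7) to Eq. (9) may differ, and all variable names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

o_pool = rng.random((4, 4))   # output of the last pooling layer (matrix)
o_in = o_pool.reshape(-1)     # input layer of FCNN (vector, row-major order assumed)

w = rng.normal(scale=0.1, size=(o_in.size, 10))  # weights of the output layer
b = np.zeros(10)                                 # biases of the output layer

net_out = o_in @ w + b        # input of the output layer
y_hat = sigmoid(net_out)      # prediction for one sample
```

With row-major flattening, element (m, n) of the 4 × 4 matrix lands at position 4m + n of the vector (zero-based indices).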
Back propagation of CNN
There are three kinds of back propagation (BP) in CNN algorithm: BP in fully-connected layer, BP in pooling layer and BP in convolutional layer.
Definition 5: \( {\delta}^l \) is defined as the input error of layer l.
Definition 6: L is the mean square error (MSE) of all samples, which can be formulated as Eq. (10). The closer the values of \( {\hat{y}}_n \) and \( {y}_n \), the better the training effect, because \( {\hat{y}}_n \) is the prediction for \( {x}_n \) and \( {y}_n \) is the label of \( {x}_n \). If each value of \( {\hat{y}}_n \) is very close to \( {y}_n \), the value of L is very small, which means that the training effect is very good and the trained model fits well.
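A small numerical illustration of the loss follows. It assumes a plain mean over all samples and output components; Eq. (10) may include a different constant factor (for example 1/2), which would rescale the values but not change which prediction is better.

```python
import numpy as np

def mse(y_hat, y):
    """Mean square error between predictions y_hat and labels y."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((y_hat - y) ** 2)

y = np.array([[1, 0], [0, 1]])               # labels of two samples
good = np.array([[0.9, 0.1], [0.2, 0.8]])    # predictions close to the labels
bad = np.array([[0.4, 0.6], [0.7, 0.3]])     # predictions far from the labels

loss_good = mse(good, y)
loss_bad = mse(bad, y)
```

The predictions that are closer to the labels yield the smaller loss.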
Back propagation of fully-connected layer
According to the above definition, \( {\updelta}_j^{-1} \) is defined as the input error of the last layer of FCNN, which is formulated as Eq. (11). \( {y}_n \) is the label of the training and testing samples. \( {\hat{y}}_n \) is the predicted result of the samples. n is the number (ID) of the sample. N is the total number of samples.
\( {\delta}_i^{-2} \) is defined as the input error of the first layer of FCNN. The size of \( {\delta}_i^{-2} \) is \( {size}^{-2}\times 1 \). \( {\delta}_{mn}^{-3} \) is defined as the back propagation error of the layer before FCNN (the last layer of CPNN, before the first layer of FCNN). The size of \( {\delta}_{mn}^{-3} \) is \( {size}^{-3}\times {size}^{-3} \). The transform relation between \( {\delta}_i^{-2} \) and \( {\delta}_{mn}^{-3} \) can be expressed as Eq. (12):
The error back propagation from the first layer of FCNN to the last pooling layer in CPNN is expressed as Eq. (13):
Backpropagation of pooling layer
If layer l is a convolutional layer, layer l + 1 is a pooling layer. The functions used in the calculation are expressed as Eq. (14) to Eq. (16).
Definition 7: Function padding(x): matrix x is expanded with zeros around its border, as in Eq. (14).
Definition 8: Function rotate(x): matrix x is rotated by 180 degrees, as in Eq. (15).
The input error of the convolutional layer is calculated by Eq. (16).
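The two helper functions of Definitions 7 and 8 can be sketched with NumPy as below. The explicit `width` argument is an assumption of this sketch; in the text padding(x) takes only the matrix, so the number of zero rows and columns is presumably fixed by the kernel size.

```python
import numpy as np

def padding(x, width):
    """Surround matrix x with `width` rows and columns of zeros on every side."""
    return np.pad(x, width, mode="constant", constant_values=0)

def rotate(x):
    """Rotate matrix x by 180 degrees."""
    return np.rot90(x, 2)

x = np.array([[1, 2],
              [3, 4]])
padded = padding(x, 1)   # 4 x 4 matrix with x in the centre
rotated = rotate(x)      # [[4, 3], [2, 1]]
```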
Backpropagation of convolutional layer
Definition 9: Function poolExpand(x): the size and data of the output of the pooling layer are expanded to those of the input of the pooling layer. For example, matrix \( {x}_{uv} \) (output of the pooling layer) is replaced by matrix \( {y}_{mn} \) (input of the pooling layer) according to the function poolExpand(x), which is expressed as Eq. (17). \( {size}^l \) is the size of the pooling window.
For example, if \( x={\displaystyle \begin{array}{cc}{x}_{11}& {x}_{12}\\ {}{x}_{21}& {x}_{22}\end{array}} \), the result of poolExpand(x) is calculated as Eq. (18):
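One way to sketch poolExpand is to replicate every element of the pooled matrix over a size × size block, which can be done with a Kronecker product. Whether Eq. (17) additionally scales the values (for example by 1/size² for average pooling) cannot be read from the text, so this sketch only replicates; the name `pool_expand` is hypothetical.

```python
import numpy as np

def pool_expand(x, size):
    """Replicate each element of x over a size x size block."""
    return np.kron(x, np.ones((size, size), dtype=x.dtype))

x = np.array([[1, 2],
              [3, 4]])
expanded = pool_expand(x, 2)   # 4 x 4 matrix
```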
If layer l is a pooling layer, then layer (l + 1) is a convolutional layer, and the input error of the pooling layer can be calculated as Eq. (19):
Adjustment of weights and parameters of CNN
The change value of weights and bias can be calculated as Eq. (20) to Eq. (21):
The updated values of the weights and bias can be calculated as Eq. (24) to Eq. (27). \( {\eta}_{CPNN} \) is the learning rate of CNN:
Pseudocode of CNN
The training set contains features and labels, which are learned by the CNN model. Weights and biases are adjusted in the training process.
(1) Definition of adjustment cycle and simulation process
Definition 10: Adjustment cycle (AC): In each AC, all the weights and biases are adjusted once according to Eq. (20) to Eq. (27).
Definition 11: Simulation process (SP): Each SP is a complete training process. One SP contains many consecutive ACs. Each SP runs from the first AC (for example, t = 1) to the last AC (for example, t = 6000).
The relationship between SP and AC is that one SP is composed of n ACs.
(2) Training algorithm of CNN
The effect of each SP is measured by the loss function, which is expressed as Eq. (28):
In Eq. (28), n is the index of each training sample, and N is the total number of training samples. train_p(n) is the result of forward propagation of the nth training sample, which is also expressed as \( {\hat{y}}_n \). train_y(n) is the label of the nth training sample, which is also expressed as \( {y}_n \).
The pseudocode of CNN is listed in Algorithm 1.
3 Wavelet transform
Wavelet transform (WT) is an ideal method for processing the details of signals. WT provides a time-frequency window that can capture both higher- and lower-resolution details of signals. The problem of the Fourier transform (FT) [25] is that the window size cannot be changed when the frequency changes. This problem is solved by WT. The wavelet transform of a signal f(t) is \( \psi \left(a,b\right)=\frac{1}{\sqrt{a}}{\int}_{-\infty}^{\infty }f(t)\,\varphi \left(\frac{t-b}{a}\right)\, dt \), where φ is the wavelet generating function, a is the scale parameter that controls the extension of the function, and b is the translation parameter.
The wavelet generating function is designed according to the following conditions
(1) The function is nonzero only in a very small domain and is 0 elsewhere. In other words, translating the wavelet along the timeline is equivalent to sliding a window over the original signal. (2) The integral of the function over the x axis must be 0. (3) The transform must be invertible. There are many wavelet generating functions, such as (1) the Haar wavelet, (2) db wavelets [31], (3) sym wavelets [15], (4) the coif series of wavelets, etc. The wavelet function used in this study is \( \varphi (t)=\cos (1.75t)\,{e}^{-\frac{t^2}{2}} \).
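The wavelet function of this study and its scaled and translated versions can be evaluated numerically as in the sketch below. The function names are hypothetical, and the \( 1/\sqrt{a} \) normalization is taken from the transform formula above.

```python
import numpy as np

def wavelet(t):
    """Wavelet function of this study: cos(1.75 t) * exp(-t^2 / 2)."""
    t = np.asarray(t, dtype=float)
    return np.cos(1.75 * t) * np.exp(-t ** 2 / 2.0)

def wavelet_scaled(t, a, b):
    """Wavelet dilated by scale a > 0 and translated by b, with 1/sqrt(a) normalization."""
    t = np.asarray(t, dtype=float)
    return wavelet((t - b) / a) / np.sqrt(a)

peak = wavelet(0.0)                          # maximum of the wavelet, at t = 0
narrow = wavelet_scaled(10.0, a=1.0, b=2.0)  # small scale: negligible 8 units from the centre
wide = wavelet_scaled(10.0, a=4.0, b=2.0)    # large scale: still clearly nonzero there
```

Increasing a widens the window in which the function is noticeably different from 0, and b moves the centre of that window, which is how the scale and translation select different details of a signal.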
The processes of wavelet transform are visualized as follows
In Fig. 2, the error is the difference between the signal reconstructed by the inverse wavelet transform and the original signal. The scale is the parameter of the wavelet function that controls its extension. When the scale parameter is changed, the ability of the wavelet transform to extract information from the original signal changes.
In summary, by adjusting the scale and translation, the wavelet transform can learn different features. Therefore, richer features can be learned by adding the wavelet transformation to CNN.
4 Model of wavelet convolutional neural network (wCNN)
4.1 Structure of wCNN
The improvement of the proposed wCNN is that the activation function F() of the convolutional layer in CNN is replaced by Ψ(). F() in CNN is the sigmoid function, and Ψ() in wCNN is the wavelet scale transformation function.
The structure of the proposed wCNN is as follows: the first part is the Wavelet Convolutional Pooling Neural Network (wCPNN), and the second part is the Fully Connected Neural Network (FCNN). The structure of wCNN is shown in Fig. 3.
4.2 Algorithm of wCNN
The difference between wCNN and CNN is the activation function of the convolutional layer.
The training algorithm of wCNN also has three steps: (1) forward propagation of wCNN; (2) back propagation of wCNN; (3) weight and bias adjustment of wCNN.
4.2.1 Forward propagation of wCNN
The forward propagation of wCNN is the same as that of CNN. The input of wCNN is the features of the training samples. The output of wCNN is calculated from the first layer of wCNN (the input of wCNN) through the convolutional layers, pooling layers and fully connected layers.
(1) Forward propagation of the convolutional layer
If the layer l is the convolutional layer, the input of this layer (\( {net}_{mn}^l \)) is calculated by Eq. (29):
The output of this layer (\( {O}_j^l(t) \)) is calculated by Eq. (30). \( {ac}^l \) and \( {bc}^l \) are the parameters of the scale transformation in the activation function:
In the convolutional layer of wCNN, the activation function \( {\Psi}_{wc}(x) \) is expressed as Eq. (31):
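A sketch of a wavelet activation with trainable scale and translation is given below. It assumes that the layer output has the form Ψ((net − bc) / ac) with the wavelet function cos(1.75x)·exp(−x²/2) of Section 3; the actual Eq. (30) and Eq. (31) are not reproduced here, so this form, the function names and the sample values are assumptions. The analytic derivative, which back propagation through such a layer would need, is included and can be checked by a finite difference.

```python
import numpy as np

def psi(x):
    """Wavelet function cos(1.75 x) * exp(-x^2 / 2)."""
    return np.cos(1.75 * x) * np.exp(-x ** 2 / 2.0)

def psi_prime(x):
    """Derivative of psi with respect to x (product rule)."""
    return -(1.75 * np.sin(1.75 * x) + x * np.cos(1.75 * x)) * np.exp(-x ** 2 / 2.0)

def wavelet_activation(net, ac, bc):
    """Assumed form of the layer output: psi((net - bc) / ac)."""
    return psi((net - bc) / ac)

net = np.array([[-1.0, 0.0],
                [0.5, 2.0]])
out = wavelet_activation(net, ac=1.0, bc=0.0)

# Central finite difference to check the analytic derivative at x = 0.3
h = 1e-6
numeric = (psi(0.3 + h) - psi(0.3 - h)) / (2.0 * h)
```

Unlike the sigmoid, this activation is not monotonic and is bounded between −1 and 1, so its outputs can be negative.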
(2) Forward propagation of the pooling layer and the fully connected layer
Forward propagation in the pooling layer and the fully-connected layer of wCNN is the same as CNN, which are shown in Eq. (4) and Eq. (7).
4.2.2 Back propagation of wCNN
The predicted values \( {\hat{y}}_n \) of the training samples can be calculated by forward propagation of wCNN, then the MSE (mean square error) of all the training samples can be calculated by loss function as Eq. (10).
Back propagation of the error is necessary for the adjustment of weights and biases; it is calculated in the fully connected layer, the pooling layer and the convolutional layer. The back propagation in the fully connected layer and the convolutional layer is the same as in CNN, while the back propagation in the pooling layer is different:
If the layer l is pooling layer, the layer (l + 1) is a convolutional layer, and the error of the input of the pooling layer is expressed as Eq. (32):
The gradient descent method is applied to calculate the changes of weights and biases such as \( {w}^l \), \( {ac}^l \) and \( {bc}^l \), which can be expressed as Eq. (34) to Eq. (36):
The adjusted results such as \( {w}_{ij}^{-1} \) and \( {b}_j^{-1} \) can be expressed as Eq. (37) to Eq. (41), where \( {\eta}_{CPNN} \) is the learning rate of wCNN:
4.3 Pseudocode of wCNN
The training process of wCNN is similar to that of CNN, while the activation function of wCNN is different from that of CNN. The pseudocode of wCNN is listed in Algorithm 2.
5 Model of wavelet convolutional wavelet neural network (wCwNN)
5.1 Structure of wCwNN
Based on wCNN, the improvement of wCwNN is that the fully connected network (FCNN) is replaced by a wavelet Neural Network (wNN). The structure of wCwNN has two parts: (1) the wavelet Convolutional Pooling Network (wCPNN), and (2) the wavelet Neural Network (wNN). In the convolutional layers of wCPNN and the hidden layer of wNN, all the activation functions are wavelet scale transformation functions. The structure of wCwNN is shown in Fig. 4.
5.2 Algorithm of wCwNN
The first part of wCwNN is wCPNN, which is the same as in wCNN. The second part of wCwNN is wNN, which is different from the second part of wCNN (FCNN). In this section, the forward propagation and back propagation of wNN are described in detail.
Definitions for wNN are listed as follows.
(1) The index of the last layer (output layer) of wNN is l = −1. \( {size}^{-1} \) is the number of neurons in this output layer.
(2) The index of the second layer (the second to last layer, hidden layer) of wNN is l = −2. \( {size}^{-2} \) is the number of neurons in this hidden layer.
(3) The index of the first layer (the third to last layer, input layer) of wNN is l = −3. \( {size}^{-3} \) is the number of neurons in this input layer.
(4) The index of the last layer of wCPNN, which is the layer before the input layer of wNN, is l = −4. \( {size}^{-4}\times {size}^{-4} \) is the number of neurons in this output layer of wCPNN.
5.2.1 Forward propagation of wCwNN
The input of the input layer in wNN (\( {O}_i^{-3} \)) comes from the output of the last layer in wCPNN (\( {O}_{mn}^{-4} \)); the two layers have the same number of neurons: \( {size}^{-3}={size}^{-4}\times {size}^{-4} \).
The dimension of matrix \( {O}_{mn}^{-4} \) is m × n, and the dimension of matrix \( {O}_i^{-3} \) is i × 1. The correspondence between \( {O}_{mn}^{-4} \) and \( {O}_i^{-3} \) can be expressed as Eq. (42):
In the hidden layer of wNN, the input matrix is \( {net}_j^{-2} \), and the output matrix is \( {O}_j^{-2} \). \( {O}_j^{-2} \) can be calculated by Eq. (43) to Eq. (44):
In the output layer of wNN, the input matrix and the output matrix are \( {net}_j^{-1} \) and \( {O}_j^{-1} \) respectively, and the predicted result of wCwNN is \( {\hat{y}}_n \). \( {O}_j^{-1} \) and \( {\hat{y}}_n \) can be calculated by Eq. (45) to Eq. (47):
5.2.2 Back propagation of wCwNN
In wNN, the back propagation of the input errors (\( {\delta}_k^{-1} \) in the output layer, \( {\delta}_j^{-2} \) in the hidden layer and \( {\delta}_i^{-3} \) in the input layer) can be calculated as Eq. (48) to Eq. (50).
The gradient descent method is applied to adjust the weights (\( {w}_{ij}^{-2} \) and \( {w}_{kj}^{-1} \)) and biases (\( {ac}^{-2} \), \( {bc}^{-2} \) and \( {b}_k^{-1} \)) of wCwNN. The changes of the above weights and biases can be calculated by Eq. (51) to Eq. (55).
The updated values of the above weights and biases are expressed as Eq. (56) to Eq. (60), where \( {\alpha}_{wNN} \), \( {\eta}_{wNN} \) and \( {\eta}_{wCPNN} \) are the inertia coefficient of wNN, the learning rate of wNN and the learning rate of wCPNN, respectively.
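An update with a learning rate and an inertia coefficient is commonly written as gradient descent with a momentum term. The sketch below assumes that form, with the change equal to −η·gradient + α·(previous change); Eq. (56) to Eq. (60) are not reproduced here, so the exact placement of α and η in the paper may differ, and the function name and numbers are illustrative only.

```python
import numpy as np

def momentum_step(w, grad, prev_delta, eta, alpha):
    """One update: delta = -eta * grad + alpha * prev_delta, then w + delta."""
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta

w = np.array([1.0, -2.0])        # two illustrative weights
prev = np.zeros_like(w)          # no previous change before the first step
grad = np.array([0.5, -1.0])     # gradient held constant for the illustration

w, prev = momentum_step(w, grad, prev, eta=0.1, alpha=0.9)
w, prev = momentum_step(w, grad, prev, eta=0.1, alpha=0.9)
```

With a constant gradient the second step is larger than the first, because the inertia term adds 0.9 times the previous change.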
5.2.3 Pseudocode of wCwNN
The training algorithm of wCwNN is similar to that of wCNN; the difference is that the FCNN in wCNN is replaced by wNN in wCwNN. The pseudocode of wCwNN is listed in Algorithm 3.
6 Experiment
The objectives of the experiment are as follows: (1) verify the viability of each algorithm (convergence); (2) improve the precision of the three algorithms (reduce the minimum mean square error); (3) improve the accuracy rate (reduce the error rate); (4) analyze the efficiency of the algorithms; (5) find an algorithm with greater classification capacity.
The contents of the experiment are as follows: (1) record the error of the three algorithms in every AC, then plot the time-error curve; (2) calculate the mean square error of all test samples after training (precision); (3) calculate the error rate (accuracy rate) of the test samples after training; (4) record the time of the training process; (5) analyze the results of each algorithm; (6) analyze the results of each experiment.
6.1 Dataset introduction
The MNIST and CIFAR-10 datasets are adopted for the comparative experiments. MNIST is a well-known dataset from the National Institute of Standards and Technology. The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. The training set and test set are shown in Fig. 5 below:
CIFAR-10 is an open dataset that has 60,000 images. The resolution of the images is 32 × 32. The images are divided into 10 categories, and each category contains 6000 images. There are 50,000 images for training and 10,000 images for testing. The dataset is shown in Fig. 6 below. In this experiment, two kinds of images are selected.
6.2 Experiment design of CNN
For the following comparative experiments, the structure of CNN is designed as follows: the first layer of CNN is an input layer; the 2nd to 5th layers are two pairs of convolutional-pooling layers; the 6th to 7th layers are fully connected (FCNN).
Firstly, parameters of the structure of CNN are set as follows:
(1) Input layer of CNN: The size is 28 × 28.
(2) The first convolutional layer: The input size is 28 × 28. The size of the convolutional kernel is set to 5 × 5. The output size is 24 × 24. The number of output features is set to 6. The activation function is sigmoid.
(3) The first pooling layer: The input size is 24 × 24. The size of the pooling window is 2 × 2. The output size is 12 × 12. The number of output features is 6.
(4) The second convolutional layer: The input size is 12 × 12. The convolutional kernel size is 5 × 5. The output size is 8 × 8. The activation function is sigmoid. The number of output features is 12.
(5) The second pooling layer: The input size is 8 × 8. The size of the pooling window is 2 × 2. The output size is 4 × 4. The number of output features is 12.
(6) Input layer of FCNN: The size is 192, which is equal to 4 × 4 × 12.
(7) Hidden layer of FCNN: The size is 10. The activation function is sigmoid.
(8) Output layer of FCNN: The size is 10, which can represent 10 different classes. For example, if the predicted result is class 1, the output is [1,0,0,0,0,0,0,0,0,0], and if the predicted result is class 2, the output is [0,1,0,0,0,0,0,0,0,0].
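The layer sizes listed above follow from two simple rules, which the sketch below checks: a convolution without padding and with stride 1 shrinks each side by (kernel − 1), and a non-overlapping pooling window divides each side by the window size. The helper names are hypothetical, and `one_hot` uses a zero-based position, so "class 1" in the text corresponds to position 0.

```python
def conv_out(size, kernel):
    """Output side length of a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, window):
    """Output side length of non-overlapping pooling."""
    return size // window

def one_hot(position, classes=10):
    """Vector with a single 1 at the given zero-based position."""
    return [1 if i == position else 0 for i in range(classes)]

c1 = conv_out(28, 5)     # first convolutional layer
p1 = pool_out(c1, 2)     # first pooling layer
c2 = conv_out(p1, 5)     # second convolutional layer
p2 = pool_out(c2, 2)     # second pooling layer
fc_in = p2 * p2 * 12     # 12 feature maps flattened into the FCNN input

class_1 = one_hot(0)     # first class of the text
class_2 = one_hot(1)     # second class of the text
```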
Secondly, parameters for simulations are configured as follows:
(1) Learning rate: η = 0.01 or η = 0.1.
(2) Total number of training processes (SP): max_SPs = 10.
(3) Total number of adjustment cycles (AC): max_ACs = 6000.
(4) Target error: target_err = 0.0000001. In each SP, the current error is calculated by current_err = L(), where L() is the loss function (MSE in this study). When current_err is smaller than target_err, the SP stops even if the current AC is smaller than max_ACs.
In each SP, the current error is continuously reduced. Therefore, in order to obtain a smaller error, target_err is set to a very small value that cannot be reached within max_ACs in each SP.
(5) Total number of training samples taken in each AC: BatchSize = 10.
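The interplay of these settings can be sketched as a training loop in which each SP runs at most max_ACs adjustment cycles on mini-batches and stops early when the batch error falls below the target. To stay short and runnable, the sketch replaces the network with a toy linear model trained on noiseless synthetic data; this stand-in, the data and the function name `run_sp` are assumptions. Because the toy data are noiseless, the target error is reached quickly here, unlike in the experiments of this study.

```python
import numpy as np

MAX_SPS = 10        # number of simulation processes
MAX_ACS = 6000      # adjustment cycles per simulation process
TARGET_ERR = 1e-7   # target error
BATCH_SIZE = 10     # training samples per adjustment cycle
ETA = 0.1           # learning rate

def run_sp(features, labels, rng):
    """One SP: up to MAX_ACS adjustment cycles of mini-batch gradient descent."""
    w = np.zeros(features.shape[1])
    errors = []
    for t in range(1, MAX_ACS + 1):
        idx = rng.choice(len(features), size=BATCH_SIZE, replace=False)
        residual = features[idx] @ w - labels[idx]
        current_err = np.mean(residual ** 2)      # MSE of the current batch
        errors.append(current_err)
        if current_err < TARGET_ERR:              # early stop before MAX_ACS
            break
        # Gradient of the batch MSE with respect to w
        w -= ETA * 2.0 * features[idx].T @ residual / BATCH_SIZE
    return w, errors

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
labels = features @ np.array([1.0, -2.0, 0.5])    # noiseless toy labels
results = [run_sp(features, labels, rng) for _ in range(MAX_SPS)]
```

Each entry of `results` holds the final weights and the per-AC error curve of one SP, which is the kind of record used for the time-error curves in the Results section.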
6.3 Experiments design of wCNN and wCwNN
The parameters of the network structures and simulations of wCNN and wCwNN are listed in Table 1. There are two groups of experiments, with learning rates η = 0.1 and η = 0.01 respectively.
Comparative results are recorded as follows: (1) feasibility (whether the algorithm converges), (2) minimum MSE (precision), (3) correct rate (accuracy) on all the test data, (4) running time (efficiency).
Table 1 shows that the most significant differences are as follows: (1) The activation function of CPNN in wCNN and wCwNN is the wavelet function, while the activation function of CNN is the sigmoid function. (2) The second part of the network (after CPNN) in wCwNN is a wavelet neural network (wNN), while the second part of the network in CNN and wCNN is FCNN.
7 Results
7.1 Results of the experiment of CNN
The learning rate is set as η = 0.01 and η = 0.1 respectively. Results are recorded as follows: (1) current error (MSE) of each AC in each SP; (2) the error rate of all test samples after each SP; (3) time spent in each SP. The above results are recorded in Table 2:
In the experiment of CNN, each SP has 6000 ACs. The errors of all 6000 ACs are recorded and drawn in Fig. 7. The error values are drawn as the orange line and fitted by linear regression to the blue line, which indicates the downward trend of the orange line. Fig. 7a shows the result of one SP when η = 0.01. Fig. 7b shows the result of one SP when η = 0.1. Fig. 7c shows the result of 10 SPs when η = 0.01. Fig. 7d shows the result of 10 SPs when η = 0.1.
7.2 Results of the experiments of wCNN
In the experiment of wCNN, the learning rate is set to η = 0.01 and η = 0.1 respectively. Each SP contains 6000 ACs. The MSE of every AC is recorded and shown in Fig. 8:
The statistical results of the above simulations are listed in Table 3, which includes: (1) the final MSE of each SP, (2) the error rate of each SP, (3) the time consumed by each SP.
7.3 Results of the experiments of wCwNN
In the experiment of wCwNN, the learning rate is set to η = 0.01 and η = 0.1 respectively. Each SP contains 6000 ACs. The MSE of every AC is recorded and shown in Fig. 9.
The statistical results of the above simulations are listed in Table 4, which includes: (1) the final MSE of each SP, (2) the error rate of each SP, (3) the time consumed by each SP.
8 Discussion
8.1 Discussion of results of CNN
According to the experimental results of the CNN presented in Table 2 and Fig. 7, the main findings are as follows:
(1) Classification of MNIST can be completed by CNN. The correct rate is 90.03% (the error rate is 9.97%).
(2) CNN has a good ability for image classification, and the maximum correct rate can reach 90.8% (the error rate is at least 9.2%).
(3) When the learning rate is increased (from η = 0.01 to η = 0.1), the MSE is significantly decreased (from 0.289 to 0.094), and the error rate is significantly decreased (from 28.07% to 9.97%). The time spent in simulation (η = 0.1) is slightly increased (from 83.39 to 91.04).
(4) The descending process of the MSE is stable, and the results among the 10 SPs are close.
8.2 Discussion of the wCNN results
According to the experimental results of the wCNN presented in Table 3 and Fig. 8, the following findings can be drawn:
(1) The wCNN algorithm is convergent. Classification of MNIST can be completed by wCNN. The average correct rate is 92.78% (the average error rate is 7.22%).
(2) wCNN has a good ability for image classification (better than CNN), and the maximum correct rate can reach 95.66% (the error rate is at least 4.34%).
(3) When the learning rate of wCNN is increased (from η = 0.01 to η = 0.1), the MSE is significantly decreased (from 0.184 to 0.088), and the error rate is significantly decreased (from 13.91% to 7.22%). The time spent in simulation (η = 0.1) is slightly increased (from 258.53 to 268.56).
(4) The MSE reduction processes differ between runs, and the variance of the final error of wCNN is greater than that of CNN. wCNN achieves a very low error rate (4.34%), but some wCNN experiments have high error rates (such as 18.71% and 18.35%). According to the research of Liu in 2015 [15], the ability of the wavelet network to make the training MSE smaller comes at the cost of network stability: it has the advantage of jumping out of local minima, which is not available in the classical BP network and RBF network, but this advantage also brings the disadvantage that the error decline during learning oscillates more.
8.3 Discussion of experimental results of wCwNN
According to the experimental results of wCwNN presented in Fig. 9 and Table 4, the following conclusions can be drawn:
(1) wCwNN converges and can classify the MNIST dataset. The average accuracy is 96.57% (average error rate 3.43%).
(2) wCwNN classifies images better than both CNN and wCNN; the maximum correct rate reaches 97.04% (minimum error rate 2.96%).
(3) When the learning rate of wCwNN is increased from η = 0.01 to η = 0.1, the MSE decreases (from 0.068 to 0.054), and the error rate decreases significantly (from 5.17% to 3.43%). The simulation time at η = 0.1 increases slightly (from 357.23 to 368.81).
(4) The MSE decreases stably, and the differences among the 10 SPs are not significant.
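The mean figures quoted in Sections 8.1 to 8.3 can be put side by side with a short calculation. The sketch below uses only the numbers stated in the text; the dictionary layout and the helper name `relative_drop` are illustrative assumptions.

```python
# Mean results quoted in Sections 8.1-8.3, as (value at eta = 0.01, value at eta = 0.1).
results = {
    "CNN":   {"mse": (0.289, 0.094), "err": (28.07, 9.97)},
    "wCNN":  {"mse": (0.184, 0.088), "err": (13.91, 7.22)},
    "wCwNN": {"mse": (0.068, 0.054), "err": (5.17, 3.43)},
}

def relative_drop(before, after):
    """Fractional reduction when a metric goes from `before` to `after`."""
    return (before - after) / before

for name, r in results.items():
    mse_drop = relative_drop(*r["mse"])
    err_drop = relative_drop(*r["err"])
    correct_rate = 100.0 - r["err"][1]  # correct rate (%) at eta = 0.1
    print(f"{name:6s} correct={correct_rate:.2f}%  "
          f"MSE drop={mse_drop:.1%}  error-rate drop={err_drop:.1%}")
```

With these numbers the relative MSE reduction from the larger learning rate is largest for CNN (about 67%) and smallest for wCwNN (about 21%), since wCwNN already starts from a low MSE at η = 0.01.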
9 Conclusion
Firstly, the MNIST dataset is adopted to verify the proposed methods, and CNN is implemented to perform the classification task. The correct rate of CNN is more than 90%. Secondly, wCNN is proposed, in which the activation function of the convolutional layers in CNN is replaced by a wavelet function. Thirdly, wCwNN is proposed, in which the FCNN of CNN and wCNN is replaced by wNN. With the same hyperparameters, the comparative results of the experiments among CNN, wCNN and wCwNN are shown in Tables 5 and 6.
The MSE of CNN, wCNN and wCwNN in each AC is plotted in Fig. 10.
We ran the same experiments on both the MNIST dataset and the CIFAR-10 dataset. The results are shown in the Experiment section. The comparison between the MNIST dataset and the CIFAR-10 dataset is as follows:
In the experiment on the CIFAR-10 dataset, the performance of wCNN (mean MSE 0.144948, mean error rate 0.20095) is better than that of CNN (mean MSE 0.29845, mean error rate 0.205107), while the performance of wCwNN (mean MSE 0.134675, mean error rate 0.18145) is better than that of wCNN.
According to the comparative experimental results of CNN, wCNN and wCwNN shown in Tables 5, 6 and 7 and Fig. 10, the following findings can be drawn:
(1) CNN, wCNN and wCwNN all converge, and each of them can complete the classification task on the MNIST dataset.
(2) When the learning rates of CNN, wCNN and wCwNN are increased (from η = 0.01 to η = 0.1), the MSE and the error rate of each algorithm decrease significantly, while the time consumed increases slightly.
(3) wCwNN has the highest precision (minimum MSE 0.045), and the precision of wCNN (minimum MSE 0.088) is higher than that of CNN (minimum MSE 0.094).
(4) wCwNN has the highest accuracy (minimum error rate 3.43% at η = 0.1), followed by wCNN (minimum error rate 7.22% at η = 0.1); both error rates are lower than that of CNN (9.97%).
(5) The variance of the MSE of the trained wCwNN is the smallest (0.000014), and that of the trained wCNN is the largest (0.000554). The variance of the MSE of the trained CNN is 0.000069.
(6) wCwNN (average time per SP 131.67) consumes more time than wCNN (average time per SP 83.57), and wCNN consumes much more time than CNN (average time per SP 10.89).
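The trade-off between findings (4) and (6) can be made concrete with a small worked calculation. This is a sketch using only the error rates and average SP times quoted above (in the paper's own time units, which are not stated in this section); the dictionary keys and the helper name `tradeoff` are illustrative assumptions.

```python
# Figures quoted in findings (4) and (6): error rate (%) at eta = 0.1 and average time per SP.
models = {
    "CNN":   {"error_rate": 9.97, "time_per_sp": 10.89},
    "wCNN":  {"error_rate": 7.22, "time_per_sp": 83.57},
    "wCwNN": {"error_rate": 3.43, "time_per_sp": 131.67},
}

baseline = models["CNN"]

def tradeoff(model, base):
    """Return (time multiplier, fractional error-rate reduction) relative to `base`."""
    slowdown = model["time_per_sp"] / base["time_per_sp"]
    error_cut = (base["error_rate"] - model["error_rate"]) / base["error_rate"]
    return slowdown, error_cut

for name, m in models.items():
    slowdown, error_cut = tradeoff(m, baseline)
    print(f"{name:6s} time x{slowdown:.2f}  error rate reduced by {error_cut:.1%}")
```

Relative to CNN, wCNN takes roughly 7.7 times as long per SP and wCwNN roughly 12.1 times as long, in exchange for error rates that are about 28% and 66% lower, respectively.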
In summary:
(1) The proposed wCwNN is a successful improvement on wCNN, and the proposed wCNN is a successful improvement on CNN. (2) Both wCwNN and wCNN have higher precision (smaller MSE) than CNN, and the precision of wCwNN is the highest. (3) Both wCwNN and wCNN have higher accuracy (smaller error rate) than CNN, and the accuracy of wCwNN is the highest. Both improvements, wCNN and wCwNN, consume more time in each SP.
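The summary treats precision (MSE) and accuracy (error rate) as separate measures. A minimal sketch of how the two could be computed from network outputs is given below. It assumes one-hot labels, an MSE averaged over all samples and output units, and arg-max class prediction; the paper may normalize the MSE differently (for example with a factor of 1/2 or a per-sample sum), and the toy data are hypothetical.

```python
def mean_squared_error(outputs, one_hot_labels):
    """Average squared difference over all samples and output units."""
    total, count = 0.0, 0
    for out, label in zip(outputs, one_hot_labels):
        for o, t in zip(out, label):
            total += (o - t) ** 2
            count += 1
    return total / count

def error_rate(outputs, one_hot_labels):
    """Fraction of samples whose arg-max output differs from the labeled class."""
    wrong = 0
    for out, label in zip(outputs, one_hot_labels):
        predicted = max(range(len(out)), key=out.__getitem__)
        actual = max(range(len(label)), key=label.__getitem__)
        wrong += predicted != actual
    return wrong / len(outputs)

# Hypothetical toy data: three samples, three classes.
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
outputs = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1], [0.5, 0.2, 0.3]]

mse = mean_squared_error(outputs, labels)
err = error_rate(outputs, labels)
print(f"MSE = {mse:.4f}, error rate = {err:.2%}")
```

The two measures can move independently: an output can be classified correctly while still contributing a sizeable squared error, which is why the paper reports both.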
In the future, we plan to continue this research as follows:
(1) We should find a mechanism so that the wavelet function does not overshoot the minimum while retaining the ability to jump out of local minima. (2) We will improve the learning ability of CNN, wCNN and wCwNN by expanding the network structure, for example the depth of the network. Furthermore, the proposed wCNN and wCwNN can be used as neurons to build a deeper neural network. (3) We will conduct more experiments to verify the performance of the improved methods.
References
Bateux Q, Marchand E, Leitner J, Chaumette F, Corke P (2017) Visual servoing from deep neural networks. arXiv preprint arXiv:1705.08940
Cao L, Hong Y, Fang H, He G (1995) Predicting chaotic time series with wavelet networks. Physica D 85(1–2):225–238
Chang J, Sitzmann V, Dun X, Heidrich W, Wetzstein G (2018) Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification. Sci Rep 8(1):1–10
Chen H, He X, Qing L, Xiong S, Nguyen TQ (2018) Dpw-sdnet: Dual pixel-wavelet domain deep cnns for soft decoding of jpeg-compressed images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 711–720
De Silva D, Vithanage H, Fernando K, Piyatilake I (2020) Multi-path learnable wavelet neural network for image classification. In: Twelfth International Conference on Machine Vision (ICMV 2019), vol. 11433, p. 114331O. International Society for Optics and Photonics
Fujieda S, Takayama K, Hachisuka T (2017) Wavelet convolutional neural networks for texture classification. arXiv preprint arXiv:1707.07394
Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 36(4):193–202
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778
Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
Jiang Y, Chen L, Zhang H, Xiao X (2019) Breast cancer histopathological image classification using convolutional neural networks with small se-resnet module. PLoS One 14(3):e0214587
Khan A, Sohail A, Zahoora U, Qureshi AS (2019) A survey of the recent architectures of deep convolutional neural networks. arXiv preprint arXiv:1901.06032
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Kiskin I, Orozco BP, Windebank T, Zilli D, Sinka M, Willis K, Roberts S (2017) Mosquito detection with neural networks: the buzz of deep learning. arXiv preprint arXiv:1705.05180
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
LeCun Y, Touretzky D, Hinton G, Sejnowski T (1988) A theoretical framework for backpropagation. In: Proceedings of the 1988 Connectionist Models Summer School, vol. 1, pp. 21–28. CMU, Pittsburgh, Pa: Morgan Kaufmann
Liu J (2014) Research of adaptive wavelet neural network (AWNN) and ANN based control system intelligent applications
Liu P, Zhang H, Lian W, Zuo W (2019) Multi-level wavelet convolutional neural networks. IEEE Access 7:74973–74985
Liu Y, Li Q, Sun Z (2019) Attribute-aware face aging with wavelet-based generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11877–11886
Mallat S (2008) A wavelet tour of signal processing: The sparse way. Academic Press, Burlington, MA
McCulloch WS, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5(4):115–133
Pati YC, Krishnaprasad PS (1993) Analysis and synthesis of feedforward neural networks using discrete affine wavelet transformations. IEEE Trans Neural Netw 4(1):73–85
Girshick R (2015) Fast R-CNN. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit pp. 1440–1448
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536
Sakkari M, Zaied M (2015) An architecture of distributed beta wavelet networks for large image classification in mapreduce. In: 2015 15th International Conference on Intelligent Systems Design and Applications (ISDA), pp. 523–527. IEEE
Savareh BA, Emami H, Hajiabadi M, Azimi SM, Ghafoori M (2019) Wavelet-enhanced convolutional neural network: a new idea in a deep learning paradigm. Biomed Engin Biomedizinische Technik 64(2):195–205
Sifuzzaman M, Islam M, Ali M (2009) Application of wavelet transform and its advantages compared to fourier transform
Song Y, Hu QV, He L (2019) P-cnn: enhancing text matching with positional convolutional neural network. Knowl-Based Syst 169:67–79
Waibel A, Hanazawa T, Hinton G, Shikano K, Lang KJ (1989) Phoneme recognition using time-delay neural networks. IEEE Trans Acoust Speech Signal Process 37(3):328–339
Wang F, Yu Y, Zhang Z, Li J, Zhen Z, Li K (2018) Wavelet decomposition and convolutional lstm networks based improved deep learning model for solar irradiance forecasting. Appl Sci 8(8):1286
Wang JZ (2001) Wavelets and imaging informatics: a review of the literature. J Biomed Inform 34(2):129–141
Wang X Ma S YB (2014) Effects of visual perception training on hippocampal neural cell plasticity and changes in learning and memory functions. J Zhejiang Univ (Med Sci) pp. 601–604
Xiao H, H.T.D.Z. (2018) A damage detection method for grout sleeve splicing at assembly column based on the sym wavelet and bp neural network. Structural Engineers
Zhang Q (1997) Using wavelet network in nonparametric estimation. IEEE Trans Neural Netw 8(2):227–236
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Liu, JW., Zuo, FL., Guo, YX. et al. Research on improved wavelet convolutional wavelet neural networks. Appl Intell 51, 4106–4126 (2021). https://doi.org/10.1007/s10489-020-02015-5
DOI: https://doi.org/10.1007/s10489-020-02015-5