Remaining useful life forecasting for roller bearings using sparse auto-encoder

This paper presents a method for predicting the remaining useful life (RUL) of roller bearings, based on the sparse auto-encoder (SAE) from deep learning (DL). First, a set of time and frequency domain factors, which reflect the information in the vibration signals well, is calculated from the roller bearing vibration signals. These time and frequency domain features are then used as the input of the SAE, which extracts features through several hidden layers. The sigmoid function is selected as the output function to calculate the predicted value. Finally, the SAE is compared with other prediction methods, namely support vector machine (SVM), back propagation (BP) neural network and random forest (RF), using two indicators: mean absolute error (MAE) and root mean square error (RMSE). On both indicators the SAE performs better than these models.


Introduction
Rolling bearings are key components of rotating machinery, and their performance directly affects the normal operation of the whole machine (Huang et al. 2012; Resta et al. 2011; Bagordo et al. 2011; Wang and Tong 2014). An effective maintenance strategy can not only reduce downtime and maintenance costs, but also ensure the normal operation of the whole equipment (Guo and Tse 2013; Cong et al. 2013). Correct maintenance decisions based on the real-time status of the equipment require accurate remaining useful life (RUL) prediction. RUL prediction involves two key problems: (1) selecting characteristic parameters that accurately reflect the performance degradation process as the input parameters of the model, and (2) establishing a suitable life prediction model.
For characteristic parameter selection, extracting features from roller bearing vibration signals has been one of the most common approaches in recent years. Analyzing the vibration signals and extracting their characteristics effectively is of great practical significance. The vibration signals reflect the state of the roller bearings, and the quality of the extracted features determines the performance of the RUL prediction.
Time and frequency domain factors extracted from vibration signals can reflect the degradation trend, and they are one of the most common choices of characteristic parameters. Shen et al. (2013) proposed a bearing residual life prediction method based on correlation features and multivariate support vector machines (SVM). That work used a time and frequency domain factor, the relative root mean square (RMS) value, which is not affected by individual bearing differences, to directly extract a degradation WI. It then used SVM for RUL prediction. Lei et al. (2016) also used RMS to construct a WI to evaluate the wear status of bearings, and then used a Kalman particle filter algorithm for RUL prediction. Vibration signals in low-speed slewing bearings have low frequency and low energy, which makes detection difficult. To address this, Kosasih et al. (2014) applied a low-pass filter and extracted various time-frequency indicators from the output signal, such as RMS, skewness and kurtosis, as degradation indicators for RUL prediction. Liu et al. (2017) used time domain and frequency domain features extracted from the collected data instances as degradation indicators to monitor roller bearing condition. In Tse and Wang (2017), time and frequency domain factors, which reflect the condition well, were used for pump RUL prediction. The factors were extracted from the vibration signals after low-pass filtering. A dimension reduction method, principal component analysis (PCA), was then used to extract the RUL features for prediction. Tran et al. (2017) applied the wavelet transform (WT) to remove vibration noise. They then used time and frequency domain statistical features of the vibration signals to characterize the valve condition. Finally, a DBN from deep learning (DL) was used to classify air compressor faults. Wang et al. (2016) employed time and frequency domain feature extraction to obtain the original features of rolling bearing vibration signals and reflect their degradation trend, and then applied PCA to reduce the dimension. As these references show, time and frequency domain factors are widely adopted to extract features from vibration signals. Hence, in this paper, we use time and frequency domain factors to extract features from the roller bearing vibration signals and to monitor the bearings' working condition.
For RUL prediction models, many traditional methods, such as SVM, BP neural network and RF, have been successfully applied in the prediction domain. Li et al. (2015) used BP to forecast the RUL of a power supply system. Wang and Sun (2015) employed the RF model for parallel load forecasting on the electric power user side. Li et al. (2017) presented an effective way to estimate RUL online using the SVM algorithm. Patil et al. (2015) proposed a method based on critical features and SVM for Li-ion battery RUL prediction under different operating conditions.
Unlike the aforementioned traditional methods, DL is an unsupervised learning method. In recent years it has been widely used to automatically extract feature information from large amounts of unlabeled data for classification. With its powerful automatic feature extraction capability, it has been applied successfully, with remarkable results, in image and speech recognition (Hinton and Salakhutdinov 2006; Affonso et al. 2017; Sun et al. 2017). Feng et al. (2016) presented a method based on the sparse auto-encoder (SAE) in DL for roller bearing fault diagnosis. In that work, the vibration signals were decomposed by FFT, and half of the resulting vectors were used as the input of the SAE model for feature extraction. A softmax classifier in the output layer was then applied to diagnose roller bearing faults, and the fault identification accuracy was shown to be higher than that of BP. An SAE is a multi-layer network whose hidden layers extract the vibration signal features. After the network parameters are pre-trained and fine-tuned, the eigenvectors from the final hidden layer are used as the input of the output layer to distinguish the different fault categories. Compared with the traditional methods mentioned above, SAE has a strong feature extraction ability and can extract features from massive data automatically.
However, all the above RUL prediction models rely on the manual experience of engineers to set a health threshold. When the extracted degradation indicator exceeds the threshold line, the equipment is considered to have entered a degradation state, and the remaining lifespan is then calculated. The quality of the threshold line therefore directly affects the accuracy of life prediction. To solve this problem, this paper uses the function Y = 1 - X as the output label for SAE training and testing, where X is the degradation percentage of the ith bearing at time t. Additionally, 25 different time and frequency domain indicators are used as the SAE input for training and testing. Examples include absolute mean, peak, variance, peak to peak, kurtosis factor, root mean square, clearance factor, shape factor, impulse factor, root amplitude, crest factor, skewness factor and variation coefficient. The data preprocessing adopts max-min normalization to eliminate the effect of the different ranges of the input features. The output layer uses softmax as the prediction model. On this basis, this paper presents a method based on SAE for roller bearing RUL prediction.
The main contributions of this paper are as follows.
(1) Reducing the dependence of RUL prediction on expert troubleshooting experience.
(2) Providing a theoretical basis and an engineering application case for deep-learning-based bearing life prediction.
The rest of this paper is organized as follows. Sect. 2 reviews the SAE and the time and frequency domain factors. Sect. 3 gives the detailed procedure of the roller bearing RUL methodology. Sect. 4 describes the experimental data sources and the two RUL prediction evaluation indicators. Experimental validation is given in Sect. 5, followed by conclusions in Sect. 6.

SAE model
An auto-encoder (AE) is a symmetrical three-layer neural network (Shin et al. 2013), as shown in Fig. 1. It encodes the input data through the hidden layer and then uses the hidden layer to reconstruct (decode) the input data. The main purpose of the AE is to minimize the reconstruction error between the output and the input data.
The idea of the sparse encoder (SE) was first proposed by Olshausen (1996), with the main purpose of simulating the unsupervised computation of human perceptual learning. To represent the input data better, the SAE extends the auto-encoder by adding a sparse penalty term to the AE, so that it learns concise, sparse data characteristics. Consider a roller bearing dataset $X = \{x_1, x_2, \ldots, x_N\}$ with N sample points. For the unlabeled input matrix X, the SAE learns the feature representation $h(X, W, b) = \delta(WX + b)$ of Eq. (1). Through encoding and decoding, it obtains an output Z close to the input X (Fig. 1).
In general, a neuron is considered active if its output is close to 1 and inactive if its output is close to 0. One of the tasks of the sparse AE is to keep most of these neurons inactive most of the time. Let $a_j(x)$ denote the activation of the jth unit in the hidden layer. In the forward propagation procedure, the activation of the hidden layer for a given input matrix X is expressed as follows.
Here W denotes the weight matrix between the input and hidden layers, and b is the bias term. The average activation of the jth hidden unit over the N samples is then $\rho_j = \frac{1}{N}\sum_{i=1}^{N} a_j(x_i)$. Since most neurons are inactive most of the time, $\rho_j$ should stay close to a small constant $\rho$. To achieve this sparse effect, the cost function of the AE introduces the Kullback-Leibler (KL) divergence to keep $\rho_j$ from deviating from $\rho$ (Kullback and Leibler 1951).
The KL divergence in Eq. (3) is used to penalize the encoder's cost function. The penalty item PI sums this divergence over the S neurons of the hidden layer. The AE's cost function combines three terms. The first is the reconstruction error between the output Z and the input X over the n samples. The second is the penalty item PI weighted by β. The third is a regularization item on the weights of each (lth) hidden layer, with its own weight coefficient. In the whole encoding procedure, J(W, b) is the optimization objective, minimized by gradient descent as in Eq. (6), where η is the learning rate.
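The equations of this section were lost in extraction. As a reading aid, the conventional sparse auto-encoder formulation below is consistent with the symbols defined above. It assumes δ is the sigmoid function and writes the regularization weight, whose symbol did not survive, as λ. The normalization constants may therefore differ from the paper's original equations.

```latex
h(X, W, b) = \delta(WX + b), \qquad \delta(z) = \frac{1}{1 + e^{-z}}                      \tag{1}
\rho_j = \frac{1}{N} \sum_{i=1}^{N} a_j(x_i)                                                \tag{2}
\mathrm{KL}(\rho \,\|\, \rho_j) = \rho \log \frac{\rho}{\rho_j}
      + (1 - \rho) \log \frac{1 - \rho}{1 - \rho_j}                                          \tag{3}
PI = \sum_{j=1}^{S} \mathrm{KL}(\rho \,\|\, \rho_j)                                          \tag{4}
J(W, b) = \frac{1}{2n} \sum_{i=1}^{n} \lVert z_i - x_i \rVert^2 + \beta \, PI
      + \frac{\lambda}{2} \sum_{l} \lVert W^{(l)} \rVert_F^2                                 \tag{5}
W \leftarrow W - \eta \frac{\partial J(W, b)}{\partial W}, \qquad
b \leftarrow b - \eta \frac{\partial J(W, b)}{\partial b}                                    \tag{6}
```

The KL term is zero when $\rho_j = \rho$ and grows as the average activation moves away from the target, which is what pushes most hidden units toward inactivity.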
Several AEs are stacked into an SAE with l hidden layers. For a given vibration signal input vector X, the input size of the SAE equals the dimension of each data sample. The second hidden layer acts as the second AE and reconstructs the output of the first hidden layer. Correspondingly, the input of the AE at the current hidden layer is the output of the former layer. The process continues in sequence until the lth AE is trained. The extracted feature vector output Z is obtained from the final hidden layer. The structure of the SAE is given in Fig. 2. In addition, the final layer uses a prediction model, such as softmax, to complete the RUL prediction.
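The greedy layer-wise procedure described above can be sketched in NumPy. This is an illustrative sketch, not the authors' code. Each sparse AE is trained by full-batch gradient descent on the usual sparse AE cost: squared reconstruction error, a KL sparsity penalty weighted by β, and weight decay. The sparsity target ρ = 0.05, learning rate η = 0.8 and 3000 iterations follow the parameter settings reported later in the paper. The values β = 3 and λ = 1e-4, the weight initialization and the function names are assumptions. Supervised fine-tuning of the stack with the sigmoid output unit is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_ae(X, n_hidden, rho=0.05, beta=3.0, lam=1e-4, eta=0.8,
                    epochs=3000, seed=0):
    """Train one sparse AE on X (samples x features, scaled to [0, 1]) by
    full-batch gradient descent. Returns encoder (W1, b1) and the cost history."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    r = np.sqrt(6.0 / (d + n_hidden + 1))            # small uniform initialization (assumed)
    W1, b1 = rng.uniform(-r, r, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.uniform(-r, r, (n_hidden, d)), np.zeros(d)
    costs = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                      # encode: hidden activations a_j(x)
        Z = sigmoid(H @ W2 + b2)                      # decode: reconstruction of X
        rho_j = np.clip(H.mean(axis=0), 1e-8, 1 - 1e-8)   # average activation per unit
        pi = np.sum(rho * np.log(rho / rho_j)
                    + (1 - rho) * np.log((1 - rho) / (1 - rho_j)))  # sparse penalty PI
        costs.append(0.5 * np.sum((Z - X) ** 2) / n + beta * pi
                     + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)))
        # Backpropagation of the cost above
        d_out = (Z - X) * Z * (1 - Z) / n                      # output pre-activation
        d_kl = beta * (-rho / rho_j + (1 - rho) / (1 - rho_j)) / n  # sparsity term
        d_hid = (d_out @ W2.T + d_kl) * H * (1 - H)            # hidden pre-activation
        # Gradient-descent updates with learning rate eta
        W2 -= eta * (H.T @ d_out + lam * W2)
        b2 -= eta * d_out.sum(axis=0)
        W1 -= eta * (X.T @ d_hid + lam * W1)
        b1 -= eta * d_hid.sum(axis=0)
    return W1, b1, costs

def pretrain_stack(X, layer_sizes, **kwargs):
    """Greedy layer-wise pretraining: AE k reconstructs the output of layer k-1,
    and its hidden activations become the input of AE k+1."""
    params, H = [], np.asarray(X, dtype=float)
    for size in layer_sizes:
        W, b, _ = train_sparse_ae(H, size, **kwargs)
        params.append((W, b))
        H = sigmoid(H @ W + b)
    return params, H
```

With the hidden layer sizes reported later, a call would look like `params, features = pretrain_stack(X_train, [200, 100, 80, 60, 40, 20, 10])`. Here `X_train` is a normalized 25-column feature matrix, and `features` feeds the output unit.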

Time and frequency domain factors
Many time and frequency domain factors of vibration signals can be used to monitor and diagnose roller bearing faults, but their sensitivity to the roller bearing health condition is not identical. To select high-quality feature indices that fully describe the roller bearing working condition, the time and frequency domain factors in Table 1 are used.
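The formulas of Table 1 did not survive extraction. The sketch below therefore uses common textbook definitions of the 13 time domain factors, which is an assumption: for example, skewness and kurtosis are normalized by the standard deviation here. It then applies the 12 factors other than the variation coefficient to the one-sided FFT magnitude spectrum, giving 25 features per sample. The function and feature names are hypothetical.

```python
import numpy as np

TIME_NAMES = ["absolute_mean", "peak", "peak_to_peak", "variance",
              "root_amplitude", "rms", "crest_factor", "skewness_factor",
              "kurtosis_factor", "shape_factor", "clearance_factor",
              "impulse_factor", "variation_coefficient"]
# Frequency-domain features reuse the first 12 names (no variation coefficient)
FEATURE_NAMES = TIME_NAMES + ["freq_" + name for name in TIME_NAMES[:-1]]

def statistical_factors(x, include_cv=True):
    """Statistical factors of a 1-D array x of length N (assumed textbook definitions)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    abs_mean = np.mean(np.abs(x))
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(x ** 2))
    root_amp = np.mean(np.sqrt(np.abs(x))) ** 2
    factors = [
        abs_mean, peak, x.max() - x.min(), x.var(), root_amp, rms,
        peak / rms,                               # crest factor
        np.mean((x - mu) ** 3) / sigma ** 3,      # skewness factor
        np.mean((x - mu) ** 4) / sigma ** 4,      # kurtosis factor
        rms / abs_mean,                           # shape factor
        peak / root_amp,                          # clearance factor
        peak / abs_mean,                          # impulse factor
    ]
    if include_cv:
        factors.append(sigma / mu)                # variation coefficient
    return np.array(factors)

def extract_features(sample):
    """25 features of one vibration sample: 13 time domain + 12 frequency domain."""
    time_part = statistical_factors(sample, include_cv=True)
    spectrum = np.abs(np.fft.rfft(np.asarray(sample, dtype=float)))  # one-sided magnitude
    freq_part = statistical_factors(spectrum, include_cv=False)
    return np.concatenate([time_part, freq_part])
```

Stacking `extract_features(s)` over all samples (for example with `np.vstack`) yields the samples x 25 matrix used as model input.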

The procedure of the presented method
This section describes the procedure of the presented method.
Step 1. Time and frequency domain characteristic calculation: the time and frequency feature factors are calculated for the full-life roller bearing datasets. Before the frequency domain factors are calculated, the vibration signals are transformed using the fast Fourier transform (FFT).
Step 2. Data preprocessing: the characteristic parameters and the real RUL values are used as the input and output of the SAE model. Before SAE training, the input and output feature vectors are normalized into [0, 1].
Step 3. RUL prediction: the SAE model is used to forecast the RUL value, and the actual and predicted values are compared across SVM, BP, RF and SAE. The flow chart of the presented method is shown in Fig. 3.
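The normalization in Step 2 can be sketched as column-wise min-max scaling. The helper names below are hypothetical. The text normalizes the overall dataset, which corresponds to fitting the minimum and range on all samples. Fitting them on the training split only is a common alternative that keeps test statistics out of preprocessing. The inverse mapping converts a normalized RUL prediction back to the original units.

```python
import numpy as np

def fit_minmax(X):
    """Column-wise minimum and range of a 2-D feature matrix."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0          # constant columns map to 0 instead of dividing by 0
    return lo, span

def apply_minmax(X, lo, span):
    """Scale each column into [0, 1] using the fitted minimum and range."""
    return (np.asarray(X, dtype=float) - lo) / span

def invert_minmax(Xn, lo, span):
    """Map normalized values (e.g. predicted RUL) back to the original units."""
    return np.asarray(Xn, dtype=float) * span + lo
```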

Table 1 Time domain feature factors
(1) Rotating part: the rotating part mainly includes the motor, accelerator, gearbox and seven roller bearings. The synchronous motor power can reach 1.1 kW, and the operating speed range is 0-6000 rpm. The rotation of the motor is transmitted to the bearings through the shaft and coupling. The rolling bearing model used in the experiment is NSK6307DU.
(2) Loading part: increasing the radial load on the rolling bearing until it reaches the maximum load accelerates bearing degradation and greatly shortens its life cycle.
(3) Measurement part: speed and torque sensors installed on the roller bearings enable real-time monitoring and provide the roller bearing RUL data.
The platform includes seven roller bearings, and the first two (Bear1 and Bear2) are used in this paper. A detailed description of the experimental data used in this paper is given in Table 2.

Two performance indicators
After the roller bearing RUL prediction, two indicators are used to compare the performance of SVM, BP, RF and SAE: mean absolute error (MAE) and root mean square error (RMSE). They are calculated as follows:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|V_i - \hat{V}_i\right| \tag{8}$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(V_i - \hat{V}_i\right)^2} \tag{9}$$

where $V_i$ and $\hat{V}_i$ denote the actual and predicted RUL values, respectively, and n is the number of samples.
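The two indicators, computed with their standard definitions, can be sketched in NumPy as follows. The function names are illustrative.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error between actual and predicted RUL values."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(a - p)))

def rmse(actual, predicted):
    """Root mean square error between actual and predicted RUL values."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))
```

RMSE squares each error before averaging. It therefore penalizes a few large deviations more heavily than MAE does, so reporting both shows whether a model's errors are uniformly small or dominated by outliers.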
The smaller the two indicator values, the better the prediction model performs.

Experiment results and comparison

Time and frequency features extraction
In this section, the time and frequency features are used to represent the roller bearing RUL information. The time domain waveform of Bear1 is shown in Fig. 5, where the X axis covers all sampling points. Each sample has 2560 points and there are 2803 samples in total, so the total length is 2560 × 2803 = 7,175,680 points. Nectoux et al. (2012) suggested that a roller bearing is considered failed when its amplitude exceeds 20 g. In Fig. 5, the amplitude first exceeds 20 g at about 7.08 × 10^6 points, in sample 2765 (7.08e+006/2560 ≈ 2765). Therefore, the 1st to 2765th samples of Bear1 are used for RUL prediction. Fig. 5 also shows that the vibration amplitude increases with the sampling point, because the speed and the radial load of Bear1 were continuously increased until it failed. Each sample in Fig. 5 is converted into time and frequency domain factors according to Table 1. In this paper, only the time domain characteristics for all samples are shown.
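The sample bookkeeping described above uses 2560 points per sample, and failure is the first sample whose amplitude exceeds 20 g. It can be sketched as follows. The helper names, and the use of each sample's peak absolute amplitude as the failure test, are assumptions.

```python
import numpy as np

SAMPLE_LEN = 2560   # points per sample (from the text)
FAIL_G = 20.0       # failure threshold in g (Nectoux et al. 2012)

def segment(signal, sample_len=SAMPLE_LEN):
    """Split a 1-D vibration record into consecutive samples of sample_len points."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal) // sample_len
    return signal[: n * sample_len].reshape(n, sample_len)

def first_failure_sample(signal, threshold=FAIL_G, sample_len=SAMPLE_LEN):
    """1-based index of the first sample whose peak |amplitude| exceeds threshold,
    or None if no sample does."""
    peaks = np.abs(segment(signal, sample_len)).max(axis=1)
    hits = np.flatnonzero(peaks > threshold)
    return int(hits[0]) + 1 if hits.size else None

def last_point(k, sample_len=SAMPLE_LEN):
    """Sampling point at which the k-th (1-based) sample ends."""
    return k * sample_len
```

For example, `last_point(2803)` gives 7,175,680, the total record length quoted above. Samples 1 to `first_failure_sample(signal)` would then be kept for RUL prediction.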
Fig. 6 shows the time domain factors in Table 1, excluding the variation coefficient. Overall, the values of these factors, such as variance and root mean square, increase with the time step from left to right.
However, several time domain factors increase sharply between the 700th and 800th samples: the shape factor, crest factor, impulse factor, clearance factor, kurtosis factor and skewness factor. Since each sample contains 2560 points, this range corresponds to sampling points [1,792,000, 2,048,000]. Fig. 5 shows that the roller bearing amplitude increases sharply near 2,000,000.

Parameter determination for various prediction models
Before RUL prediction, several parameters should be preset in different models.The specific parameter settings are given as follows.
RF: an RF consists of many decision trees (DTs). At each node of a DT, mtry variables are randomly selected out of the M input variables (mtry ≪ M), and the best split on these mtry variables is used to split the node. In this paper, mtry is chosen according to the number of input variables M. After the time and frequency features of the vibration signals are extracted, M = 25. Since mtry usually satisfies mtry ≤ √M (Breiman 2001), mtry = 5, and the number of DTs is fixed at 500.

BP: the input layer has 25 nodes, and the hidden and output layers have 30 and 1 nodes, respectively.
SVM: the radial basis function (RBF) is selected as the kernel function of the SVM model. The penalty parameter C and the kernel function parameter g are set to 3.21458 and 1.2549, respectively.

SAE: since the time and frequency feature vector has 25 elements, the input layer has 25 nodes. The hidden layers have [200 100 80 60 40 20 10] nodes, and the output layer has 1 node. These hidden layers extract the vibration signal feature information. The number of iterations for each hidden layer and for the output layer is 3000, and the learning rate is set to 0.8. To eliminate the influence of different data ranges, the whole dataset is normalized into [0, 1]. The sigmoid function is therefore used as the activation function, because its output range (0, 1) matches the normalized input and output. The sparse coefficient in Eq. (6) is often less than 0.05, and 0.05 is chosen in this study. In the output layer, the sigmoid function is used to calculate the output value.
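For readers reproducing the baselines, the settings above map onto scikit-learn estimators roughly as follows. This mapping is an assumption: the paper does not state which implementation it used. The parameter g is interpreted as the RBF `gamma`, as in LIBSVM's `-g` option. The BP activation, iteration limit and random seed are illustrative choices, and `make_baselines` is a hypothetical helper.

```python
import math

from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

M = 25                   # number of time and frequency features
MTRY = math.isqrt(M)     # largest integer with MTRY <= sqrt(M), i.e. 5

def make_baselines(seed=0):
    """Baseline regressors configured with the settings reported in the text."""
    return {
        # RF: 500 trees, MTRY candidate features examined at each split
        "RF": RandomForestRegressor(n_estimators=500, max_features=MTRY,
                                    random_state=seed),
        # BP: one hidden layer of 30 nodes and one output; the input size is
        # inferred from the 25 feature columns (sigmoid activation assumed)
        "BP": MLPRegressor(hidden_layer_sizes=(30,), activation="logistic",
                           max_iter=3000, random_state=seed),
        # SVM: RBF kernel with C = 3.21458 and g (assumed to be gamma) = 1.2549
        "SVM": SVR(kernel="rbf", C=3.21458, gamma=1.2549),
    }
```

Each model is then trained with `model.fit(X_train, y_train)` on the normalized features and RUL labels, and evaluated with `model.predict(X_test)`.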

RUL prediction
After the time and frequency features are calculated, the SVM, BP, RF and SAE models are applied to RUL forecasting. To eliminate the negative effect of different parameter ranges, the time and frequency features are normalized into [0, 1]. In this paper, the RUL of a roller bearing is defined as the available service time from the current time to the moment of failure. If a roller bearing at sampling point t1 fails at sampling point t2, its RUL is t2 - t1. The RUL value of each sample is used as the output vector of the prediction models. For Bear1, 80% and 90% of the samples are randomly selected as training samples, and the remaining 20% and 10% are used as testing samples. This gives about 2210 and 2490 training samples and about 550 and 270 testing samples, respectively. The predicted and actual RUL values of the different models are shown in Fig. 7, and the MAE and RMSE results are given in Table 3.
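The RUL labels and the random training/testing split described above can be sketched as follows. The sketch assumes that failure occurs at the last retained sample, so that sample's RUL is 0. It also measures RUL in samples rather than sampling points. Dividing by the number of samples gives a normalized label of the form Y = 1 - X, where X = t / n is the elapsed fraction of life. The function names are hypothetical.

```python
import numpy as np

def rul_labels(n_samples):
    """RUL of sample t (1-based) when failure is at sample n_samples:
    raw RUL = n_samples - t, normalized label = 1 - t / n_samples."""
    t = np.arange(1, n_samples + 1)
    rul = n_samples - t                    # t2 - t1, measured in samples
    return rul, 1.0 - t / n_samples        # raw RUL, label in [0, 1)

def random_split(n_samples, train_frac, seed=0):
    """Randomly assign sample indices to training and testing sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    return np.sort(idx[:n_train]), np.sort(idx[n_train:])
```

For the 2765 Bear1 samples, `random_split(2765, 0.8)` yields 2212 training and 553 testing samples, in line with the approximate counts quoted above.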
As shown in Fig. 7g and h, the values predicted by the SAE model are close to the actual values, whereas Fig. 7a-f show more obvious fluctuation between the predicted and actual values. Compared with the SVM, BP and RF models, the SAE gives more stable predictions.
Fig. 7 also shows that SVM, BP and RF predict the RUL poorly for data points in the range [1, 200]. In contrast, Fig. 7g and h show that the SAE predictions remain close to the actual values over the same range.
With dataset B, the SAE achieves the smallest MAE and RMSE values, 0.0359 and 0.0762, while the highest values among the other models are 0.1039 and 0.1635. Overall, the MAE and RMSE values of the SAE are smaller than those of the SVM, BP and RF models. This demonstrates that the SAE predicts RUL better than these models. To verify this further, the SAE, SVM, BP and RF models are each run 10 times, and the resulting MAE and RMSE values are shown in Fig. 8.
As shown in Fig. 8a, the MAE values of the SAE are smaller than those of the SVM, BP and RF models. The SAE values are close to 0.04, whereas those of BP, SVM and RF exceed 0.04. The RMSE values (Fig. 8b) show the same pattern. In addition, the MAE and RMSE of the SAE are more stable across runs than those of the other models. These results demonstrate that the SAE performs better than the SVM, BP and RF models.
To further verify that the SAE predicts better than the SVM, BP and RF models, the four models are also used to forecast the RUL of Bear2. The predicted and actual RUL values of the different models are shown in Fig. 9, and the MAE and RMSE results are given in Table 4.
These results demonstrate that the SAE also predicts RUL better than the SVM, BP and RF models for Bear2.
The learning rate is generally set within [0, 1]. If the learning rate is too large, convergence is fast but the model easily falls into a local optimum. If it is too small, the SAE and ESAE models converge slowly (Xu et al. 2018). To study its impact on performance, the learning rate η is set to [0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0008, 0.001, 0.002, 0.003, 0.004, 0.005, 0.008, 0.01, 0.02, 0.03, 0.04, 0.05, 0.08, 0.1, 0.2, 0.3, 0.4, 0.5]. The MAE and RMSE under different learning rates are shown in Table 5. The predicted and actual RUL values of Bear1 under different learning rates using SAE, with 10% of the samples used for testing, are shown in Fig. 10. In Table 5, the maximum RMSE is 0.1246, at a learning rate of 0.001, and the minimum RMSE is 0.0777, at a learning rate of 0.5. The RMSE of the SAE model is smaller than that of the SVM, BP and RF models (Table 3), and the MAE values of the SAE are close to those of the SVM, BP and RF models. These results show that the SAE performs better than the SVM, BP and RF models.

Fig. 10 Predicted and actual RUL values for Bear1 under different learning rates using SAE, with 10% of the samples used for testing

Conclusions
This paper presents an RUL prediction method based on SAE. Since the time and frequency domain factors reflect roller bearing degradation well, they are used to extract features from the roller bearing vibration signals. These features are then used as the input of the SAE, SVM, BP and RF models for RUL prediction. In the SAE, the input size depends on the number of time and frequency domain factors. The experimental results show that the SAE model extracts features well through several hidden layers and forecasts the RUL well. This paper uses the function Y = 1 - X as the label value for SAE training and testing.
However, in actual engineering data collection, the original vibration signal varies continuously, so the actual lifespan label sometimes does not follow the decreasing trend of Y = 1 - X. To solve this problem, future work will use an unsupervised deep learning model without an output layer to extract degradation features before RUL prediction. This model will extract degradation features directly from the vibration amplitude of the original signal, further reducing the dependence on human experience and data labels.

Fig. 1 The network structure of AE

Fig. 3 The flow chart of the presented method

Fig. 4

Fig. 8 MAE and RMSE of SVM, BP, RF and SAE, with 10% of the samples used for testing: (a) MAE; (b) RMSE

Fig. 9 Predicted and actual RUL values for Bear2 using SVM, BP, RF and SAE, with 10% of the samples (Dataset B) used for testing: (a) SVM; (b) BP; (c) RF; (d) SAE

Table 1 lists 13 time domain factors: absolute mean, peak, peak to peak, variance, root amplitude, crest factor, skewness factor, kurtosis factor, shape factor, clearance factor, impulse factor, root mean square and variation coefficient. Another 12 statistical factors with the same names and formulae, excluding the variation coefficient, are extracted from the vibration signals in the frequency domain. These are also used to quantify the roller bearing condition. As a result, a total of 25 statistical features are extracted from each sample. In Table 1, N denotes the length of each sample.

Table 3
MAE and RMSE of different prediction models for Bear1 using different datasets

Table 4
MAE and RMSE of different prediction models for Bear2 (Dataset B)

Table 5
MAE and RMSE under different learning rates for Bear2 (Dataset B)