Remaining Useful Life Estimation of Aircraft Engines Based on Deep Convolution Neural Network and LightGBM Combination Model

Accurately predicting the remaining useful life (RUL) of aero-engines is of great significance for improving the reliability and safety of aero-engine systems. Because of the high dimension and complex features of the sensor data involved in RUL prediction, this paper proposes a model combining a deep convolutional neural network (DCNN) and the light gradient boosting machine (LightGBM) algorithm to estimate the RUL. Compared with traditional prognostics and health management (PHM) techniques, neither signal processing of raw sensor data nor prior expertise is required. The procedure is as follows. First, time windows of the raw aero-engine data are used as the input of the DCNN after normalization; the role of the DCNN is to extract information from the input data. Second, considering the limitations of the fully connected layer of the DCNN, we replace it with a strong classifier, LightGBM, to improve the accuracy of prediction. Finally, to prove the effectiveness of the proposed method, we conducted experiments on the C-MAPSS data set provided by NASA and obtained good accuracy. Compared with other commonly used algorithms on the same data set, the proposed algorithm has obvious advantages.


Introduction
The aero-engine is a crucial component that provides thrust for a plane [1]. To ensure the safety of the aircraft, it is important to estimate the RUL of the engine. Prognostics and health management (PHM) is an emerging technology [2] that monitors the reliability and security of an engineering system to maximize operating availability and reduce maintenance cost [3,4]. As one of the most challenging technologies in PHM, the RUL estimation of aero-engines has attracted much attention.
RUL prediction methods are mainly divided into three categories: model-based methods, empirical knowledge-based methods and data-driven methods. Model-based methods establish the model through mechanical principles, which takes a lot of time; besides, it is difficult to build an accurate model due to the complex system structure and uncertain environments [6]. Empirical knowledge-based methods require industry experts to use extensive prior knowledge to establish a corresponding knowledge base [7]; they do not require an accurate model, but the prediction accuracy cannot be guaranteed. Data-driven methods build estimation models from historical run-to-failure data, which avoids the limitations of relying on physical failure models and expert knowledge [8]. Moreover, data-driven approaches have the advantages of low computing cost and high accuracy. This paper mainly focuses on data-driven methods for predicting the RUL.
Many data-driven prediction methods have been proposed and have achieved good results in recent years, including support vector machines (SVM) [9], hidden Markov models [10], etc. Traditional data-driven methods analyze and mine sensor data with signal processing technologies, extract features that reflect system degradation and failure, and thereby implement the RUL prediction of equipment. However, it remains challenging to develop an effective approach that mines the complex information of time series data and achieves high prediction accuracy.
In the past years, deep learning has gradually emerged in the field of PHM, being more capable of extracting deep features from big data composed of multi-sensor performance parameters. Malhi et al. [11] adopted a recurrent neural network (RNN) approach for long-term prognostics of machine health status. As an optimization of the traditional RNN, a long short-term memory (LSTM) method, which can make full use of the sensor sequence information and expose hidden patterns within sensor data, was proposed for RUL estimation by Zheng et al. [12]. Li et al. [13] predicted the RUL of aero-engines by building a 2-dimensional (2D) DCNN based on time series from the sensor signals. Within the deep learning architecture, DCNN has fewer parameters than other methods because it adopts weight sharing, and it shows excellent feature extraction ability. However, the nonlinear combination features extracted by DCNN are only learned in a simple manner, so the ability of DCNN to search for the global optimum is limited.
In recent years, decision tree ensemble methods have been widely used by data scientists [14]. Extreme gradient boosting (XGBoost), which is based on the gradient boosting decision tree (GBDT), has achieved promising results in many machine learning challenges [15]. As an improvement of XGBoost, the light gradient boosting machine (LightGBM) is better at processing high-dimensional data [16] and adopts a leaf-wise growth strategy to improve prognostic accuracy.
To solve the issues in DCNN prediction methods, a model combining DCNN and LightGBM for predicting the RUL of aircraft engines is proposed in this paper. We use the deep features extracted by DCNN as the input of LightGBM to get more accurate prediction results. The effectiveness of this approach is validated on C-MAPSS datasets provided by NASA.
The rest of the paper is organized as follows. In Sect. 2, we briefly review CNN and describe the specific structure of deep learning models. Then the method of model improvement is presented in Sect. 3, along with an introduction of LightGBM. In Sect. 4, we analyze C-MAPSS datasets and demonstrate the superiority of DCNN-LightGBM algorithm by comparisons with other methods. Finally, concluding remarks are provided in Sect. 5.

Convolutional Neural Network
Convolutional neural networks (CNNs) were first proposed by LeCun and have achieved many outstanding results in the fields of image processing and natural language processing [17]. In general, CNNs are structured from three types of hidden layers: convolutional layers, pooling layers, and fully connected layers [18].

Convolutional Layer
The convolutional layer is the most important part of a convolutional network. Feature maps are produced by sliding the convolution kernel over the data and convolving it with the covered data, and the property of shared weights reduces the number of model parameters and the risk of overfitting. The i-th feature map of the l-th convolutional layer, $x_i^l$, is calculated as follows:

$$z_i^l = \sum_{c=1}^{C} k_i^l * x_c^{l-1} + b_i^l, \qquad x_i^l = \sigma\left(z_i^l\right)$$

where $z_i^l$ represents the output of the convolution operations, $*$ denotes the convolution operator, $k_i^l$ is the i-th convolution kernel, $x^{l-1}$ is the input volume, $b_i^l$ and $\sigma(\cdot)$ represent the bias term and the nonlinear activation function, respectively, and $C$ is the number of input channels.
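As a concrete illustration, the feature-map computation above can be sketched in NumPy. This is a minimal sketch, not the paper's code: the function name and the ReLU activation are our own choices, and, as in most deep learning frameworks, the "convolution" is implemented without flipping the kernel.

```python
import numpy as np

def conv1d_feature_map(x_prev, kernel, bias, act=lambda v: np.maximum(v, 0.0)):
    """One feature map x_i^l of a convolutional layer (valid padding, stride 1).
    x_prev: (T, C) input volume; kernel: (F, C); bias: scalar.
    Implements z_i^l = sum_c k_i^l * x_c^{l-1} + b_i^l followed by the
    activation; no kernel flip is applied (cross-correlation convention)."""
    T = x_prev.shape[0]
    F = kernel.shape[0]
    z = np.empty(T - F + 1)
    for t in range(T - F + 1):
        # sum of element-wise products over the window, across all channels
        z[t] = np.sum(x_prev[t:t + F, :] * kernel) + bias
    return act(z)
```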

Pooling Layer
The purpose of a pooling layer is to merge similar features into one using nonlinear down-sampling functions and to speed up the calculation. The max-pooling layer is the most commonly used pooling layer. The inputs of the pooling layer are the feature maps from the previous layer, and the outputs are the maxima of local patches of the inputs:

$$x_i^l(j) = \max_{0 \le t < p} x_i^{l-1}(j \cdot s + t)$$

where $x_i^l$ is the i-th feature map of the l-th pooling layer, $x_i^{l-1}$ is the i-th feature map in the previous layer $l-1$, $\max(\cdot)$ denotes max-pooling, and $p$ and $s$ represent the pooling size and the stride size, respectively.
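The max-pooling function can be sketched in a few lines of NumPy (a minimal illustration; the function name is our own):

```python
import numpy as np

def max_pool1d(x, p, s):
    """Max-pooling of a 1D feature map x with pool size p and stride s:
    each output is the maximum of a local patch of the input."""
    return np.array([x[j:j + p].max() for j in range(0, len(x) - p + 1, s)])
```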

Fully Connected Layer
As the last layer of the convolutional neural network, the fully connected layer summarizes the features and outputs the prediction results. The output $x^l$ of the l-th fully connected layer is as follows:

$$x^l = \sigma\left(w^l x^{l-1} + b^l\right)$$

where $x^{l-1}$ is the output of the previous layer $l-1$, and $w^l$ and $b^l$ represent the weight matrix and the bias vector, respectively.

Proposed Deep Convolutional Neural Network Structure
DCNN has excellent learning ability, mainly achieved through multiple nonlinear feature extractions, and can automatically learn hierarchical representations from data. Therefore, the number of pooling and convolutional layers and the size of the convolution kernel have a great impact on the prediction results. The aero-engine degradation simulation data used in this paper are numerical, and the dimension of the raw features is relatively low. Although the pooling operation improves computing efficiency, it filters out useful information for this kind of prognostic problem. Table 1 shows the forecasting performance with and without pooling layers in the model; the network structure without pooling layers achieves better results. Based on an analysis of the data set in this paper, the poor correlation among features from different sensors, and the results in [13] and [15], the filter size is set to 10 × 1 and the filter number to 10; a larger convolution kernel can reduce the impact of noise. The effect of the number of convolutional layers on the prediction error is shown in Fig. 1. The network with 5 convolutional layers has the best performance.
A primary network structure suitable for the RUL prediction is designed based on the above experiments. Figure 2 shows the proposed network structure for the RUL estimation in this study. First, the input data is two-dimensional (2D): one dimension is the time sequence of the sensors, denoted as $L_s$, and the other is the number of features, denoted as $L_f$. The raw features are signals collected by the multiple sensors (the details of the data preparation are illustrated in Sect. 4). Next, four convolutional layers with the same structure extract features from the input data; zero padding ensures that the dimensions of the output feature maps are consistent with those of the input, $L_s \times L_f$. Then, a convolutional layer with one filter of size 3 × 1 combines the feature maps; the small convolution kernel is conducive to extracting more subtle features. In this way, advanced features hidden in the raw data are extracted. Afterwards, the two-dimensional feature map is flattened and passed to a fully connected layer with 100 neurons; we apply the dropout technique after the flatten layer to relieve overfitting. Finally, the output layer contains one neuron whose output represents the predicted value of the RUL.

The activation function of each layer is ReLU, and the he_normal initializer is used to initialize the weights of the DCNN [19]. The DCNN is trained to minimize the loss with the back-propagation algorithm, and the Adam algorithm is chosen as the optimizer in our experiment. The initial learning rate is set to 0.001 and is divided by 10 every 30 epochs until convergence. Considering the actual situation of the aero-engine datasets, we increase the penalty for late predictions, so the loss function can be represented as a squared error with a larger weight $w_i$ on late predictions ($\hat{y}_i > y_i$):

$$L = \frac{1}{N}\sum_{i=1}^{N} w_i\,(\hat{y}_i - y_i)^2, \qquad w_i = \begin{cases} 1, & \hat{y}_i \le y_i \\ \lambda > 1, & \hat{y}_i > y_i \end{cases}$$

where $y_i$ is the actual value of the i-th test engine, $\hat{y}_i$ is the predicted value of the i-th test engine, and $N$ is the number of samples in the validation set.
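The architecture described above can be sketched in Keras. This is a hedged reconstruction from the description, not the authors' code: the dropout rate and the use of plain MSE (in place of the late-prediction-weighted loss) are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_dcnn(L_s=31, L_f=14, dropout_rate=0.5):
    """Sketch of the described network; dropout_rate is an assumption."""
    inputs = layers.Input(shape=(L_s, L_f, 1))
    x = inputs
    # four identical convolutional layers; zero padding ("same") keeps
    # the feature maps at L_s x L_f
    for _ in range(4):
        x = layers.Conv2D(10, (10, 1), padding="same", activation="relu",
                          kernel_initializer="he_normal")(x)
    # a single 3x1 filter combines the ten feature maps into one
    x = layers.Conv2D(1, (3, 1), padding="same", activation="relu",
                      kernel_initializer="he_normal")(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(dropout_rate)(x)  # dropout after the flatten layer
    x = layers.Dense(100, activation="relu",
                     kernel_initializer="he_normal")(x)
    outputs = layers.Dense(1)(x)         # one neuron: the predicted RUL
    model = models.Model(inputs, outputs)
    # plain MSE stands in for the paper's asymmetric (late-penalizing) loss
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="mse")
    return model
```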

Model Improvement
Convolutional neural networks have shown excellent feature extraction ability. The features abstracted by the multiple convolutions are integrated by a fully connected layer. However, the fully connected layer learns these nonlinear combination features in a simple manner and may fall into a local optimum when serious noise is mixed into the raw data. Therefore, we replace the fully connected layer with a strong classifier, namely LightGBM.

The Light Gradient Boosting Machine
The light gradient boosting machine (LightGBM), proposed by Microsoft in 2017, mainly addresses the accuracy and efficiency problems of the gradient boosting decision tree (GBDT) on massive amounts of data. Like GBDT, LightGBM learns a decision tree (DT) by fitting the negative gradient of the residuals in each iteration [20,21]. In this paper, the input of LightGBM is the feature vector created by the DCNN. The dataset is given as $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $n$ is the number of samples and $y_i$ is the target RUL value. The LightGBM architecture is described in Fig. 3.
K additive functions are used to predict the output, which is defined as follows:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in F$$

where $K$ is the number of trees, $F$ is the set of regression trees, and $f_k$ is one of the trees with its leaf scores. The predicted values of all trees are summed to obtain the RUL estimation. The training objective is defined as:

$$L = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right)$$

where $l$ is the training loss function (here, the squared error) and $\Omega$ is the regularization function. To improve the optimization speed and the generalization of the model, the second-order Taylor expansion is applied [15], so the loss at iteration $t$ can be represented as:

$$L^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega\left(f_t\right)$$

where $g_i$ and $h_i$ are the first- and second-order gradients of the loss with respect to the prediction $\hat{y}_i^{(t-1)}$. LightGBM then uses the following two methods to speed up training without sacrificing accuracy.
(1) Gradient-based one-side sampling (GOSS). Sample points with large gradients contribute more information gain. The GOSS algorithm retains the sample points with large gradients and randomly samples the points with small gradients, keeping the information gain evaluation accurate. (2) Exclusive feature bundling (EFB). LightGBM uses the histogram algorithm to merge exclusive features. The EFB algorithm bundles many features of high-dimensional data together in a sparse feature space to reduce the number of features.
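For reference, both techniques are exposed through ordinary LightGBM parameters; a configuration sketch follows (the numeric values are illustrative assumptions, not the paper's settings):

```python
import lightgbm as lgb

# Illustrative values only; the parameter names are LightGBM's own.
params = {
    "objective": "regression",
    "boosting_type": "goss",  # gradient-based one-side sampling
                              # (LightGBM >= 4.0 also exposes this as
                              # data_sample_strategy="goss")
    "top_rate": 0.2,          # fraction of large-gradient samples kept
    "other_rate": 0.1,        # fraction of small-gradient samples drawn
    "enable_bundle": True,    # exclusive feature bundling (on by default)
    "num_leaves": 31,         # leaf-wise tree growth
    "learning_rate": 0.05,
}
model = lgb.LGBMRegressor(**params)
```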

The DCNN-LightGBM Model
In this paper, we combine a deep convolutional neural network and the LightGBM algorithm for the RUL estimation. The prognostic structure is shown in Fig. 4. The features of the aero-engine data are extracted by the convolutional layers of the DCNN, and LightGBM then learns from the output of the flatten layer to complete the prediction. The details of the forecasting process are given as follows: (1) Data preprocessing. Exploratory data analysis is used to select sensor signals with significant changes for the RUL prediction. The sliding window technique is utilized to construct time sequence characteristics. The training dataset, the testing dataset, and the RUL labels are generated after data normalization.

Aero-engine Dataset
This paper selects the C-MAPSS datasets provided by NASA to verify the effectiveness of the above method. C-MAPSS is widely used in prognostic studies and contains four sub-datasets of aero-engines under different operating conditions and failure modes. Each sub-dataset contains a training set, a testing set and testing RUL values, and consists of readings from 21 sensors and 3 operational settings [22]. Each engine unit has a varying degree of wear. Over time, the engine units degrade until they reach system failure, which is marked as an unhealthy time cycle. The sensor records in the testing set are terminated before system failure.
The purpose of the experiment is to predict the RUL of each engine unit in the testing set. The dynamic characteristics of aero-engine operating data differ significantly across operating conditions, which leads to different network structures for extracting features. The DCNN structure proposed in this paper is designed for predicting the RUL of aero-engines under a single operating condition. Therefore, this paper utilizes the data sets FD001 and FD003, obtained under a single operating condition, for experimental analysis. FD001 and FD003 each comprise 100 training engine units and 100 testing engine units.
The selected sensor signals are normalized before training, using a min-max scaling:

$$x_{i,j}^{norm} = \frac{x_{i,j} - x_j^{min}}{x_j^{max} - x_j^{min}}$$

where $x_{i,j}$ is the i-th measuring point of the j-th sensor, and $x_j^{min}$ and $x_j^{max}$ are the minimum and maximum values of the j-th sensor. The normalized results of the 14 selected signals of the first engine in the FD001 and FD003 datasets are shown in Fig. 5. The operating data show a significant abnormal trend as the aero-engines degrade. At the initial stage of engine operation, the variation trend is not obvious and cannot provide effective information for the RUL prediction, as shown in the 0-75 time cycles in Fig. 5a. Therefore, a piecewise linear function is adopted for the RUL target label. Figure 6 shows the piecewise RUL of the first engine unit in the FD001 and FD003 datasets, respectively. We set the RUL value in the early stage of degradation to a fixed value of 125 as an upper limit, as [23] and [25] did; this threshold is denoted by $R_{early}$. The effectiveness of the piecewise linear function on this forecasting problem has been confirmed in [13, 15, 20, 26]. The processed label values are smoothed.
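The normalization and the piecewise RUL labels can be sketched in NumPy (function names are our own; the scaling range [0, 1] matches the formula above):

```python
import numpy as np

def minmax_normalize(X):
    """Column-wise min-max scaling of a (cycles, sensors) matrix to [0, 1];
    the small epsilon guards against constant sensors."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn + 1e-12)

def piecewise_rul(total_cycles, r_early=125):
    """Piecewise linear RUL target: capped at R_early early in life,
    then decreasing linearly to 0 at the failure cycle."""
    rul = np.arange(total_cycles - 1, -1, -1)
    return np.minimum(rul, r_early)
```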

Performance Metrics
In this paper, two metrics are used to evaluate prognostic performance [12]: the scoring function (Score) and the root mean square error (RMSE). The scoring function is widely used in the International Conference on Prognostics and Health Management Data Challenge, and is defined as:

$$Score = \sum_{i=1}^{N} s_i, \qquad s_i = \begin{cases} e^{-d_i/13} - 1, & d_i < 0 \\ e^{d_i/10} - 1, & d_i \ge 0 \end{cases}$$

where $d_i = RUL'_i - RUL_i$ is the error between the predicted value and the true value of the i-th testing data sample, and $N$ is the number of engines in the test set. Score penalizes late predictions more than early predictions, since late predictions may cause serious accidents. The other metric is RMSE, which measures the average distance between the predicted values and the actual values:

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} d_i^2}$$
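Both metrics are straightforward to implement directly from the definitions above (a minimal NumPy sketch):

```python
import numpy as np

def score(rul_pred, rul_true):
    """PHM08 scoring function: late predictions (d >= 0) are penalized
    more heavily than early ones (denominators 13 and 10 are the
    standard C-MAPSS constants)."""
    d = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    return float(np.sum(np.where(d < 0,
                                 np.exp(-d / 13.0) - 1.0,
                                 np.exp(d / 10.0) - 1.0)))

def rmse(rul_pred, rul_true):
    """Root mean square error between predicted and actual RUL values."""
    d = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))
```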

Result Analysis
The processor used in the experiments is an Intel(R) Core(TM) i7-8565U with 8 GB memory, running Microsoft Windows 10 64-bit. The Python version is 3.6.

Model Parameters and Training Results
The time window size is an important factor affecting the prediction accuracy of the proposed method. Figure 8 shows the effect of the time window size on the model performance. The prediction results of the RUL are affected by the amount of historical information: as shown in Fig. 8, increasing the time window size improves the prediction accuracy of the RUL of the engine. Note that the selected time window size is bounded by the shortest run length in the engine test set. Therefore, the time window sizes $L_s$ of the FD001 and FD003 data sets are 31 and 38, respectively. Furthermore, we train the DCNN-LightGBM model 10 times independently to exclude the effects of random disturbances and take the average of the results. The key parameters of the proposed model are summarized in Table 2. Figure 9 shows the RUL prediction results of the 100 testing engine units in descending order. It can be observed that the predicted values of DCNN-LightGBM are closer to the actual values than those of DCNN. The predictions of the proposed method become more accurate as the engines degenerate, because the model can extract more failure features from the sensor data with increasing degradation. The security of the system can be improved by accurately predicting the RUL near the stage of engine failure.
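The time-window samples referred to above can be constructed with a simple stride-1 sliding window (a minimal sketch; the function name is our own):

```python
import numpy as np

def sliding_windows(signals, window_size):
    """Cut a (T, F) multivariate run into overlapping (window_size, F)
    samples with stride 1, as in the time-window data preparation."""
    T = signals.shape[0]
    return np.stack([signals[t:t + window_size]
                     for t in range(T - window_size + 1)])
```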

Comparing with Other Popular Methods
To verify the superiority of the proposed model, XGBoost, a recurrent neural network (RNN), a deep neural network (DNN), DCNN and LightGBM are used to predict the RUL. Like DCNN-LightGBM, all comparison models are independently trained ten times. The prognostic results of the different methods are presented in Table 3. DCNN-LightGBM performs best on both metrics. Although LightGBM improves the accuracy over XGBoost, it is still less accurate than DCNN-LightGBM. This is because DCNN can learn advanced features from the original features through complex network calculations. These advanced features concentrate the effective information into a low-dimensional representation, and many of them cannot be constructed manually, so they achieve a better fit when used as inputs to LightGBM. Therefore, the improved method based on DCNN achieves higher accuracy.

Comparing with Related Works

Table 4 presents the results of the commonly used methods on the FD001 and FD003 data sets of C-MAPSS. Compared with traditional machine learning methods such as SVR and random forest, the deep learning methods achieve better results in both Score and RMSE. A combined deep learning method such as Autoencoder-BLSTM has higher accuracy than Deep LSTM, a conventional deep learning method, and gradient boosting also performs well in the RUL estimation. In addition, Table 4 shows that both the proposed algorithm and the CNN-XGB algorithm can provide accurate prediction results for the RUL of aero-engines. The CNN-XGB algorithm takes the average of the predictions produced by CNN and XGB as the final result; by its nature, it does not propose structural innovations for CNN or XGB. The proposed algorithm is different from CNN-XGB: as a new prediction method, the combination of DCNN and LightGBM integrates the advantage of DCNN in extracting degradation features with the advantage of LightGBM in obtaining the final RUL prediction values.

Conclusion
In this paper, we propose a model combining DCNN and LightGBM to predict the RUL of the aero-engine, and confirm its excellent performance on the C-MAPSS data. The role of the DCNN is to extract deep features, and LightGBM is used to complete the prediction of the RUL. Comparing the scores of different models shows that the ensemble learning model has better prediction accuracy than the other models, and that the predictions become more accurate as the degree of degradation increases. While the proposed method achieves good experimental results, further architecture optimization is necessary. Following [31], we will further optimize the model structure and hyperparameters to reduce training time and computational load. In future work, the proposed method will be applied to the RUL prediction of aero-engines under different operating conditions; when the operating conditions are more complex, the RUL prediction is more challenging, and this kind of problem deserves further study.