Adaptive remaining useful life prediction framework with stochastic failure threshold for experimental bearings with different lifetimes under contaminated condition

Deterioration modelling and remaining useful life (RUL) prediction of roller bearings is critical to ensure a safe, reliable, and efficient operation of rotating machinery. RUL prediction models in model-based approaches are often based on constant failure threshold and time-domain features for bearings’ failure prognosis. Due to nonlinearity of the acceleration signals, noises, and measurement errors, the time-domain features used as condition indicators are unable to track bearings’ degradation successfully and they are mostly utilized for fault diagnosis, especially in the fault classification field using machine learning algorithms. This paper proposes an adaptive RUL prediction framework with a stochastic failure threshold which comprises of two main phases of feature extraction and RUL prediction using laboratory-acquired accelerated life test data obtained from contaminated bearings. The first phase is to decompose the empirical input signals into different frequency bands using some time–frequency transformation functions and extract several condition indicators for the second phase. The second phase is based on a stochastic Wiener process while the key parameters of the model are updated iteratively using a Bayesian approach, and RUL at different degradation datapoints is computed numerically. The experimental results showed the good performance of the developed framework. Some factors affecting RUL prediction such as the length of bearing samples, and degradation mechanism are highlighted in the result. The results of this paper can be further used for an effective maintenance optimization, determining an optimal maintenance alarm threshold, improving the reliability and safety of rotating machinery, and reducing the downtime cost.


Introduction
Roller bearings are one of the most important and sensitive components in rotating machinery in several industries which are widely used in various equipment such as wind turbines, high-speed trains, aeroengines, and vehicles (Yang et al. 2022;Farsi and Masood Hosseini 2019). They are often operated under high stress conditions, which lead to unexpected failures. Such unexpected failures account for 45-55% of motor failures and can cause a breakdown of the entire machine, reducing its productivity, and service lifetime while increasing the unplanned downtime and cost (Xia et al. 2019;Guo et al. 2017;Singleton et al. 2015). Condition monitoring and a precise remaining useful life (RUL) estimation of bearings is required to avoid such sudden failures and catastrophes, improve the reliability, operational safety, and availability of the rotating machinery, and build an effective maintenance schedule (Li et al. 2018a;Ravikumar et al. 2021). RUL prediction using the current machine condition and past operating profile, as a key factor in condition-based maintenance (CBM), is defined as the length of the time a system is likely to operate before it fails (Ahmadzadeh and Lundberg 2014;Fornlöf et al. 2016).
Two main challenges in the RUL prediction of roller bearings are health indicator (HI) construction and failure threshold identification. HIs are built based on many features extracted from acceleration signals, which represent the underlying deterioration behavior of the bearings throughout their lifetime. The features are categorized as time-domain features, frequency-domain features, and time-frequency features, which exhibit different failure prediction capabilities and different sensitivities to degradation levels. Frequency-domain and time-frequency domain features are widely used together with data-driven approaches for both fault diagnosis and failure prognostics (Bhattacharya and Dan 2014). The degradation model in data-driven approaches such as machine-learning techniques and artificial neural networks (ANN) is built based on sufficient historical failure data to estimate RUL without any prior knowledge about the system and the physical nature of the system's degradation mechanism (Nguyen and Medjaher 2019).
For instance, Zhu et al. (2019) presented a RUL-estimating approach based on wavelet transform (WT) as a time-frequency representation (TFR) technique and a multiscale convolutional neural network. Liu et al. (2021) proposed a fault prediction method for aero-engine bearings with multi-stage degradation performance by combining the advantages of long short-term memory network with statistical process analysis. In their study, the time-domain root mean square (RMS) of the raw signals was used as the HI. Li et al. (2018a; proposed a novel deep learning-based RUL prediction approach to estimate the machine degradation status. They applied short-time Fourier transform (STFT) on raw acceleration data to obtain time-frequency information of the signals and implemented a multi-scale feature extraction using convolutional neural network (CNN) to improve the learning ability of the algorithm. Li et al. (2018a, b) proposed a new data-driven approach for RUL prediction using deep CNN where the inputs are the normalized raw collected data, and the outputs are the estimated RUL values.
In contrast, model-based approaches use the knowledge of a system's failure mechanism such as the crack growth to build a quantitative mathematical description of the system's deterioration process to estimate its RUL (Liao and Köttig 2016;Salehpour-Oskouei and Pourgol-Mohammad 2017). Among the available model-based approaches, stochastic processes have attracted considerable research interest as they consider random errors in measurements, uncertainties in a working environment, individual variability of components in a larger population, and capture stochastic dynamics in the degradation process Salehpour-Oskouei and Pourgol-Mohammad 2017).
For instance, Liu and Fan (2022) proposed a new stochastic degradation model which integrates the characteristics of multi-stage and multi-variability of degradation trend while the model's parameters are updated using parameters estimation method based on expectation maximization algorithm. To show the effectiveness of their proposed approach, the authors used a real-case bearing dataset where the RMS of vibration signals as a time-domain feature has been used to track the bearings' condition and RUL prediction. In (Wen et al. 2018a, b), the authors employed a multiple changepoint Wiener process as a degradation model, applied full Bayesian approach to update the parameters of the model iteratively, and predicted RUL of several bearings using a Monte-Carlo simulation algorithm. The degradation signals in thrust bearings were log transformed vibration signal obtained through accelerated tests. Ahmad et al. (2019) proposed a dimensionless HI, which is the RMS of the vibration signal at any time divided by RMS under a baseline condition and estimated the RUL of bearings using a dynamic regression model. The dynamic regression model was recursively updated to capture the underlying degradation trend of the bearings. Thoppil et al. (2021) proposed an algorithm utilizing principle component analysis (PCA) to reduce the dimensionality of the monitored data and exponential model to construct a HI and estimate RUL. Li et al. (2015) used the kurtosis and RMS of vibration signals as two time-domain 1 3 features to track the bearings' degradation in healthy and degraded stages and adopted an improved exponential model for failure prognosis of the bearings. Wen et al. (2018a, b) developed an algorithm to predict the RUL of bearings using the RMS of vibration signals as a time-domain statistical feature and a nonlinear Wiener process with a time-dependent drift parameter. However, in these studies, the frequency domain of vibration signals for failure prognosis has not been thoroughly demonstrated and studied.
As discussed above, much of the available literature on stochastic or statistical models have considered simple timedomain features, which fluctuate significantly in some cases and, thus, are not counted as good indicators for health condition monitoring of rolling components. In contrast, some other studies that considered the frequency domain of the signals, mostly worked with machine learning techniques and neural networks for RUL prediction which may suffer from black box approach and model explainability (Rudin 2019). In other words, the integration of frequency-domain approaches for HI construction and stochastic modeling for failure prediction of bearings has not been fully explored before.
Another key challenge in prognostics is the lack of a predetermined failure threshold. H. Wang et al. (2021) proposed a dynamic RUL prediction and optimal maintenance time estimation approach where an isotonic regression-based method was used for data preprocessing and a Gamma process was used to predict the bearing RUL. However, in their study, the failure threshold was assumed to be constant through the bearing's lifetime which may not be realistic in practice. Yan et al. (2021) also investigates the degradation modeling and RUL estimation of dependent competing failure processes subject to gradual degradation and random shocks. The failure time in their paper is defined as the first point of time that the system's cumulative damage reaches or exceeds a prespecified failure threshold. Narayanan et al. (2019) defined the failure threshold as the last time instance of running or the last condition-monitoring datapoint and predicted RUL considering kurtosis and RMS as HIs by using a support vector machine. Pan et al. (2017) developed a Wiener process-based reliability estimation approach in which the Wiener process drift parameter follows a truncated normal distribution (TND). The reliability function and probability density function (PDF) of the FPT are both based on a pre-determined failure threshold. While most of the recent literature has studied RUL prediction using a predefined fixed threshold, the stochastic failure threshold has not been thoroughly studied. However, in practice, bearings are used by diverse users in various systems. In many cases, the designer of a component might not know with certainty the level at which degradation can explicitly cause failure. In addition, the difference in the scales of features representing the underlying performance of bearings, as well as the degradation mechanisms, fault location, and some physical characteristics of bearings, yield different failure thresholds (Peng and Coit 2007). These findings emphasize the importance of investigating probabilistic failure thresholds. Tang et al. (2016) are one of the few researchers who developed a Wiener-based RUL prediction model with a stochastic failure threshold following TND. However, the diffusion parameter in the model is assumed to be fixed among all units, which may not be true in some real-life applications.
To address the above issues, this paper proposes a RULprediction framework comprising of two main phases of feature extraction using three main time-frequency approaches and an adaptive stochastic model for the RUL prediction of experimental roller bearings under contaminated conditions. The main research contribution is summarized as follows: • The frequency domain of acceleration signals is accounted for in addition to the time domain by decomposing the signals into various frequency bands using different transformation techniques and extracting statistical features (i.e., condition indicators) from each band for deterioration modeling. The features extracted from such techniques are more sensitive to the degradation stage and thus more suitable for failure prognostics of bearings. • Since the data of the selected features exhibit a linear non-monotonic degradation trend, a continuous-time stochastic Wiener process is used as the RUL-prediction model. The failure threshold in the model is assumed a probabilistic variable following TND to reflect the threshold variability while the key parameters of the model are also assumed random and updated iteratively when a new observation of the degradation data becomes available. The RUL prediction of the bearings are computed numerically at different degradation datapoints with reasonably high accuracy which helps maintenance managers develop more effective and reliable predictive maintenance policies. • Some real-time accelerated degradation tests have been conducted on rolling element bearings degraded by particle contamination under laboratory operating condition and the data were collected to present the effectiveness of the proposed framework. Most bearing datasets in literature have been tested under loading effect whereas contamination as a crucial degradation mechanism was not investigated before in existing datasets.
The rest of this paper is structured as follows. Section 2 demonstrates the proposed framework together with the theoretical literature review. Section 3 presents the model development. Section 4 describes the case study to illustrate the proposed framework, and Sect. 5 presents the discussion and conclusions. Figure 1 shows the proposed framework of the paper. Phase 1 of this framework involves feature extraction, where each of the collected raw acceleration signals is decomposed into different frequency bands to filter out all irrelevant frequencies and determine the signature of the fault characteristics of the signals. The decomposition is performed using a number of techniques namely empirical mode decomposition (EMD), empirical wavelet transform (EWT), and fast Fourier transform (FFT). The decomposition in FFT is based on the frequency variation throughout the bearing lifetime. Once the signals have been decomposed into different windows with different frequency ranges, the statistical features as condition indicators including kurtosis, RMS, crest factor, shape factor, impulse factor, skewness, and mean value, are calculated for each window to compare the frequency bands and understand which feature from which band can monitor the health status of the bearings. The condition indicators (CIs) that exhibit a similar trend throughout the lifetime of bearings are then considered as the selected features and used for RUL prediction of bearings in phase 2.

Proposed framework for RUL prediction
Phase 2 involves degradation modeling and online RUL prediction. A continuous-time stochastic Wiener process is built on the increments of the selected features. The Wiener process parameters are estimated by the maximum likelihood estimation (Si and Zhou 2014). The Bayesian prior knowledge basis for each bearing dataset is obtained from the other bearings' datasets that have been tested under the same laboratory operating condition. The prior distribution for the time-dependent drift and diffusion parameters of the Wiener process is a conjugated normal-gamma distribution (Cowles 2013). The prior distribution parameters, as well as the measurements of the degradation data, are used to update the parameters of the Wiener process and obtain the posterior distribution. The failure threshold is a stochastic variable following TND where the parameters are estimated by fitting a normal distribution on the real failure thresholds observed for each technique. A numerical approximation of RUL with respect to the first passage time (FPT) distribution is obtained for each experimental bearing, and the model is evaluated in terms of score, prediction error, and prognostics horizon (PH). The details of the framework will be discussed further in the following sections.

Phase 1. Feature extraction
Due to the complicated mechanical structure and nonstationary characteristic of roller bearings, raw acceleration signals may contain massive signal components and they can be contaminated by different kinds of noise during collection and transmission of the signals (Jiao et al. 2019). Thus, they may not be sensitive enough to the bearing's condition and suitable to be directly used for degradation modeling. For this purpose, various methods have been proposed to extract useful information from the acceleration signals in the time domain, frequency domain, and time-frequency domain for both diagnosis and prognosis, which are sensitive to the rotational speed and structural geometry of the bearings (Laala et al. 2020;Kumar et al. 2018). Some of the time-domain features fluctuate significantly, especially at the beginning of the bearing's deterioration, and some are not appropriate for dealing with localized faults (i.e., inner race faults, outer race faults, cage, and roller faults). However, the frequencydomain approaches are more reliable and accurate compared to the time-domain approaches, especially for fault diagnosis, as different types of bearing faults correspond to different frequency characteristics (Wang et al. 2014a;. Time-frequency signal-processing techniques provide the opportunity to identify components carrying important diagnostic information for further processing (Kumar et al. 2018).
The three main classifications of TFR techniques are Fourier transform (FT), WT, and Hilbert-Huang transform (HHT) (Cooley and Tukey 1965;Ricker 1940;Huang 2005). Among these, HHT as an adaptive data-driven approach performs well on both nonlinear and nonstationary vibration signals. However, it is an empirical approach and lacks an underlying solid mathematical theory (Boashash et al. 2016). In contrast, WT is based on a solid mathematical theory and can handle nonlinear signals and, in some cases, nonstationary signals. However, it is still based on a prescribed division mechanism for analyzing vibration signals (Bessous et al. 2016). FT faces the challenge of frequency band selection and does not perform effectively on nonstationary signals where different frequency components exist at different intervals of time (Kumar et al. 2022). In this study, the above three approaches are used to decompose the experimental signals into several frequency frames to understand how the features extracted from the decomposed signals using these techniques can differ in monitoring the underlying degradation process of the bearings and perform in combination with stochastic-threshold Wiener process for failure prognostics.

Fast Fourier transform
In this paper, FFT is applied to each acceleration signal taken within regular time frames, assuming that they are stationary. The FFT algorithm is used to compute the discrete Fourier transform (DFT) and its inverse. The k-point DFT converts a time-domain signal into a frequency-domain signal. The FFT of a signal is defined as Y = FFT(x) , where x is the raw acceleration signal and Y is the Fourier-transformed vector of the acceleration signal calculated using Eq. (1) (Zhang et al. 2013): where M is the total number of acceleration readings in one acceleration sample or the signal length, e −2 i M is the M th root of unity, and k is the number of points in DFT, or the new length of the Fourier-transformed signal, which is obtained from 2 2 k ≥|M| and is used to improve the visual frequency resolution and performance of FFT. Y(k) is the k th Fourier coefficient, which is a complex number comprising an imaginary part and a real part. One of the challenges with FT is the selection of the frequency band of interest. This selection can be based on the frequency variation of the frequency components in the Fourier spectrum of the signals, which can show the signal energy distribution as a function of frequency. This can be recognized by plotting the power spectral density or square of the magnitude part of the Fourier vector over the frequency vector to determine the dominant frequency of the data. Equation (2) calculates this mathematically by computing the difference between the accumulated amplitude difference of the FFT spectra on each frequency line (Wu et al. 2017): where f i (k) is the squared amplitude of Y(k) for the i th acceleration sample. N is the number of FFT spectra, which is equal to the number of acceleration samples from the healthy state to the failed state. f diff (k) indicates the variation in each frequency line throughout the lifetime of the bearings. Based on f diff (k) , the indicator H(g) is calculated using Eq. (3), which indicates the accumulated sum of f diff (k) and stands for the total difference from the first frequency line to the g th frequency line.
If the differences in the corresponding frequency line amplitudes along with lifetime are small, then the change in H(g) will be smooth. Otherwise, the value of H(g) will change rapidly and H(g) will have a large slope (Wu et al. 2017). The objective here is to find the range of g values (i.e., frequencies) where H(g) changes rapidly and has a large slope. This yields the optimal frequency band, where there is the highest variation in frequency throughout the bearing lifetime. The optimal frequency band can then be used to filter the signals and extract features from the filtered signals. The selected feature, which has a common increasing trend among all bearings, can then be used as an HI to model the bearing degradation and RUL prediction as it will reveal the underlying degradation performance of the bearing. In this study, the other frequency ranges that are close to the optimal frequency band are also used to filter the signals and, consequently, feature extraction, and RUL prediction performance.

Hilbert-Huang transform
The basis of HHT is empirical mode decomposition (EMD), which decomposes the signals based on their time-domain characteristics and uses the envelopes defined by the local maxima and minima. Through EMD, the time-based signals are decomposed into different segments called intrinsic mode functions (IMFs), ordered from high frequency to low frequency, as presented in Eq. (4). An IMF must satisfy two conditions. First, the number of local extrema and the number of zero crossings must differ at most by one. Second, at any point in the IMF, the mean value of the local minima and local maxima must be zero (Zhang et al. 2019).
where t is the time within each sample, x(t) is the original time-domain signal, c i (t) is the ith IMF, n is the number of IMFs extracted from an acceleration signal, and r(t) is the residual signal, which is the mean trend of x(t) . The algorithm used to extract these IMFs is summarized below.

Initialize with the input signal x(t).
2. Find the extrema (local minima and maxima points). 3. Use pchip interpolation to determine the mean of the upper and lower envelopes: ( m 1 (t) ) and conditions are met; if not, steps 2 to 3 are repeated on h 1 (t) until the conditions are satisfied and the 1st IMF is obtained. 5. Once c 1 (t) is determined, steps 2 to 4 are repeated on the residue, r(t) = x(t) − c 1 (t) , to determine other IMFs. There are various stopping criteria to stop the procedure, such as the Cauchy-type criterion, the mean value crite- rion, and the maximum number of IMF criteria (Wang et al. 2010). Once one of these criteria is met, the algorithm stops and provides a set of IMFs together with the residue signal. As the original acceleration signals are not smoothed, pchip interpolation was selected to determine the mean of local extrema.
The details of the process are provided in the literature (Huang et al. 1998;Lei et al. 2012). In this section, the acceleration signals are decomposed into a number of IMFs. The statistical features are calculated from each of these IMFs and compared to determine which feature from which IMF is suitable for deterioration modeling and failure prognostics.

Empirical wavelet transform
Wavelet transform examines acceleration signals using several wavelet basis functions. The selection of this function and the number of decomposition levels in the denoising process are crucial factors for extracting the fault features, which affect the performance of the fault detection process (Lin and Qu 2000). EWT is developed to overcome some of these shortcomings and to improve the flexibility and decomposition performance of conventional WT. The process of signal segmentation in EWT is conducted using the Fourier spectrum of the signal, which is important to make the wavelet adaptive to the signal (Chen et al. 2016). However, the main challenge is still to select the most sensitive fault-prone sub-band, which contains the hidden defect information of the bearings. The analysis of inappropriate sub-bands can lead to misleading frequency ranges and negatively affect fault detection and failure prediction.
The general idea of EWT is to segment the Fourier spectrum of the acceleration signal by obtaining band-pass filters. For a frequency w , assumed in the normalized Fourier axis of [ 0, ], each segment is defined as [ w n−1 , w n ], where w n is the limit between the segments such that w 0 = 0 and w n = or expressed as In this paper, the EWT function is applied to each signal (i.e., acceleration sample) to obtain the EWT coefficients and multiresolution analysis (MRA) of the signals.
The model segmentation in this method is based on the frequency domain, while in EMD, the decomposition is based on the time domain. Thus, the first step is to obtain the Fourier spectrum of the acceleration signal, and the second step is to separate different portions of the spectrum that correspond to modes centered around a specific frequency. To determine these boundaries or portions of the frequency modes in Fourier segmentation, the local minima method of the Fourier spectrum's logarithm of the processed signal is used. This is to address the two challenges namely flat picked modes and global-versus-local modes described by Lei et al. (2012). In this paper, EWT obtains the MRA of each acceleration signal by using the five largest peaks, while the first local minimum is located between adjacent peaks.
The objective of EWT implementation on the acceleration signals is to break them down into different sub-bands based on their frequency domain. The statistical features are calculated from each of the first three sub-bands and compared to identify which feature from which band is the most informative for failure prognosis. Table 1 presents the features and their calculation formulae.
If the time-domain acceleration signals are directly used for feature extraction without accounting for the frequency domain, then x i indicates the acceleration measurements. However, if the signals are processed by FFT, EMD, and EWT, x i can be the measurements of the processed signal. The variables m , , and M indicate the mean, standard deviation, and the number of measurements in a signal respectively.
Several performance measures in literature have been discussed for selecting more appropriate degradation indicators, such as monotonicity, prognosability, and trendability (Boukra et al. 2019;Thoppil et al. 2021). They are used to measure the suitability of a candidate for RUL estimation. However, in this paper, a broader perspective is considered, which means that the RUL of bearings is predicted using all features that have a similar trend over the bearings' lifetimes.

Phase 2. Modeling and evaluation of the model
To capture the stochasticity of the bearings degradation, a Wiener process as a continuous-time stochastic process with a stochastic failure threshold is employed to model the health status of the bearings and predict their RUL. The Wiener process is selected because the observed experimental degradation trajectories exhibit linear and nonmonotonic behavior over the components' lifetimes. In this model, the drift and diffusion parameters are updated iteratively using the Bayesian inference approach once more observations are available in order to make the model adaptive and consider the uncertainty of the model's parameters. As mentioned before, the failure thresholds in different experiments are treated as stochastic variables following TND to involve the uncertainty of the thresholds.

Wiener process with stochastic failure threshold
A general format of Wiener process as a widely used stochastic process, is given in Eq. (5). One main assumption in this model is that the degradation increments are assumed to be independent and normally distributed with the mean and standard deviation t and 2 t , respectively, while t is the time unit. In the conventional Wiener process, the degradation process ( X(t) ) is modeled as follows (Wang et al. 2014a, b): where x 0 is the initial degradation state of the system of interest, is the drift coefficient capturing the degradation rate, B is the diffusion coefficient, and B(t) while t ≥ 0 is the standard Brownian motion that represents the stochastic dynamics of the degradation process.
Assuming that RUL is interpreted as FPT or the first time that the degradation state has passed the failure threshold, FPT distribution is analytically proved to be an inverse gaussian (IG) distributed with the parameters n = L−X t n and n = (L−Xt) 2 2 n , where L is the failure threshold, X t is the degradation status at time t , and n and n are the drift and diffusion coefficients of the Wiener process, respectively ).

Bayesian inference-Wiener process
The mean (µ) and precision parameters ( 1 2 B ) as the two uncertain factors affecting the bearing degradation process, are the two examples of random variables that are unknown in reality. Expert knowledge, theoretical analyses, and historical data can be used to obtain preliminary information regarding these uncertain factors and their associated probability distributions (Gao et al. 2020). In this paper, to obtain the prior distribution parameters, b datasets are divided into two portions, which include the bearing of interest whose parameters need to be estimated and the b − 1 other datasets treated as historical data. Since a Wiener process is used to model the degradation processes and future increments are assumed to follow a normal distribution with unknown mean and precision parameters, the conjugate prior distribution is a normal-gamma distribution, as presented in Eq. (6): Let us consider b units (i.e., experimental bearings), and l , l , and l for l = 1, 2, … , b are the mean, standard deviation, and precision parameters of the measurements of bearing l respectively, then the prior distribution parameters of bearing j , where j ∈ [1, 2, … , b] , are calculated as below (Cowles 2013): • 0 and 0 are the maximum likelihood estimated shape and scale parameters of fitted Gamma distribution on l values, where l = 1, 2, … , b and j ∉ l . The PDF of Gamma distribution is: where Γ(.) is the Gamma function, k and are the shape and scale parameters respectively (Cowles 2013).
Once the prior distribution parameters are computed, the degradation observations of bearing j , ( x 1 , x 2 , x 3 , … , x n ), at different samples, together with the existing prior distribution information, can be used to update the posterior distribution of the uncertain variables for bearing j . Equations (7-10) present how to calculate the posterior distribution parameters, which is also a normal-gamma distribution. The increments in the degradation observations ( x i ′s ) are both measurements from time-domain features and the measurements from the features obtained with three other techniques (i.e., FFT, EMD, and EWT). The large uncertainty regarding the and parameters at the beginning of the procedure, when the prior distribution is only available, can lead to a large uncertainty in RUL prediction. However, as new observations are gradually obtained, the updated posterior distribution approaches the real distribution, the uncertainty of and decreases, and the uncertainty of RUL prediction reduces as well. This provides an updated RUL distribution for each sample; however, RUL is still affected by the stochastic nature of the degradation process: In addition to and parameters, the failure threshold ( L ) is another uncertain variable in the model. In reallife applications, the failure threshold is not a fixed value, which varies among different components and users. Moreover, there is uncertainty of threshold with respect to the failure time in historical data, while the degradation level that can cause failure is also uncertain. If the considered failure threshold is larger than the real failure threshold, it will lead to an unexpected failure, a less reliable system, and a delay in maintenance time. In contrast, if it is smaller than the real failure threshold, it will cause premature maintenance. In this paper, the failure threshold is a stochastic variable following TND. The mean ( L ) and standard deviation ( L ) of TND are obtained by fitting normal distribution on the real observed failure thresholds for the features that have a trend over time in each technique (i.e., FFT, EMD, and EWT).
Using the law of total probability (Cowles 2013), the unconditional cumulative distribution function (CDF) of RUL of bearings can be calculated numerically, as presented in Eq. (11). The numerical solution is determined by integrating the multiplication of IG distribution, conjugate normal-gamma distribution, and TND over all possible values of , , and L . The reason to choose TND to model the failure threshold is that the observed thresholds in features among different bearings follow a normal distribution. A truncated version of normal distribution is considered, as L and IG distribution parameters are positive values (i.e., L > 0 , n = L−x t n > 0 , n = (L−xt) 2 2 n > 0 ). As the Wiener process parameters ( n and 2 n ) are also positive, L − X t should be above zero. Thus, the TND domain is [X t , +∞] , which means that only L values greater than X t are taken into account for modeling.
where of , and f GA | n , n is the PDF of , which is gamma-distributed. The result of Eq. (11) is the CDF and PDF of RUL at different samples of the experimental bearings. The multidimensional composite midterm rule (Cowles 2013) can then be employed to solve numerical integration along each dimension. For this purpose, the three intervals are divided into n subintervals so that the entire 3D support is divided into small rectangular solids, and the integral can be solved numerically, although it becomes computationally expensive, depending on the number of subintervals.
To address this problem, it is proposed to find the areas of each subfunction that have non-zero values at different subintervals. As the main total function is a multiplication of the three distributions over µ, and L , if one subfunction is equal to zero, the total function will also be equal to zero, and such intervals can be disregarded to have a more efficient and faster computation. For this purpose, the 0.025th and 99.975th percentiles are calculated for each subfunction, and the values in between are extracted as meaningful nonzero intervals. The non-zero intervals can then be divided into subintervals to calculate the integration using the multidimensional composite midterm rule.

Model evaluation
To evaluate the model, the relative error denoted by Er i in Eq. (12) is calculated at the i th prediction point by comparing the actual observed value of RUL, denoted by RUL, and its predicted value, denoted by R UL.
There are various performance metrics to evaluate the model's accuracy and how well it can predict future condition of the system. In this paper, several prognostic metrics are used for the model's evaluation. The score of a model is a measure of its accuracy and is estimated by the weighted average of penalty function, using Eq. (13). The weighted score function assigns a higher weight to the recent RUL predictions, as it corresponds to the severe degradation stage of the bearings when failure prediction is vital. The weights increase linearly from the first datapoint to the last one. The highest weight is assigned to the last prediction, and then descends linearly based on the number of predicted datapoints. Thus, the n th prediction value has a weight of n and the first prediction datapoint has a weight of 1.
The penalty function presented in Eq. (14), is adopted from IEEE PHM 2012 prognostics challenge datasets to distinguish between the overestimation and underestimation of RUL (Nectoux et al. 2012).
Equation (14) means that there is a higher penalty with respect to the estimated RULs that exceed the actual RUL and a lesser penalty where the estimated RULs are lower than the actual ones.
Equation (15) gives the Cumulative relative accuracy (CRA) which is defined as the average relative error between the predicted and actual values of RUL at all prediction points.
Prognostics horizon (PH) in Eq. (16)  where n represents the prediction number. A larger PH exhibits a better prediction performance, which results in an earlier end of life prediction with more reliability.
Equations (17) and (18) give the root mean square relative error (RMSRE) and mean absolute error (MAE) of the predicted and actual lifetimes, respectively.
Higher values of score, PH, and CRA and lower values of RMSRE and MAE indicate that the model exhibits a better prognostics capability for RUL prediction (Wu et al. 2017).

Case study
In this section, an experimental dataset is used to illustrate the proposed framework. The data are the acceleration of rotational bearings in the horizontal direction collected under laboratory operating condition. A detailed description of the experimental setup and the data are given in the following sections.

Laboratory setup design
An experimental setup at the Reliability, Availability, Maintainability, and Safety (RAMS) laboratory at NTNU is developed to conduct numerous accelerated life tests on roller bearings. The setup is called the "Bently Nevada (system 1) Rotor Kit, model RK4" depicted in Fig. 2. The major components of the setup include a horizontal bearing shaft, bearing blocks mounted on the two sides of the shaft to hold the bearings, and the accelerometers shown in Fig. 3. The bearing block located on the left part of the setup contains the experimental bearing which runs until its acceleration amplitude crosses 10 g. The two sets of miniature accelerometers are mounted radially and axially on each bearing to collect the horizontal and vertical acceleration data over time. The collected data are then transformed into the central system, which visualizes the data on the screen. The amplifier (type 5134B) and accelerometers (type 8702B100), used in the measurement part of this setup are provided by the "Kistler" company. The rotating part of this setup is an asynchronous motor, which is used as an actuator that allows the inner race, rollers, and the bearing cage to rotate. The maximum rotational speed of the motor is approximately 10,000 revolutions per minute (rpm). The vibration setup is mounted on an aluminum platform with a safety cover, which is used while running experiments to ensure the safety of the operator. Lubrication and contamination are the two most important failure modes (Lee and Choi 2020). However, in this study, contamination is selected as the degradation mechanism, since it can speed up the experiment process. To this aim, the bearing degradation mechanism is achieved by mixing solid Silicon carbide particles (called BW F240, Coulter particles, with the average size of 50-52 µm) with a lubricant and pouring the same amount of this mixture with one press of nozzle into the experimental bearing at predetermined time intervals (i.e., every 25 min) until the acceleration reading reaches a predefined threshold.

Data description
The output data are stored in CSV files and include horizontal and vertical acceleration, together with motor speed and the date and time of data collection. As the vertical acceleration data are under the impact of gravity and are not reliable and accurate enough, only the horizontal acceleration data are considered in this paper. Table 2 presents the geometric characteristics of the bearings.
The lifetime of a bearing consists of a number of acceleration samples which are obtained every 5 min. Figure 4 presents the lifetime of B1 for illustration. Table 3 summarizes the collected acceleration data in the horizontal direction and the number of samples for each bearing. The motor speed is 2975 rpm, the sampling frequency for bearing B1 is 12,806 Hz, and for the rest of this dataset is 12,802 Hz. Each sample lasts 0.639 s, which means B1 has 2064 and the other bearings have 8192 datapoints. To prevent motor damage, the experimental tests are stopped once the amplitude of the acceleration surpasses 10 g.
For illustration purpose, Fig. 5 shows the first sample of bearing B1 when the bearing is healthy and the last sample (i.e., sample number 110) when the acceleration amplitude is above 10 g and the test is stopped.

Feature extraction and HI construction
After collecting the data as a batch of acceleration samples, the three techniques of FFT, EMD, and EWT as given in Sects. 2.1.1, 2.1.2, and 2.1.3 are employed to decompose the signals into different frequency bands and filter the signals. In the FT approach, the frequency spectrum of  the acceleration signals corresponding to the failure state of the bearings is obtained. As stated before, one major challenge in FT is to determine the fault-prone frequency band that contains the bearing's defect information. The development of frequency spectra shows the variation in frequency over the bearing's lifetime, which evolves gradually at the beginning and dramatically at the end. The frequency band with the highest variation is mathematically computed using Eq. (3). Figures 6 and 7 show f diff (k) and H(g) given in Eqs.
(2) and (3), respectively, for the 10 experimental bearings tested in the laboratory. The optimal frequency band for all bearings is marked with red dashed lines. The frequency band of [50,2000] Hz has the maximum variation in all bearings and the bearings show a severe degradation within this frequency band interval. Thus, it is selected as optimal frequency band for feature extraction and degradation modeling. In addition to the optimal frequency band, the frequencies of [50, 2500], and [50, 3000] Hz are also used for feature extraction and their RUL prediction performance is presented in Appendix 1, Table 11.
The statistical features are calculated for the Fourier optimal frequency band and the frequencies of [50,2500] and [50, 3000] Hz using the mathematical expressions in Table 1. Figure 8 shows the features calculated for the optimal frequency band (i.e., [50,2000] Hz) over the bearings'  Fig. 6 The frequency variation throughout the bearings lifetime using FFT lifetime. As shown in this Figure, the RMS energy-representative feature exhibits a similar increasing trend over the bearings' lifetime, while the other features do not show a common increasing or decreasing trend. Thus, RMS feature is selected as the degradation indicator since it can provide more relevant information for degradation modeling while it is simple to extract. The same argument applies to the other tested frequencies.
To obtain the optimal HIs from the EMD and EWT approaches, each acceleration sample is decomposed into different modes. The decomposition process in EWT is based on the local minimum approach to avoid the challenge of flat-picked modes, while the logarithm of the frequency spectrum of the signals is taken into account, instead of the frequency spectrum, to address the challenge of global-versus-local modes. Similar to FT, the features are calculated for each of the first three EWT sub-bands and are compared to identify the most informative HI for deterioration modeling and RUL prediction. In EMD, as a method with a timebased decomposition procedure, the first three IMFs of each acceleration signal are extracted, and their statistical features are calculated and compared. The first IMFs mainly consists of high-frequency components, they are more informative, however they contain more noise compared to the last IMFs (Zhang et al. 2020). Figure 9 presents the RMS energy degradation trajectories of the FT optimal frequency band, the first IMF of EMD, and the first MRA of EWT, as well as the timedomain approach. The RMS starts from zero where the experiment has started, and it ends when the acceleration amplitude surpasses 10 g.

Results and discussion
The increments in the different degradation trajectories of the bearings' datasets are used to estimate the drift and diffusion parameters of the Wiener process. The prior parameters and observation measurements of the degradation process are used to update the parameters continuously at each sample. In this model, the stochastic failure threshold (L), denoted by L ∼ TND( L , L ) , follows a TND with mean ( L ) and standard deviation ( L ). L and L are estimated by fitting a normal distribution to the maximum degradation Fig. 7 The optimal frequency band for different bearings using FFT levels of different bearings which vary between 0.5 and 3.0 as seen in Fig. 9. Table 4 presents the TND parameters of the failure thresholds. Table 5 presents the drift and diffusion parameters of the Wiener process for B1 using different approaches. It also presents the prior parameters, denoted by 0 , 0 , 0 , 0 , 0 , 0 for B1. The parameters for other bearings are available in Appendix 1, Tables 8, 9 and 10. The parameters are also calculated for frequency windows [50,2500] Hz, [50,3000] Hz in FT, IMF 2 and IMF 3 in EMD, MRA 2 and MRA 3 in EWT, presented in Appendix 1, Table 11, and will be further used for RUL estimation.
Several performance measures are utilized to evaluate the model performance in RUL prediction. The − performance metric, given in Eq. (19), evaluates whether the prediction accuracy at time t falls within the confidence bounds of , which are expressed as a percentage of the actual RUL values (Saxena et al. 2010). In other words, it measures whether the predictions stay within a cone of accuracy. That is, the accuracy bounds shrink over time. In this paper, the − prognostic metric is used to illustrate the predicted RUL values at different sample numbers over the bearing lifetime. Figures 10 and 11 present the − accuracy plot of RUL predictions with a stochastic and constant failure threshold, respectively, where the HI is the RMS feature extracted from the time domain, FT  Hz, EMD (IMF1), and EWT (MRA 1). in − approach is a goal threshold which is normally specified by an analyst considering the accuracy required to fulfill user specific criteria for success (Lall et al. 2013). In this paper, is selected arbitrarily as 30% for performance validation. In Fig. 11, the constant failure thresholds are equal to the maximum degradation level of the bearings. As it can be seen in Fig. 10, the estimation results of the model are close to the actual RUL values in most of the bearing datasets, especially, for the ones that have an average number of samples (e.g., 100-150 samples), such as B1, B4, and B9. In these cases, approximately 80% of the prediction results fall within 30% confidence bound intervals or are close to the confidence bounds. However, for the bearings that have a short lifetime with a few samples (e.g., 35 samples), such as B2, the model does not predict an acceptable RUL and there is a high prediction error. B2 has an abnormal lifetime, and there might be technical issues or external factors affecting the experiment that cause a shock in the acceleration amplitudes. Thus, the amplitude is recorded as over 10 g while the bearing is still in a normal operating mode. The technical problem can be induced by the accelerometers not being well-attached to the bearings resulting in erroneous signals. What is common among all datasets is that Fig. 9 RMS Energy degradation trajectories of the bearings using the FT optimal frequency band, the 1st IMF of EMD, and the 1st MRA of EWT at the beginning of the bearing lifetime, the model has a larger prediction error, which is due to high uncertainties associated the underlying model parameters ( , , L ). However, the uncertainties reduce with time and the model yields a more reliable RUL prediction as more degradation measurements are observed and used to update the model parameters. Figure 11 shows the RUL prediction results assuming a constant failure threshold throughout the lifetime of bearings. The constant failure threshold is assumed to be equal to the maximum degradation level of the bearing. Compared to the stochastic threshold case, the model has negligible prediction error in the second half of the bearing lifetimes (i.e., deterioration stage), which is significantly important in terms of predictive maintenance scheduling and decisionmaking. This indicates that, in the first half of the bearing lifetimes (i.e., normal stage), there is a higher variation in the failure threshold compared to the second half (i.e., deterioration stage). Moreover, in deterioration stage, with both constant and stochastic failure thresholds, the time-frequency approaches (FT, EMD, and EWT) fluctuate more and show higher variation and sensitivity to noise compared to the simple time-domain technique.
In order to evaluate the prediction performance of the model with both constant and stochastic failure threshold, the performance metrics given in Eqs. (12-18) have been calculated for each approach. Tables 6 and 7 present the average of performance metrics such as MAE, RMSRE, and the model score, of all the bearings for the model with stochastic and constant failure threshold, respectively.
As it can be seen from Table 6, the RMS from EMD as the HI, has less RUL prediction error compared to others in terms of the average CRA, RMSRE, and MAE metrics. This means that it is a better health indicator for reflecting the deterioration process of roller bearings throughout their lifetime. The RMS extracted from the  Hz frequency band in FT, has larger average PH value compared to the other methods. This states that the R UL falls within the confidence bound at an earlier point of time in the degradation process compared to other methods. A better performance of the different methods is presented in bold.
According to Tables 6 and 7, taking the uncertainty of failure threshold into account in addition to other model parameters has reduced the prediction error significantly and improved the result, although the score of the model in both cases (i.e., stochastic and constant failure threshold) did not have a noticeable change. Additionally, the model with stochastic failure threshold has larger values of PH which means the model can predict within the desired accuracy sufficiently prior to the end of bearings life.
As it is presented in Table 10 in Appendix 1, for the bearings with short lifetime (i.e., those with few samples) and a different RUL prediction pattern, such as B2, the EMD approach outperforms the other methods. In other words, the RMS value extracted from IMF 1 is a more informative and suitable HI for condition monitoring of bearings compared to the simple time-domain approach, even for the bearings with an abnormal period of life. In contrast, for the bearings with long lifetime (i.e., those with more than 200 samples), such as B6, the EWT and time-domain approaches perform better than EMD. For the bearings with an average lifetime (i.e., 100 to 150 samples), the RUL predictions fall within the 30% confidence bounds after approximately 20 samples. This means that in both the early stage of bearing lifetime and in the deterioration stage, RUL can be predicted based on the prior knowledge obtained from similar bearings and the current condition-monitoring data using this framework.
Overall, by considering the uncertainty of different parameters such as failure threshold, and the parameters of  the degradation model, the performance of the framework has improved significantly. The prediction error has been lowered, and the PH value has increased. In addition, combining the RMS feature extracted from EMD method with this framework, will provide the possibility of estimating RUL of roller bearings which have different length of lifetime and the same degradation mechanism.

Conclusions and recommendations
This paper proposes an adaptive RUL prediction framework for experimental roller bearings degraded by contamination using Silicon carbide particles. The framework comprises of two main phases of feature extraction and degradation modeling, with both constant and stochastic failure thresholds.
The feature extraction phase focuses on the decomposition of the raw acceleration signals and extracting statistical features to understand which frequencies and which features are the most relevant for degradation modeling. In order to propose the effectiveness of the proposed framework, it is applied to 10 run-to-failure experimental tests of bearings conducted at RAMS laboratory at NTNU. To this aim, the − accuracy plot with = 30% is presented to show whether the predicted RUL values fall within the specified bounds. The results show that the RUL prediction depends heavily on the failure threshold and duration of the bearing lifetime. According to the results, the optimal features (HIs) can successfully capture the evolving trend of bearing lifetime, especially within the steady-state and degradation stage. Moreover, the RMS feature from IMF 1 in EMD method has the best performance, and, therefore, it is a more suitable feature for tracking and representing the bearing deterioration process.
By considering the failure threshold as a stochastic variable, the RUL prediction performance of the model has improved significantly compared to the model with constant failure threshold. The prediction results, especially at the deterioration stage, were within the desired accuracy sufficiently in advance to the bearing end of life.
To improve the prediction results, future studies can focus on the fusion of HIs and a mixed-threshold RUL prediction model. In addition, data-driven approaches such as machine learning algorithms can be investigated to compare the performance of RUL prediction with respect to the mathematical complexity, computational time, and prediction accuracy at different health stages of bearings. Furthermore, the framework can be extended to various sensor data including temperature and different degradation mechanisms such as overloading.
Funding Open access funding provided by NTNU Norwegian University of Science and Technology (incl St. Olavs Hospital -Trondheim University Hospital). The research received no funding from any funding agency.

Conflict of interest
We, the authors of this manuscript have no conflicts of interest to disclose.

Human participants and/or animals
We, the authors ensure that no human or animal participation is involved in this research.

Informed conssent
We haven't used human participation or other personal information, informed consent is not required.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.