A neural network filtering approach for similarity-based remaining useful life estimation

The role of prognostics and health management is ever more prevalent with advanced techniques of estimation methods. However, data processing and remaining useful life prediction algorithms are often very different. Some difficulties in accurate prediction can be tackled by redefining raw data parameters into more meaningful and comprehensive health level indicators that will then provide performance information. Proper data processing has a significant importance on remaining useful life predictions, for example, to deal with data limitations or/and multi-regime operating conditions. The framework proposed in this paper considers a similarity-based prognostic algorithm that is fed by the use of data normalisation and filtering methods for operational trajectories of complex systems. This is combined with a data-driven prognostic technique based on feed-forward neural networks with multi-regime normalisation. In particular, the paper takes a close look at how pre-processing methods affect algorithm performance. The work presented herein shows a conceptual prognostic framework that overcomes challenges presented by short-term test datasets and that increases the prediction performance with regards to prognostic metrics.


Introduction
Prognostics is a promising technology which enables to manage assets based on their remaining useful life (RUL) and future health conditions [1,2].The potential of prognostics relies on its capacity to anticipate the evolution of anomalous conditions in time.It seeks to provide enough time for maintenance operation and sets off the alarm for necessary actions.A key feature in prognostics is to accurately predict the RUL of systems whose current condition and historical data are available.Such data-driven prognostics makes use of historical condition monitoring information for analysing and modelling of desired system output [1].Most of these data-driven approaches originated from "conventional numerical" methods and "machine learning" applications [3].
When a data-driven model is planned and tested, it has to face the challenges brought up by the complexity of realworld systems [24].In general, these include dealing with incomplete knowledge of future load conditions, inaccurate estimation of the current state of health, poor evolution models, sensor noise and varying operating conditions.In conventional methods, the selection of a prognostic algorithm for complex systems depends on an understanding of the challenges associated with the type of systems [25].Therefore, it is not surprising that most prognostic research work to date has been theoretical and restricted to a small number of models, and there have been few published examples of prognostic applications being fully fielded in condition monitoring of complex systems that are exposed to a range of operating conditions [26].
Indeed, prognostics of complex engineered systems remains an area of active research and development.Increasing trends appeared in the prognostic field have been summarised in the literature [27][28][29][30][31]. Machine learning methods and related techniques can offer an important part of the solution [3].They can be used when an explicit degradation model is not available, but sufficient condition monitoring data have been collected.Artificial neural networks are the most common machine learning methods used in prognostic applications [32][33][34]; other methods include decision trees [35], support vector machines [36][37][38], case-based reasoning [39], clustering [40], classification [41], Bayesian methods [42,43] and Fuzzy logic [44,45].Such approaches make use of historical data for estimating the health conditions rather than building models based on physical system characteristics.
A core issue encountered in making meaningful RUL estimations is to account for different kinds of uncertainties from various sources in the whole exercise such as process noise, measurement noise, and inaccurate process models [1].Particularly, uncertainties arising from complex systems make identification of their source even harder.Complex systems are formed of various interacting components in which the collective actions are difficult to deduce from those of the individual elements, predictability is limited and responses do not scale linearly [46].In this respect, among the data-driven approaches, the artificial neural networks (ANNs) have been the most used methods to prognostics, due to their capability of approximating non-linear complex functions [1].
Condition monitoring of the complex systems is usually based on multiple sensors that receive information on the system health and recognise any potential failures at an early stage so that corrective actions can be suggested in a timely manner.However, evaluation of sensor data from various components is often a challenge due to complicated interdependencies between measured data and actual system conditions [47].
As a function of operating conditions, complex systems work at superimposed operational margins at any given time instant.The wear process of such systems is not deterministic, and usually not one-dimensional [48].The multidimensional and noisy data stream is monitored from a large number of channels (such as use or environmental conditions, direct and indirect measurements which are potentially related to damage) from a population of similar components [49].Therefore, a simple analytical model is unable to present the wear phenomena and one should consider the decision-making process in multidimensional condition monitoring case [50].In such cases, a wellknown pre-processing step is to perform component-wise normalisation to provide a common scale across all the features of condition monitoring data [51].
While the importance of multidimensional data and the multiple axes of information have been recognised in the literature, there is still a gap in analysis of such data, leaving the analysts with yet more information to process through the complex systems [52].This can be attributed to an incomplete understanding of the multidimensional failure mechanisms and lack of correlation between different but similar data sources.
The review of literature in this section shows that, due to the incomplete understanding on the multidimensional failure mechanisms and lack of support between data sources, current methods lack the ability to deal effectively with complicated interdependency [47], multidimensional data [48] and noisy data [49].Further conventional preprocessing is unable to deal with this efficiently.Moreover, emerging technologies for the Internet of Things (IoT) still face some enormous challenges on data security and confidentiality [53].
Beginning with introducing a novel data filtering architecture, the framework proposed in this paper addresses these shortcomings by considering both regime standardisation methods and neural network filtering model.Among the data confidentiality measures concerned, the ones about network filtering on independent sources are particularly elaborated, including data population and characteristics, normalising different operations with regard to each other, data integrity, as well as dimensionality reduction and RUL estimation.
In the work shown here, a component-wise normalisation method is first used to standardise multidimensional data for a better understanding of the multidimensional failure mechanisms.Then, a neural network function is trained to map between a dataset of numeric inputs (multidimensional raw data) and a set of numeric targets (normalised data).The network function here is a kind of dynamic filtering, in which one can fit multidimensional mapping problems well.
These two data processing steps play an important role in modelling performance health indicators for prognosis.A similarity-based remaining useful life estimation model identifies the best matching portion of training health indicators for each test health indicator and produces future multistep predictions of system wear levels.
The method is evaluated by "final test" subset of PHM08 data challenge from NASA Prognostics Center of Excellence data repository.Results demonstrate that performance deterioration of initially trained subsets can be used to successfully predict RULs of test subsets.
The developed prognostic model is based on the fundamental notions of the prognostics such as the degradation over time, monotonic damage accumulation, detectable ageing symptoms and the correlation of these symptoms with RUL [49].With consideration of these, this work has the objectives to investigate data filtering and processing techniques as well as remaining useful life predictions.

Background and motivation
As prognostic technology advances to maturation in reallife applications, the desire for using data-driven processing methods has considerably risen.Data-driven prognostic approaches are used for modelling of the desired system output where historical condition data are available [54].However, the issue of output assignment arises when a set of trajectories with different initial wear levels and multiregime operating conditions is inserted into the same datadriven filtering model.This is the case in some common datasets such as the C-MAPSS datasets (of which the PHM08 dataset is a part) [55].
Ensuring performance and safety in complicated and safety-critical systems is a major issue, and in particular, complexity is of the most prominent problems that must be tackled to make theoretical frameworks applicable to real industrial applications [56].The main motivation of this paper is to provide an adaptive model for remaining useful life estimations on multidimensional condition monitoring data.
Multidimensional data are defined in terms of dimensions, which are organised in dimension hierarchies [57].To model the multidimensional space, let be the space of all dimensions.Operational trajectory, T, in this space is a set of time, t, operating conditions (or regimes), r, and the condition monitoring data, x. and, T, t, r, x, ∈ (1) where t follows a sequential order (and each variable of t is unique), and r can only have certain values.In other words, the unique values of the operating conditions vector, "θ(r)", are limited to the number of regimes, "np".
Considering that the sensors working at different operating conditions provide similar readings with those in the same regime, the vector, x = (x 1 , x 2 , x 3 , • • • x n ), would align in certain domains, ϑ r i , which are bounded by lower l r i and upper limits u r i .
Figure 1 shows a sample of multidimensional trajectory (the data are taken from the PHM08 dataset).The sensor readings represent various operating conditions and condition monitoring measurements which, for this dataset, falls into six cluster domains (ϑ).
To produce meaningful information from such a trajectory, Wang et al. and Peel [51,58] proposed a componentwise "multi-regime normalisation" method to standardise the sensor readings according to each other within the same domain.In these cases, the normalisation is applied into the PHM08 dataset which is formed of multiple trajectories with distinct health levels that can be found in the condition monitoring data.The health levels in these trajectories evolve with exponential characteristics [48].where d is an arbitrary point in the wear-space (the trajectories are observed with some non-zero initial wear degradation), å and b are the model parameters.In this scenario, each trajectory in the dataset starts at a distinct point and follows a characteristic "h" pattern.Considering the multiple trajectories, the entire dataset with its all components needs to be standardised together to preserve the characteristic wear levels.For the case considered in the works of Wang et al. and Peel [51,58], all trajectories were available at the same time.The "normalisation at once" has therefore not been a major issue.Nevertheless, it would be rather unlikely to find such data in a real-life scenario due to the restrictions on data proprietary considerations and confidentiality [59].In a real-world scenario, the "multi-regime normalisation" should be repeated for each novel incoming trajectory in order to calculate the changed population characteristics such as h and d.
Heimes [32] provided a "neural network filtering" approach that can be an alternative to the "multi-regime normalisation" .Unlike the regime normalisation methods, a trained neural network function can filter the necessary degradation information.ANNs are computational algorithms loosely inspired by the observed behaviour of biological neural networks of the brain, and they are compromised of data processing neurons [60].The networks form a set of interconnected functional relationships between many input series and a desired unique output where the relationship can be trained for optimal performance [11].Since an operational trajectory of a complex system is formed of multiple sensor observations, there are multiple corresponding values at each time point which can be used to train network function with multi-input series.
The basic method for supervised neural networks is to train a non-linear model using data representing the cases of interest by reducing measurable errors through a regularisation algorithm.In certain complex engineering applications, the observations from the system may not be precise, and the desired exact results may not have a direct mathematical link with the input data.An ANN is a convenient tool for modelling such a system without knowing the exact relationship between input, x, and output data series, y [61].
However, the supervised learning task of inferring an ANN function from "labelled historical condition data" carries the potential risk of failing to identify population characteristic, wear levels ("d") and "h" pattern.Without accurately specifying the initial wear levels and "h" pattern, the network filtering may be thrown off since the initial bias may dictate the sensitivity to RUL estimation.
By considering both component-wise multi-regime normalisation methods proposed by Wang et al. and Peel [51,58] and the neural network filtering model of Heimes [32], an initial unsupervised learning task that learns hidden degradation behaviour can be used.In other words, the initial wear levels can be identified as desired, and the "normalisation at once" issue can be solved by a neural network-based supervised learning data filtering method.Then, the remaining useful life of a test sample can be predicted by using the actual lifetime of similar examples so that the final prediction of RUL can be collaboratively estimated from multiple historical instances.The novelty presented in this paper is to perform the output parameters assignment (unsupervised learning) and data filtering (supervised learning) steps sequentially.
The proposed model combines the work of "multi-regime normalisation" [51,58] with the work of a neural network filtering approach ( [32]), so that normalised trajectories can form an output, y for the network filtering.The main hypothesis for this theory is that there is the possibility that, after applying normalisation for a certain number of trajectories, novel data can be filtered independently.Following this hypothesis, it is also expected that the new model will be more effective for the similarity-based remaining useful life estimations.

Experimental data
To start the acquisition of condition monitoring information for effective prognostic applications, the first major issue is to find available data sources.A common database is an important instrument for understanding the problems and developing the methodologies.If such a database includes information from several relevant operational domains, it can form the basis for data acquisition and required actions [62].Application of the common database practice also allows for the comparison of current activities and potential fields of improvement with regard to other methods in the literature [63].
One of the most applied datasets in the literature is the C-MAPSS datasets.Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) was developed by NASA to simulate realistic large commercial turbofan engines [48,64].The software is coded in Matlab and Simulink (The MathWorks, Inc.) environments with some editable input parameters that allow the developers to enter specific values of their choice regarding operational profile, environmental conditions, etc [64].
C-MAPSS code was used to carry out the simulation of PHM08 Challenge DataSet and Turbofan Engine Degradation Simulations [55].Trajectories differ from each other and were simulated under various combinations of operational conditions, regimes and fault modes.The datasets have been made publicly available for training and validation of results [55].
The features of C-MAPSS engine degradation simulations include the following characteristics that make them practical and suitable for developing prognostic algorithms on multistep ahead remaining useful life calculations [55,59].
-Each dataset contains multiple multidimensional time series representing sensor magnitudes over time and three operational settings that indicate variations of operational regimes.-The sensors are contaminated with high levels of noise that simulate the variability of parameter readings during operations.There is also very little system information and no sensor labels available to developers.-Each trajectory has a distinct degree of an initial wear level and manufacturing variation.This wear level is considered normal, and it is unknown to the users.-The fault signature is "hidden" on account of operational conditions and noise.-Datasets are divided into training and test trajectories (individual subsets).The training trajectories are implemented to build up to train remaining useful life prediction algorithms, and therefore, the instances are formed of complete run-to-failure data which can be used to feed the test trajectories that are only set up by shorter instances up to a certain time prior to adopted system failure.The main challenge for users is to predict RULs of "test" subsets by learning from "training" data.-Each dataset is from a different instance of a complex engine system.The complete dataset can be regarded as representing a fleet of an aircraft of the same type (Table 1).
The raw values in the dataset are regarded as a snapshot of the parameters taken during a single cycle, and each column corresponds to a different variable (see Table 2).The first two columns are respectively the trajectory unit

Prognostic model
The contribution of this study is to provide a multipleregime normalisation and health indicator (HI) assignment as an output to neural network function.PHM08 data exhibit multiple operational regimes that may cause prognostic models without a detailed pre-processing step to have a risk of excessive error rates.Main challenges encountered during the data pre-processing step include determining the initial wear level, filtering noise and calculating HI.The differences between the initial wear levels can occur due to service-life differences.Although they are not considered abnormal, they directly affect the analysis of useful operational life of the engines.Noise filtering is a non-trivial undertaking while assessing the true state of system's health.The identification of operability margins for the HI is a crucial step to determine safe operation regions of engines.
Data pre-processing is performed to transform raw data into an understandable format that can be consumed by an automated filtering process.Data analysis that is based on poorly partitioned data could produce misleading outcomes.Therefore, proper pre-processing of data must be given consideration before carrying out the remaining useful life calculations.
Each trajectory in the dataset includes some noise, has a specific initial wear level and is scrambled by the effects of operational conditions.The algorithm is, thereby, required to deal with unfolding these complexities.
Figure 2 shows raw sensor signals of a specific training trajectory.The sensors represent various system conditions and performance measurements.
Figure 3 shows the behaviour of a single sensor (sensor 4-column 9).It can be seen in the plot that the data are highly scattered and it is hard to perform a regression that represents the system degradation.A meaningful observation from the raw sensor is, therefore, not possible without first carrying out a transform that allows better In order to provide a HI that is useful for prognosis, a data processing approach is required that includes feature extraction, data cleaning and feature selection.The characteristic features of raw data and system conditions are extracted, and then outliers are removed.
The steps taken from new data intake to the neural network fitting and multistep ahead RUL calculations are shown in Fig. 4. The following sections describe the steps in more detail.

Data pre-processing
Based on data observations obtained from a single sensor such as on Fig. 3, it can be noticed that certain sets of data points align in similar regions.When one looks only at data from a specific region (say those between y-values of 1110 to 1150 as seen on Fig. 5), it becomes apparent that readings align in a domain (ϑ r i ) as suggested in Eq. 4. Raw sensor values in this range could be regarded as coming from a certain operational regime.

Regime identification and clustering
The first step of data processing is to identify these operational regimes in all trajectories.The number of regimes can be found by finding the number of clusters in the operational settings.For the PHM08 challenge dataset, multiple regime clustering was carried out via various methods, such as k-means [65], Gaussian mixture models [66], nearestneighbour clustering [67], Fuzzy c-means [68] and neural network clustering [69].
Further observations reveal that the three operational settings (altitude, mach number and sea-level temperature as given in Table 2) are concentrated in six different clusters, pointing out six operating regimes including many data points at each sensor parameter as such in Fig. 3. To identify the regimes in sensor parameters that are highly scattered due to the operational settings, a clustering analysis is needed to group data points in such a way that the operational setting variables in the same group are more similar (with regard to their maximum and minimum values) to each other than to those in other groups.Operating condition 3 has constant values that are correlated with different regimes (see Table 3).This paper utilises that correlation to assign the different operating regimes.
The mathematical expression for clustering is given by; f (x p ) = arg k=1,...R find(x = k) (10) where arg k=1,...R find(x = k) defines the index of the occurrence of regimes, k, in the data, x, and R denotes the number of the unique values in operating condition 3.An illustrative sample of the sensors after regime assignment is given in Fig. 6.Comparing to Fig. 5 in which the plot's axis is limited by a certain data range, all six regimes are grouped successfully, and if a correlation is carried out to cluster and standardise the data into a common regime, sensors can provide more meaningful information on system degradation.

Normalisation and re-assembling
A normalisation method can carry out adjustments by returning raw values into a common scale (for example the z-score).The z-score is a dimensionless quantity that is the result of subtracting the population mean of each regime from each individual raw sensor value and then dividing the difference by the population standard deviation.
Computing a z-score for each regime requires knowing the mean and standard deviation of the regime population of each sensor to which a data point belongs.The equation to calculate the standard score of a raw value is given as; where, x is the raw data values for regime r, σ is the population standard deviation and μ is the population mean or that feature.Since the data are made up of n scalar observations, the population standard deviation is: and the population mean is: A key point in multiple regime normalisation is that calculating z requires each regime's population mean and deviation over all entire time series.This will baseline the entire dataset at once.This process is applied to all regimes and to all sensors separately, the standardised sensors are reassembled to form the normalised dataset.Figure 7 shows all standardised sensors in a training trajectory.

Sensor selection
Next, it is required to evaluate how well the sensors correlate with the degradation pattern.Signals which do not adequately correlate with a monotonic exponential trend will be removed [58].To that end, the three prognostic parameter choosing measures of monotonicity, prognosability and trendability are used to identify the meaningful sensors [70].
Monotonicity is a straightforward measure to characterise the underlying positive or negative trend of the sensors.It is defined by: where n is the number of training trajectories in a particular history.The monotonicity of a sensor population is calculated by the average difference of the fraction of positive and negative derivatives for each path.A monotonicity measure outcome close to 1 indicates that the sensor is monotonic and useful for RUL estimation, whereas an outcome close to 0 indicates that the sensor is a non-monotonic signal and not suitable for further consideration.
Prognosability is calculated as the deviation of the failure points for each path divided by the average variation of the sensor during its entire lifetime.This measure is exponentially weighted to provide the desired 0 to 1 scale: The prognosability measures close to 1 indicate that the failure thresholds are similar and the sensors are available for prognosis, whereas the measures close to 0 show that the failure points are different than each other and the sensors are incapable for the prognostic calculations.
Trendability is given by the minimum absolute correlation computed among all the training trajectories.The mathematical expression is represented by: Results of trendability, monotonicity and prognosability parameter features are used to compare whether the candidate sensors are useful for individual-based prognosis.By quantifying these results for given results, the parameters can be used along with any traditional optimisation methods [70].By defining a fitness function as a sum of these threeprognostic parameter choosing metrics, the sensors can be compared to determine the most suitable ones.It has been observed that the sensors can provide useful degradation information when each measurement is close to "1".With respect to the results of prognostic parameter choosing measures in Table 4, the ten sensors 2, 3, 4, 7, 11, 12, 15, 17, 20 and 21 are accepted as useful for the calculation of HI and applied in the following sections.
Figure 8 shows these normalised useful sensors to be used in further prognosis applications.

Health indicator assessment
To produce a single adjusted cycle (s), multiple readings from useful sensors ( x ) are aggregated by taking the mean of all sensors at each time step (i).
x ji (17) where n is the number of multiple readings from useful sensors.These aggregated time series contain noise, and there is a risk that neural network might learn the noise during the training stage.Therefore, a regression model is used to describe the relationship between the adjusted cycle index (s, the aggregated variable of Eq. 17 However, if a fitted HI from the noisy-adjusted cycle is used in training, it will be possible to standardise the network estimations for each novel estimation.
To preserve the original degradation pattern, the following two-term power series fitting model is used to identify a standard HI.
where an approximation to a power-law distribution s b includes two fitting terms ȃ and c.The main reason for selection of this fitting approach is that the fitted HIs, y, for training trajectories have only increasing values and early stages are behaving in such a way that the fitting has minimum wear increase levels.Observing from the adjusted cycles that the failure occurs at a certain stage of operations and the system before this point is relatively stable, the two-term power series model can describe the hypothetical degradation model effectively.It is noticed that while the failure starts and ends at particular time points at each trajectory differs, the trajectories fail at similar wear levels.Since this research estimates the RULs by a similaritybased prediction method, the length of training trajectory time series after the most similar location is accepted as the RUL instead of identifying a threshold point and the HI's crossover.

Neural network training
HI identification can be applied to the entire dataset, and both training and test trajectories can be normalised and filtered according to each other for remaining useful life predictions.However, the regime identification and clustering steps might be problematic when the number of regimes increases or the identification sensors behave differently.Additionally, due to the nature of z-score normalisation, all testing and training trajectories should be normalised at once in order to adjust initial wear and threshold points.To avoid these issues, a neural network fitting model is designed as an alternative method to map between raw training inputs and an HI output.
The network is a feed-forward model which takes a set of input vectors (raw data), and then arranges another set of output vectors (HI) as a target data.Neurons, which are the building blocks of the neural network, evaluate these input state variables.The nodes in the hidden layer are performed by the following function.
where the state variables (x i ) are multiplied by fixed realvalued weights (w i ) and bias b is added.The neuron's activation z is obtained as a result of the nodes and the nonlinear activation function of neurons f x [71,72].
For the network structure shown in Fig. 10, the general network equation can be denoted by The network used for fitting function is a two-layer feedforward network, with a sigmoid transfer function (f h ) in the hidden layer and a linear transfer function in the output layer (f o ).The default number of hidden neurons is not set to a certain number.Instead, an outer loop is designed with an increasing number of hidden neurons.The optimum hidden layer size is accepted according to the mean squared difference between the initial output series and what is estimated.
The first step to train the network structure is the configuration of input and output series.Inputs are configured Fig. 10 Neural network learning stage as "10 × time" cell of useful raw sensor data while the target series are configured as a 1 × time cell of the HI calculated in the previous section.In order to increase the accuracy of configuration between trained outputs and targets, multiple networks are designed in a double-loop design over the increasing number of hidden nodes of an outer loop and random weight initialisations of an inner loop [73].Both loops terminate when minimum desired error occurs and mean squared error is used to reduce error numbers in training.At the end of both loops, the least erroneous network is used to choose the best design mode for validation.
Neural networks with multiple nonlinear hidden layers are effective models that can contain a large number of parameters, but both overfitting and computational overhead might lead to poor network training, especially when data are presented to the predictions of numerous different large neural networks at testing stages [74].These network training could memorise the samples for the specific training case, but there is always a risk that they cannot be trained properly to generalise the upcoming testing cases.That is why artificial neural networks (similar to other artificial intelligent machine learning models) are prone to overfitting the training data [75].
In order to avoid such training problems, this works applies Bayesian regularisation method into the network training [76,77].Other well-known alternatives to the Bayesian method are the "Levenberg-Marquardt" and "scaled conjugate gradient" algorithms [78].When the Levenberg-Marquardt model is used to avoid overfitting, the network training needs more memory but less time and it automatically stops when generalisation concludes improving, as specified by an increase in the mean squared errors.In the case of the "scaled conjugate gradient", the regularisation requires less memory and it is available when the training data is short.Bayesian regularisation requires more time in comparison to other methods due to the adaptive weight minimisation but can result in satisfactory generalisations from difficult or noisy datasets [79].
The training function in the Bayesian model is based on Gauss-Newton approximation to the Hessian matrix.It updates the weight and bias values according to Levenberg-Marquardt optimisation [76].This function diminishes a compound of squared errors and weights in pursuance of reducing the computational overhead and overfitting in training, and then defines the correct compound so as to produce a convenient network generalisation [77].Although this definition of Bayesian regularised neural network can take longer to train, it obtains a better solution compared to other methods.
To validate that the trained neural network accuracy is sufficient, the same training raw data are inserted into the obtained network function.The validation resulted in a similar pattern, but it is contaminated with noise as expected (Fig. 11).In this case, the response is matching with the original HI used in network training, and the network can now be put to use on new inputs.The analysis of the network response is performed with a test trajectory (Fig. 12).While the length and initial wear level are different, the model could accurately identify HI curve of the inserted data.

Network library
Although the network function trained with certain data provides desired results, there is always a risk that if the network is trained with other input and output values, the estimations might result in undesired estimations due to different weight and bias values.To that end, alternative neural networks, f NN , with inputs and outputs from different trajectories are trained on similar problems and each trained function is stored in a network library, L.
where nl is the number of trained functions in the library.When all novel trajectories have been inserted into that library, there will be multiple estimations for a single-input trajectory.Figure 13 demonstrates such a set of network library outputs that result in similar exponential growth patterns.The network library is formed of multiple network functions that have been trained by different data.When the same input data, x, is applied to these functions, they result in similar outputs, y NN  1:nl , no matter which trajectory is used in their training.
Each trajectory in Fig. 13 is an estimated output of the same raw training input resulting from a trained network function in the library.The consistency between different estimations shows that ANN functions trained with different  Considering that the library results in multiple estimations for a single input, a final HI estimation similar to Section 4.1.4is applied for dimensionality reduction.For further stages, referring to Eq. 17, the moving average of all these alternative network results is applied to filter a final HI. y NN jq (23) where the random sequence s r is the mean of p-moving average of nl number of HIs.As the window size of pmoving average is specified as a numeric duration scalar for next steps, the average contains the parameters in the current location plus upcoming neighbours.When the window is expanded prior to the s r , the centred average equation can be expressed as: y NN jq (24) Directional window is a temporary duration matrix, and its size is formed of two distinct elements, the positive integer scalars of moving average windows size "p" and the number of library estimations "n".This moving average helps to smooth out HI action by filtering out the "noise" from the network library results.However, the characteristic starting and failure points are of critical importance to the defining HI feature and they are required to be included in the filtered outputs.Regarding the prior and posterior values as well as the sequence itself, the exact windows size calculation is over (2p + 1) × n elements.
When initial starting and the final ending points are concerned, the modified moving average method for a full matrix of library estimations is calculated by; where l is the length of the estimations.Short dimensions to operate along the matrix are specified as a positive integer scalar for starting points and an integer scalar smaller than the length of matrix for the ending points.The rest have the full dimensions that moving average operates along the direction of the defined window slide.Thus, the moving average method could be applied to starting and ending points of the time series.

Similarity-based distance evaluation
Once different neural networks are trained to form a generalisation of the input-output relationship, the network library is used to estimate all HIs of training and test trajectories in the same dataset.The goal of supervised similarity learning is to learn from samples a similarity model that measures how similar or related two trajectories are.This method is closely related to distance metric learning in which the task of learning is a distance function over training and test trajectories.In practice, the similarity learning ignore the condition of identity of indiscernibles and learn the best fitting objects.
RUL estimation of test trajectories is used as a valid case to compare the test case against the full operational periods.
The difference between the two trajectories is computed as a similarity measure.
This similarity can be expressed as a distance between two vectors and is calculated as: where te is the test trajectory, tr is the corresponding part and n is the length of test trajectory.In Fig. 14, pairwise distance between two sets of observations is initially calculated at time step "1".However, the best-matching training units can be in the later part of the curve.The testing curve is, therefore, moved step by step to the end of the base curve.The pairwise distance is calculated and stored for each step by the following equation; where n tr is the length of the base curve (training trajectory) and n te is the length of test trajectory.
After the testing curve is moved to the end of a baseline case, the location of the best-matching units is calculated by identifying the minimum pairwise distance value.
Mn (te,tr) = min d (te,tr) (1) , d (te,tr) (2) , • • • , d (te,tr) n tr −n te (28) The location of the best-matching part at the training baseline, Lc te,tr , is calculated by: Lc te,tr = arg find d (te,tr) (j ) = Mn (te,tr) ; (29) The prior parts before this location are removed and the followings are used for further estimation as seen on Fig. 15.
The length of time series after the calculated location of the best-matching part, L te,tr , equals to the remaining useful life of the test trajectory.As a single calculation with a single baseline can be expanded to all training trajectories, the calculations with more training data samples could increase prediction accuracy in multistep ahead estimations.The final RUL is estimated by the mean value of corresponding RUL calculations of the minimum ten distance values of all baselines.A demonstration of these best-matching trajectory parts is shown in Fig. 15.

Results
There have been several thousand downloads of C-MAPSS dataset files since their first release in 2008, and as of 2014, more than 70 papers referring to these datasets were found in the published literature [59].
The presented algorithm was applied to all trajectories in PHM08 data challenge and RUL was calculated for each test subset.The results were evaluated by the prognostic metrics shown in Table 5.Several metrics are used here to allow for comparison of findings on several different levels.These metrics aim principally at performance validations for prognostics applications.Since they are mostly focused on applications where run-to-failure data are available, their usage has a particular importance for the model development stage where metric feedback could be used to integrate the prognostic procedures.Table 7 shows the comprehensive set of metrics as computed from PHM08 Challenge leader board of final test set [59].Ranking is established based on PHM scoring function, which is a weighted sum of RUL errors [82,83].The following equations describe the scoring function.
for e < 0 for e ≥ 0 (32) where "Sc" is the computed score, n pr is the number of predicted units and e is the error term.The scoring function is an asymmetric metric that penalises late predictions more than the early predictions, a 1 = 13 and a 2 = 10, due to the fact that late predictions imply more costly failure consequences.There is no upper penalty limit for predictions, and thereby, a single high error rate among the predictions in the dataset can dominate the final score.It is particularly risky for test units with a short history in which both early and late predictions are highly possible.In order to avoid such a risk, the maximum threshold value for RUL predictions is capped at 135.This value was gleaned from the training data (i.e.none exceeds 135) and assumes that future unseen trajectories will adhere to that same characteristic.Also, it is observed that the trajectories shorter than 50-time cycles are prone to result in excessive early or late predictions.Considering the asymmetric characteristic of the scoring function, the range of the short trajectory RUL predictions is directly adjusted between 100 and 130.
The algorithm explained in this paper is applied to the validation dataset which contains 435 test samples and the results were sent to PCoE for performance validation.The model achieved a total score of 5530.12, which is the overall leading score at this time (see Tables 6 and 7).
The distinguishing difference between false positive (FP) and false negative (FN) rates deserves a special consideration for final test validation.Their percentage can be compiled to measure the consistency and reliability of developed models.As seen in Table 7, the positive rates in the leader board are predominantly higher than the negative rates.This might be a direct result of a biased merging of multiple RUL estimates through a weighted calculation due to the high penalty scores for late predictions.However, the early predictions might also dominate the final score in a real case scenario and might result in catastrophic failures.This work did not apply any weighting method between early and late predictions.Accordingly, the calculated rates are very close to each and the developed model could be regarded consistent and reliable in terms of FP and FN rates when comparing the results in Table 7.

Conclusion and future work
The algorithm developed in this work has been effective for RUL calculations of C-MAPPS datasets.A data-driven filtering approach can successfully assign HI targets for similarity-based remaining useful life calculation.Since the multiple regime normalisation is applied by the trained neural network, there is no need for the standardisation of the entire dataset at once.Thereby, both training and test trajectories could be processed individually by considering their initial wear levels.
During the course of model development, it was observed that a mathematically defined synthetic output vector also performed satisfactorily for neural network training.Due to the constraints of the scoring metrics (one-time limited submission and variance in initial wear levels), this work used a step by step HI (output) data processing.
A smart selection system produced a library of synthetic output models and the best-matching degradation model was selected for the raw test data.For future works, we envision to explore for the developed network model with automated output selection and normalisation for various failure mode situations.

Fig. 1
Fig. 1 Multidimensional data representation of a single trajectory

Fig. 4 Fig. 5
Fig. 4 Flowchart representing data processing and multistep ahead remaining useful life calculation

Table 1
Characterstics of PHM08 challenge dataset

Table 2
[48]8 challenge dataset parameters available to participants as sensor data[48]The challenge is still open for researchers to develop and compare their algorithm performance against the winners of the challenge.Users can train their algorithms using training data, and then evaluate the RUL prediction performance on test trajectories.There are two different test subsets in PHM08 dataset, namely test and final test.A webbased application is available to upload the test results that calculate an aggregate score feedback.Once algorithms are trained to satisfaction, users can apply their methods to the final test dataset and send the vector of predicted RULs to the Prognostics Center of Excellence for evaluation.A score can only be submitted once for the final test.

Table 4
Results of prognostic parameter suitability metrics

Table 5
Prognostic metrics

Table 6
Performance of the developed model