1 Introduction

In recent years, machinery systems have become ever more complex and demand new expertise in terms of safety and performance. A complex system is composed of many interrelated, interacting components that form a complex whole [1]. Condition monitoring of such advanced systems can be used to recognise potential problems at an early stage and therefore reduce the risk of failure. However, the data are collected from sensors on many components, and their interpretation is often challenging because of the complicated inter-dependencies between the monitored information and the actual system conditions [2]. Moreover, operational conditions and regimes force some systems to operate under dynamic operational margins. The health degradation of such systems is not always deterministic and is commonly multi-dimensional [3]. The data streams are monitored from various channels [4], and advanced decision-making processes are required to model the multidimensional health degradation phenomena [5].

The challenges brought up by the complexity of real-world applications should be considered during the development and testing of a new prognostic model [6]. The literature has recognised the importance of complexity by proposing different algorithms for multiple axes of information and multidimensional data. However, analysis of such techniques is still lacking, leaving the field with requirements for further research, particularly on the common issue of incompletely understood multi-dimensional failure mechanisms and failure modes [7].

2 Background and Prognostic Definitions

Assuming a system is in proper working order before use and no maintenance is necessary, it starts to operate with a specific initial health level that is mostly stable during the early periods of operation. This stability continues until a critical stage where an early incipient distortion occurs, after which the risk of failure grows with time. When a prognostic algorithm is developed, the main goal is to estimate this failure time, at which the system can no longer operate under desired conditions [8]. This estimation is a statement about an uncertain future event, but it is based on the fundamental notions of system deterioration, monotonic damage accumulation, pre-detectable ageing symptoms and their correlation with a model of system degradation [4]. With regard to these notions, a prognostic framework detects, diagnoses and analyses system deterioration with the goal of estimating the remaining useful life (RUL) before failure. The accuracy of RUL estimation is a key notion in condition-based maintenance strategies, and it is also critical to improving safety, reliability, mission scheduling and planning, and to lowering costs and downtime [9]. The deviation between the actual time-to-failure (ATTF) and the estimated time-to-failure (ETTF) is of central importance to this accuracy.

ATTF, commonly known as the True Remaining Useful Life, is an unknown future variable that can only be known after the occurrence of failure [10]. ETTF, on the other hand, is the amount of time from the current moment to the system’s functional failure [11]. As defined by the industry standard ISO-13381 [12], ETTF, along with the risk of failure modes, is the basic definition of prognostics. Therefore, it is generally the principal centre of prognostic studies and the estimation of RUL [13]. Accordingly, RUL is defined as [14]:

$$\begin{aligned} X-t|X>t,Z(t), \end{aligned}$$

where X is the RUL variable and Z is the preceding condition profile up to the current age (t). The time units used in the measurement are related to the nature of the operation, such as cycles in commercial aircraft, hours of operation in jet engines, or kilometres or miles in automobiles [4].

When this conditional variable corresponds to ETTF, it is defined as:

$$\begin{aligned} E\left[ X-t|X>t,Z(t) \right] \end{aligned}$$
Fig. 1 Typical system deterioration model

Figure 1 shows a typical system deterioration model in which a healthy system with a certain level of initial health starts to degrade after an initial problem, eventually reaching a critical functional failure state [7, 15, 16]. The point of potential failure starts an exponential change of state, from the stable zone with minimal deterioration to the functional failure of the system. The health degradation can be detected from condition-monitoring readings that exceed the alarm limit of potential failure. Prognosis of the course of degradation takes place between the initial detection and the failure conditions [17] and follows the sequential major processes of “data pre-processing”, “existing failure mode”, “future failure mode” and “post-action prognostics” [18].

In the data pre-processing stage, the system information is first received from sensors and the features indicating fault conditions are defined. The existing failure mode stage applies failure mode analysis to determine the failure effects of these features for feature extraction and fault classification. In the future failure mode stage, multi-step ahead estimations are performed using the fault evolution to predict the RUL of the system. Finally, post-action prognostics proposes the maintenance actions that need to be taken after RUL estimation.

Fig. 2 Multidimensional condition monitoring data

One of the most significant issues in prognostics is utilizing data that is versatile and falls into various categories. These include the value type (a single value collected at a specific epoch), the waveform type (a time series of a condition-monitoring variable) and the multidimension type (time series from multiple operating conditions) [19].

When one considers computational prognostic algorithms under dynamic regimes, special attention is given to the multidimension-type data due to the operational differences between the regimes [20]. In these complex cases, the systems are formed by various interacting components whose common behavior cannot be easily deduced from the individual elements, so the predictability of the system is limited and the responses do not scale linearly [21].

In Fig. 2, a randomly generated sample of a multidimensional trajectory is shown. The data is represented as a 3-tuple of “regimes (\(\mathbf {D_b}\)), M in number”, “regime levels (\(\mathbf {L_b}\))” and “a single sensor measurement (\(x_{1:r}\))”. Such multidimensional condition monitoring data always needs extensive signal processing to produce meaningful information.

A multidimensional data set is formed by hierarchies of dimension levels [22]. To simulate these hierarchies, let \(\mathbf {\Omega }\) be the space of all dimensions and \(\mathbf {\Psi }\) be the space of all dimension levels. For each regime level (dimension) D, there is a set of regime values belonging to it. With regard to these, a basic multidimensional data set of \(\mathbf {C_b}\), can be represented as a 3-tuple [22]:

$$\begin{aligned} \mathbf {C_b} = <\mathbf {D_b, L_b, R_b} > \end{aligned}$$


$$\begin{aligned} \mathbf {D_b}=<D_1,D_2,\ldots D_n, M> \end{aligned}$$

is a list of regimes \((D_i,M \in \mathbf {\Omega })\), where M is a dimension that represents the unique values in \(\mathbf {D_b}\) and is also the measure of \(C_b\).

$$\begin{aligned} {\mathbf{L}}_{\mathbf{b}}=<DL_{b1},DL_{b2},\ldots DL_{bn}, ^{*}ML> \end{aligned}$$

is a list of regime levels and boundaries \((DL_{bi},{}^{*}ML \in \mathbf {\Psi })\). ML is the multi-valued dimension and boundary level of the measure of \(C_b\). As M can only take certain regime values, ML can only have the same number of dimension levels.

\(\mathbf {R_b}\) is a set of condition monitoring data, formed of multiple regimes and regime levels according to M and ML. For a single sensor, these measurements form the series:

$$\begin{aligned} x=[x_1,x_2,\ldots ,x_{n}] \end{aligned}$$
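The 3-tuple above can be made concrete with a short sketch. A minimal Python representation of \(C_b = \langle D_b, L_b, R_b \rangle\), in which all regime names, level boundaries and sensor values are illustrative assumptions rather than data from a specific system:

```python
import numpy as np

# A minimal sketch of the 3-tuple C_b = <D_b, L_b, R_b>; all regime names,
# levels and values are illustrative assumptions.
rng = np.random.default_rng(0)

D_b = ["altitude", "mach", "throttle"]     # regimes D_1..D_n
M = 3                                      # measure: number of unique regimes
L_b = {                                    # regime levels/boundaries DL_bi
    "altitude": [0, 10, 25, 35],
    "mach": [0.0, 0.4, 0.7, 0.9],
    "throttle": [20, 60, 100],
}
r = 8
x = rng.normal(550.0, 5.0, size=r)         # sensor measurements x_{1:r}
regime_at_t = rng.integers(0, M, size=r)   # active regime at each epoch
R_b = list(zip(regime_at_t.tolist(), x.tolist()))  # condition monitoring data

C_b = (D_b, L_b, R_b)
print(len(C_b[2]))  # → 8
```

Each entry of \(R_b\) here pairs a measurement with the regime that was active when it was taken, which is the information later processing stages need to standardize the data.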

In order to estimate the system failure time (\(t_b\)) from this multidimensional cell data, a health measure function (H) needs to be evaluated [23].

$$\begin{aligned} H=\frac{t }{t _b}, \quad 0\le H <1 \end{aligned}$$

where t is the current time and \(t_b\) is the functional failure time. For a single monitored sensor channel with r monitored sensor points, the expression can be redefined as:

$$\begin{aligned} R_n(t)&=R_n\left( \frac{t}{t_b} \cdot \theta \right) = R_n(H \cdot t), \quad n=1,2,\ldots ,r \end{aligned}$$

where the monitored observations span some life distance, \(\varDelta \theta\), and there are multiple signals (p). Therefore, the values of the observations will change substantially [23].

$$\begin{aligned} R_n(t_m)&=R_n(m\varDelta t)=R_{mn}, \quad n=1,2,\ldots ,r, \quad m=1,2,\ldots ,p \end{aligned}$$

These multiple multivariate raw sensor readings (\(R_{mn}\)) also contain noise. Referring to a single sensor reading as defined above, the condition monitoring data with the noise and/or the measurement uncertainties \(\varepsilon\) is defined as:

$$\begin{aligned} R_n(t )=R_n(H \cdot t )+\varepsilon (t ), \quad 0< t \le {t_b} \end{aligned}$$

where \(\varepsilon\) is assumed to be Gaussian white noise (in analogy to white light, which has uniform emission at all frequencies) whose samples are identically distributed, statistically independent and uncorrelated [24,25,26]. The probability density function of \(\varepsilon\) is given by:

$$\begin{aligned} p_{G}(\varepsilon ) = \frac{1}{\sigma \sqrt{2\pi }}e^{-\frac{(\varepsilon -\mu )^2}{2\sigma ^2}} \end{aligned}$$

where \(\mu\) and \(\sigma\) represent the mean value and the standard deviation of the normal distribution [27].
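The composition of a monitored signal from an underlying degradation trend plus Gaussian white noise can be sketched directly. In the snippet below, the exponential trend shape, the baseline value and the noise level are illustrative assumptions, not properties of a particular system:

```python
import numpy as np

# Sketch of a monitored signal as degradation plus Gaussian white noise,
# R_n(t) = R_n(H·t) + ε(t); trend shape and σ are illustrative assumptions.
rng = np.random.default_rng(1)

t_b = 200                                  # functional failure time
t = np.arange(1, t_b + 1)
H = t / t_b                                # health measure, 0 ≤ H < 1
trend = 550 + 30 * (np.exp(3 * H) - 1) / (np.exp(3) - 1)  # degradation trend
eps = rng.normal(0.0, 2.0, size=t.size)    # ε ~ N(μ=0, σ²), i.i.d. samples
R = trend + eps                            # noisy condition monitoring data

print(R.size)  # → 200
```

Because \(\varepsilon\) is zero-mean and uncorrelated, averaging or filtering such a signal recovers the trend, which is why the white-noise assumption is so common in the cited prognostic work.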

3 Review of Prognostic Methods

As discussed in the introduction, RUL prediction methods need further analysis, and the categorization of prognostic techniques, particularly around the issue of multidimensionality, has proved to be challenging. The increasing complexity of systems will prompt prognostic tools and methods to adopt evolving technologies, ultimately generating more accurate estimates in different domains. With regard to the way they participate in estimation, prognostic methods can be divided into three main categories: physics-based, knowledge-based and data-driven approaches [7, 9, 28, 29].

Fig. 3 Classification of prognostic models. Shown here is a list of the computational models used by common prognostic applications

Figure 3 shows the different prognostic categories and their subtypes. The purpose of such categorization is to ensure that the discipline is established in a manner which is consistent with the operational and system-related objectives.

3.1 Physics-Based Models (PbM)

A typical PbM attempts to describe the evolution of deterioration based on a comprehensive mathematical model defining the physics-of-failure and the degradation of system performance [28]. In such a model, the interactions between the system and the failure mechanism are formulated using a combination of fault growth equations and knowledge of the principles of damage mechanics relevant to the monitored system. Crack growth modelling has been a common PbM approach, in which it is assumed that an accurate mathematical model can provide knowledge for prognostic outputs. The Paris and Erdogan law [30] is used in several crack growth modelling applications [24, 31,32,33]. The model relates the stress intensity factor range to crack growth under a fatigue stress regime.

$$\begin{aligned} \frac{da}{dN}=C\varDelta K^m \end{aligned}$$

where a is the crack length, C and m are constants of material characteristics, environment and stress ratio, \(\varDelta K\) is the range of stress intensity factor during the fatigue cycle and \(\frac{da}{dN}\) is the crack growth rate.
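A cycle-by-cycle integration of this law shows how a PbM turns material constants into a life estimate. In the sketch below, the material constants, stress range and crack sizes are illustrative assumptions (with \(\varDelta K = \varDelta \sigma \sqrt{\pi a}\) as a simple stress-intensity model), not values for a specific material:

```python
import numpy as np

# Sketch of the Paris law da/dN = C·ΔK^m integrated cycle by cycle with
# ΔK = Δσ·√(πa). All parameter values are illustrative assumptions.
C, m = 1e-10, 3.0     # material constants (assumed)
dsigma = 100.0        # stress range (assumed)
a = 1e-3              # initial crack length [m]
a_crit = 10e-3        # critical crack length [m]

N = 0                 # fatigue cycle counter
while a < a_crit:
    dK = dsigma * np.sqrt(np.pi * a)  # stress intensity factor range
    a += C * dK ** m                  # crack growth in this cycle
    N += 1

print(N)  # cycles to grow from the initial to the critical crack length
```

The cycle count N at which a reaches the critical length is the model's time-to-failure; the RUL at any point is simply the remaining cycles of this integration.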

Other well-known PbM approaches use the Forman law [34], the Yu-Harris life equation for fatigue spall initiation, the Kotzalas-Harris model for failure progression estimation [35,36,37] and finite element analysis [38]. These applications have been applied in multiple domains, despite the wide variation in individual elements across domains. The main challenge in applying PbM to complex systems under dynamic regimes is that these methods are mostly suitable for specific purposes and components rather than the complexity as a whole [39]. Being component specific, these prognostic models are hard to transfer to alternative domains, and most PbM operate at the component or subsystem level [40]. In the case of complex systems and multidimensional data, it is computationally expensive to describe the behavior of the system with different equations, and identifying failure modes under different operating regimes requires extensive experimentation, often built on a case-by-case basis [41].

3.2 Knowledge-Based Models (KbM)

KbM prognostics predict RUL from historical events by evaluating the similarity between a monitored case and a library of previously known failures [42]. When an accurate mathematical model is not available, KbM methods, which require no physical modelling or prior principles, are functional wherever historical failure information is available [43]. The common methods in KbM mimic human-like representation and reasoning, such as “expert systems” [44, 45] and “fuzzy logic” [46,47,48].

On the other hand, similarity-based prognostics, an alternative KbM method, can overcome these mimicking difficulties by removing the requirement to model the experience of a domain expert. Possibly owing to this, the method is occasionally classified under data-driven models in several studies [41, 49, 50]. However, it follows the characteristic KbM pattern of similarity evaluation between monitored cases and the use of a case library.

The common similarity evaluation methods use a set of complete run-to-failure operational patterns for RUL estimation of incomplete (test) degradation patterns [6, 26, 51,52,53]. The pairwise distance between a complete pattern (q) and an incomplete pattern (p) is used to find the best matching units, which then serve as the basis for RUL estimation.

$$\begin{aligned} distance=\sqrt{ \sum _{i=1}^l (q_i-p_i)^2} \end{aligned}$$

This method can also be applied to RUL estimation of complex systems, but such cases first require systematized dimensionality reduction and data-processing methods. Therefore, they are commonly used with data-driven approaches.
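The distance-and-match procedure can be sketched end to end. Below, a short test pattern is slid over each complete run-to-failure trajectory, the best-matching position is found by the Euclidean distance above, and the remaining life past the match is read off; the synthetic library and quadratic degradation shape are illustrative assumptions:

```python
import numpy as np

# A minimal sketch of similarity-based RUL estimation over a synthetic
# case library; degradation shapes and noise are illustrative assumptions.
rng = np.random.default_rng(2)

def make_run(t_b):
    t = np.arange(t_b)
    return (t / t_b) ** 2 + rng.normal(0, 0.01, t_b)  # health index 0 -> 1

library = [make_run(t_b) for t_b in (180, 200, 220)]  # complete patterns (q)
test = make_run(200)[:120]                            # incomplete pattern (p)
l = len(test)

best = None
for q in library:
    for s in range(len(q) - l):
        d = np.sqrt(np.sum((q[s:s + l] - test) ** 2))  # pairwise distance
        rul = len(q) - (s + l)          # life remaining past the match
        if best is None or d < best[0]:
            best = (d, rul)

print(best[1])  # estimated RUL (the test run was truncated 80 cycles early)
```

Because the test pattern was cut from a 200-cycle run at cycle 120, a good match should report a RUL near 80; with real data the library would contain many runs and the top-k matches would typically be averaged.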

Figure 4 shows a sample of similarity-based prognostics. The portions of the training trajectories before the minimum-distance locations are removed, and the remainders of the trajectories are accepted as the RULs.

Fig. 4 Similarity based RUL estimation—single dimension

3.3 Data-Driven Models

In data-driven prognostics, the condition-monitoring data received from system indicators are regularly processed and analyzed. Unlike PbM and KbM, these models use the system’s own condition monitoring data rather than mathematical models or human expert models. A typical data-driven example determines precursors of failure and estimates RUL by considering historical records and estimation outputs from monitoring data [43]. These models have been found to be more effective in many operational cases, particularly in complex systems, due to their simplicity in data handling and consistency in complex processes [39].

Data-driven approaches range from conventional stochastic and statistical methods to advanced black-box and deep learning methods. When one considers an ideal data-driven prognostic framework, one or more of these learning techniques should be included in the model.

3.3.1 Stochastic Algorithms

Stochastic data-driven prognostics are commonly Bayesian approaches that estimate the state of a process with a minimum prediction covariance received from measurements. They produce a probability distribution of RUL rather than a precise estimate, and can estimate both current and future states of nonlinear systems as well as the RUL by tracking the degradation growth before the failure threshold [54].

In Bayes’ theorem, the probability of an event (P(A)) is defined by prior knowledge related to that event. This forms a reference point for updating estimations in the light of relevant evidence [55].

$$\begin{aligned} P (A|B)=\frac{P (B|A)P (A)}{P (B)} \end{aligned}$$

where \(A\) and \(B\) are two different observable events and the probability of \(B\) satisfies \(P(B) \ne 0\).

If there is available condition monitoring data, a Bayesian network using the above mentioned theorem can model the degradation growth over time for prognostic forecasting. The most common types of these networks are Particle filters [54, 56,57,58,59,60], Kalman filters [61,62,63] and hidden Markov models [64,65,66,67,68].

In particle filtering, the Bayesian theorem is used to approximate the conditional state probability distribution using a swarm of point samples (particles) that carry potential information about unknown parameters. This approximation is based on a state transition function \(f_s\) and a measurement function \(f_m\).

$$\begin{aligned} {x}_t&=f_{s}({x}_{t-1},{p}_t,\dot{\varepsilon }_t) \\ {z}_t&=f_m({x}_t,{\varepsilon }_t) \end{aligned}$$

where \(\dot{\varepsilon }\) and \({\varepsilon }\) are process and measurement noise, x is the damage state, z is measurement data, p is model parameters and t is time [57].

With regard to the Bayesian theorem, this approximation can be expressed as:

$$\begin{aligned} P ({x}_t|{z}_{(1:t)}) \end{aligned}$$

and can be computed recursively in the estimation and update stages of particle filtering, where the computations are carried out by Monte Carlo sampling.
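The estimation/update recursion can be sketched as a bootstrap particle filter for a scalar damage state. The linear-growth state model, noise levels and particle count below are illustrative assumptions, not a tuned prognostic filter:

```python
import numpy as np

# Bootstrap particle filter sketch: propagate particles through an assumed
# state model, weight by the Gaussian measurement likelihood, resample.
# All model and noise values are illustrative assumptions.
rng = np.random.default_rng(3)

n_p = 500
particles = rng.normal(0.0, 0.1, n_p)   # initial damage states x_0
true_x = 0.0
for _ in range(50):
    true_x += 0.1                       # true damage growth
    z = true_x + rng.normal(0, 0.2)     # noisy measurement z_t
    # estimation stage: x_t = f_s(x_{t-1}) + process noise
    particles = particles + 0.1 + rng.normal(0, 0.05, n_p)
    # update stage: weights from the measurement likelihood p(z_t | x_t)
    w = np.exp(-0.5 * ((z - particles) / 0.2) ** 2)
    w /= w.sum()
    particles = rng.choice(particles, size=n_p, p=w)  # Monte Carlo resampling

est = float(particles.mean())           # approximates E[x_t | z_{1:t}]
print(abs(est - true_x) < 0.5)
```

The resampled particle cloud approximates \(P(x_t|z_{1:t})\); for RUL, the same propagation step is run forward without measurements until each particle crosses the failure threshold, yielding a RUL distribution.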

When \(f_s\) is accurately defined by a damage model representing the system dynamics, particle filtering can be applied to nonlinear systems. Kalman filtering, on the other hand, is a better choice for a linear system due to its lower computational requirements. The state and measurement equations reduce to the following forms [69].

$$\begin{aligned} {x}_t&={{\varvec{Fs}}}\,{x}_{t-1}+\dot{\varepsilon }_t \\ {z}_t&={{\varvec{Fm}}}\,{x}_t+{\varepsilon }_t \end{aligned}$$

where \(\dot{\varepsilon }\) and \({\varepsilon }\) are again the process and measurement noise, and Fs and Fm are respectively the state evolution and measurement matrices. They are assumed to be known and, with them, the Kalman filter estimates the state of the process and minimizes the estimation covariance by incorporating the measurement related to the state.
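A scalar predict/update loop makes the covariance-minimizing role of the Kalman gain explicit. In this sketch the (scalar) matrices, the known drift term and the noise variances are illustrative assumptions:

```python
import numpy as np

# Scalar Kalman filter sketch for x_t = Fs·x_{t-1} + noise, z_t = Fm·x_t + noise.
# Drift, matrices and noise variances are illustrative assumptions.
rng = np.random.default_rng(4)

Fs, Fm = 1.0, 1.0        # state evolution and measurement "matrices" (scalars)
Q, R = 0.01, 0.25        # process and measurement noise variances
drift = 0.05             # assumed-known damage growth per step
x_hat, P = 0.0, 1.0      # state estimate and its covariance

true_x = 0.0
for _ in range(100):
    true_x = Fs * true_x + drift                 # true damage state
    z = Fm * true_x + rng.normal(0, np.sqrt(R))  # noisy measurement
    # predict
    x_hat = Fs * x_hat + drift
    P = Fs * P * Fs + Q
    # update: the Kalman gain K minimizes the estimation covariance
    K = P * Fm / (Fm * P * Fm + R)
    x_hat = x_hat + K * (z - Fm * x_hat)
    P = (1 - K * Fm) * P

print(abs(x_hat - true_x) < 1.0)
```

The closed-form gain is what makes the linear case cheap compared with the particle filter's sampled approximation of the same posterior.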

Hidden Markov models are simpler Bayesian networks compared to particle and Kalman filtering methods. They assume that an observation at time t is generated by some process whose state \(P ({x})\) is hidden from the observer, and that this hidden process satisfies the Markov property, in which future behavior is predicted from the present behavior alone [70].

Bayesian network models require accurate degradation modeling for prognosis and have difficulties with multidimensional data. The regime of a damage state at any given time instant may not match the upcoming regimes and conditions. Therefore, a prior data processing stage is necessary before applying these models. There may also be difficulties in describing the damage behavior of multiple sensor data that may involve different fault types and states.

3.3.2 Statistical Algorithms

In statistical data-driven models, the results are precise estimates rather than a probability distribution, and the damage progression is based on condition monitoring data. Common prediction examples of this type are the relatively simple trend extrapolation methods, in which the health degradation is associated with a single-dimensional time series assumed to follow a monotonic trend [71,72,73].

Figure 5 shows various extrapolation methods for curve fitting. Single degradation parameters are plotted as a function of time, and a threshold level is predefined to decide the failure point. Although the estimation differences between these models may not be significant for mature samples close to failure, this will not be the case for samples early in their lifetime.
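Threshold-based extrapolation can be sketched in a few lines: fit a curve to the observed degradation index and solve for the time at which the fit reaches the predefined threshold. The quadratic trend, noise level and threshold below are illustrative assumptions:

```python
import numpy as np

# Sketch of trend extrapolation to a failure threshold; the signal shape,
# noise and threshold are illustrative assumptions.
rng = np.random.default_rng(5)

t_obs = np.arange(120.0)                           # observed lifetime so far
health = (t_obs / 200.0) ** 2 + rng.normal(0, 0.01, t_obs.size)
threshold = 1.0                                    # predefined failure level

coeffs = np.polyfit(t_obs, health, 2)              # quadratic curve fit
# solve fitted quadratic = threshold for the future failure time
roots = np.roots(coeffs - np.array([0.0, 0.0, threshold]))
t_fail = max(r.real for r in roots if abs(r.imag) < 1e-9 and r.real > t_obs[-1])
rul = t_fail - t_obs[-1]                           # cycles left to threshold

print(50 < rul < 110)  # the synthetic run actually fails at t = 200
```

Since the synthetic index reaches the threshold at t = 200 and observation stops at t = 120, a good fit reports a RUL near 80; with data from early in life the extrapolated crossing point is far less certain, as the text notes.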

Fig. 5 Extrapolation methods for RUL estimation—single dimension

Fig. 6 Standardization of multidimensional data

Autoregressive moving average models are also widely used time series forecasting methods [74]. They fit the data to a parametric time series model and extract features based on it. The model first fits the curve in the moving average part, and then adds it to the autoregressive output to estimate future behaviour [41]. Considering a time series \(x_{1:t}\), the model with an autoregressive part of order P and a moving average part of order Q is expressed as:

$$\begin{aligned} {x}_{t}=c+\varepsilon _t+\sum _{i=1}^{P}\phi _{i}{x}_{t-i}+\sum _{i=1}^{Q}\Theta _{i}\varepsilon _{t-i}, \end{aligned}$$

where \(\phi\) and \(\Theta\) are respectively the autoregressive and moving average terms [75]. Such a method is generally suitable for short-term predictions, but it does not provide reliable long-term estimates due to its sensitivity to initial conditions and the systematic errors in the predictor [74].
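The autoregressive part of this model can be fitted by ordinary least squares and used for the recursive multi-step forecasts described above. The sketch below fits only the AR terms \(\phi_i\) (a full ARMA fit would also estimate the moving-average terms \(\Theta_i\)); the synthetic series and the orders are illustrative assumptions:

```python
import numpy as np

# Sketch: least-squares fit of the AR part, x_t ≈ c + φ1·x_{t-1} + φ2·x_{t-2},
# then a recursive multi-step forecast. Series and orders are assumptions.
rng = np.random.default_rng(6)

n, P = 300, 2
x = np.zeros(n)
for t in range(P, n):                     # synthetic AR(2): φ1=0.6, φ2=0.3
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + rng.normal(0, 0.1)

X = np.column_stack([np.ones(n - P), x[1:-1], x[:-2]])   # lagged regressors
c, phi1, phi2 = np.linalg.lstsq(X, x[P:], rcond=None)[0]

# recursive multi-step forecast from the last P observations
hist = list(x[-P:])
for _ in range(10):
    hist.append(c + phi1 * hist[-1] + phi2 * hist[-2])

print(abs(phi1 - 0.6) < 0.2, abs(phi2 - 0.3) < 0.2)
```

The recursion feeds predictions back in as inputs, which is exactly why errors compound and the long-horizon estimates become unreliable.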

Dimensionality Reduction:

The time series prediction and life estimation models introduced so far lack the ability to deal with multidimensional failure mechanisms, and further data processing is required before RUL estimation. The multidimensional time series are inconsistent with each other and operate under different regimes and conditions. Therefore, they need a feature extraction transformation from the high-dimensional (multi-regime) space to a space with a single health-level dimension (single regime) [53]. Such dimensionality reduction standardizes the time series from their actual regime domains to a notionally common domain that provides meaningful information for prognosis.

Figure 6 illustrates a dimensionality reduction process for a single sensor. The data is standardized and transformed into a form in which there is only one valid regime for all sensors.

“Regression” models are among the common models that can be effectively used for both feature extraction and dimensionality reduction. In these applications, the multi-regime data is mapped into a lower-dimensional space in which as much of the measured information as possible is retained. In similar complex system domains, such models have been applied for standardization of multi-regime data into a single space for subsequent time series prediction and estimation models [52, 53, 76, 77].

A typical example of this category, multiple linear regression, is based on calculating the relationship between multiple explanatory inputs and a predefined target. The model fits a linear equation to the monitored data and is based on the following formula [78, 79].

$$\begin{aligned} y=x\beta +\epsilon \end{aligned}$$

where y is the target vector, x is the input and \(\beta\) contains the coefficient estimates. When one considers n time series, the equation is expressed as:

$$\begin{aligned} y=\beta _1 x_1+\beta _2 x_2 +\beta _3 x_3 + \cdots +\beta _n x_n \end{aligned}$$

As the complex models include multivariate time series and multiple signals, the formula is modified by considering x as a matrix rather than a vector.

$$\begin{aligned}&y_i=\beta _1 x_{i,1}+\beta _2 x_{i,2}+\beta _3 x_{i,3}+ \cdots +\beta _p x_{i,p}, \end{aligned}$$
$$\begin{aligned}&for \quad i=1,2,\ldots ,n \quad and \quad j=1,2,\ldots ,p \end{aligned}$$
$$\begin{aligned}&y=\begin{bmatrix} y_1\\ y_2\\ \cdots \\ y_n \end{bmatrix} x=\begin{bmatrix} x_{1,1}&x_{1,2}&\cdots&x_{1,p}\\ x_{2,1}&x_{2,2}&\cdots&x_{2,p}\\ \vdots&\vdots&\vdots \\ x_{n,1}&x_{n,2}&\cdots&x_{n,p} \end{bmatrix} \beta =\begin{bmatrix} \beta _{1}\\ \beta _{2}\\ \vdots \\ \beta _{p} \end{bmatrix} \end{aligned}$$

Considering a data set formed of multiple operating conditions and fault modes with unknown characteristics, the above-mentioned “regression” model can perform the feature extraction and standardization by applying the calculated coefficients, \(\beta\), to a similar matrix of observed instances [52]. However, this is a supervised model that requires a pre-defined target vector to calculate the coefficients. As the characteristic features of each instance (operational trajectory) differ from each other, any modeling of the target vector would face the problem of the damage progression forming in complex systems’ operational compositions, such as the initial wear level or the degradation pattern.
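The supervised mapping can be sketched concretely: regress an assumed health target \(y = t/t_b\) on the multi-sensor matrix of a training run-to-failure trajectory, then apply the fitted \(\beta\) to a new trajectory to project it into a single health dimension. The sensor model, channel count and noise are illustrative assumptions:

```python
import numpy as np

# Sketch of regression-based standardization with an assumed health target;
# the synthetic sensor channels and noise are illustrative assumptions.
rng = np.random.default_rng(7)

def trajectory(t_b):
    H = np.arange(t_b) / t_b                       # hidden health target
    sensors = np.column_stack([
        550 + 40 * H,                              # three assumed channels
        2388 + 25 * H,
        9050 - 60 * H,
    ]) + rng.normal(0, 0.5, (t_b, 3))
    return sensors, H

x_train, y_train = trajectory(200)
X = np.column_stack([np.ones(len(x_train)), x_train])
beta = np.linalg.lstsq(X, y_train, rcond=None)[0]  # coefficient estimates β

x_new, y_new = trajectory(180)                     # a new observed instance
health = np.column_stack([np.ones(len(x_new)), x_new]) @ beta

print(float(np.abs(health - y_new).mean()) < 0.05)
```

The weakness noted in the text is visible here: the target \(y = t/t_b\) is an assumption that presumes the same degradation pattern and initial wear level for every trajectory.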

An alternative method in this category is principal component analysis (PCA), a statistical dimensionality reduction procedure that uses an orthogonal transformation to convert a set of possibly correlated input variables into a set of linearly uncorrelated principal components. These target components are obtained from the singular value decomposition of the rectangular matrix x [80]. In prognostics, the first principal components of multi-dimensional data are used as the health indicator for RUL estimation [52, 53, 76, 77].

$$\begin{aligned} y_1=\beta _{11}x_{1}+\beta _{12}x_{2}+\beta _{13}x_{3}+\cdots +\beta _{1p}x_{p}=\beta _{1}^{T}x \end{aligned}$$

where \(\beta\) is the matrix of determined coefficients, x is the input matrix and y is the target. PCA is an unsupervised model, which means that it does not require a pre-defined target variable. In prognostics of multidimensional data, it can be applied to individual run-to-failure trajectories; however, this restricts the standardization of multiple instances onto a common scale and does not allow the determination of the operational characteristics.
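The use of the first component as a health indicator can be sketched directly from the SVD. The synthetic three-channel sensor matrix below is an illustrative assumption; the hidden health level H exists only to check that the component tracks degradation:

```python
import numpy as np

# Sketch of PCA as a health indicator: center the sensor matrix and take
# the first principal component from its SVD. Data is an assumption.
rng = np.random.default_rng(8)

t_b = 200
H = np.arange(t_b) / t_b                        # hidden degradation level
x = np.column_stack([550 + 40 * H, 2388 + 25 * H, 9050 - 60 * H])
x = x + rng.normal(0, 1.0, x.shape)

xc = x - x.mean(axis=0)                         # center each channel
U, S, Vt = np.linalg.svd(xc, full_matrices=False)
pc1 = xc @ Vt[0]                                # first component, y1 = β1ᵀx

corr = np.corrcoef(pc1, H)[0, 1]                # pc1 tracks the hidden H
print(abs(corr) > 0.99)
```

Note that Vt[0] is fitted to this one trajectory; applied to a second trajectory with a different initial wear level, the same component would not place both runs on a common scale, which is the restriction the text describes.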

Multi-regime normalization is a common data processing step that provides a common scale across all dataset characteristics before the prognostic analysis. A standard method for component-wise standardization is the standard score [6, 26, 51, 52, 76, 81,82,83], which is denoted as:

$$\begin{aligned} N(x^d)=\frac{x^d-\mu ^d}{\sigma ^d}, \quad \forall d \end{aligned}$$

where \(x^d\) are the multidimensional data in a specific regime feature d, and \(\mu ^d\) and \(\sigma ^d\) are respectively the mean and standard deviation of the same regime. It has been proposed that such a component-wise “multi-regime normalization” can standardize the multidimensional data with respect to each other within a single domain [51, 81]. In contrast to regression analysis and PCA, which cannot take the trajectory characteristics into account, this method can deal with the damage progression in complex systems under dynamic regimes by considering the population features. The standard score can normalize multiple trajectories with all their components and preserve operational characteristics such as the damage progression and initial health levels of different operations.
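The per-regime standard score can be sketched as follows: each reading is standardized against the mean and deviation of its own regime population, which collapses regime-dependent baselines onto one scale while keeping the degradation trend. The regime baselines and trend below are illustrative assumptions:

```python
import numpy as np

# Sketch of multi-regime normalization, N(x^d) = (x^d − μ^d)/σ^d, applied
# per regime; regime baselines and the trend are illustrative assumptions.
rng = np.random.default_rng(9)

n = 600
regime = rng.integers(0, 3, size=n)             # regime label d at each epoch
baseline = np.array([550.0, 610.0, 480.0])      # regime-dependent offsets
H = np.linspace(0, 1, n)                        # hidden degradation trend
x = baseline[regime] + 20 * H + rng.normal(0, 1.0, n)

z = np.empty(n)
for d in range(3):
    mask = regime == d
    z[mask] = (x[mask] - x[mask].mean()) / x[mask].std()   # N(x^d)

corr = np.corrcoef(z, H)[0, 1]   # regimes collapse onto one scale, trend kept
print(corr > 0.9)
```

The raw signal jumps between regime baselines and correlates poorly with the hidden trend; after normalization the regime offsets vanish and the degradation survives, which is exactly the standardization the cited studies rely on.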

Fig. 7 A sample demonstration of z-scores in a normal distribution

In Fig. 7, a sample z-score grading method in a normal distribution is shown along with standard deviations, cumulative percentages and percentile equivalents. When “multi-regime normalization” is applied to multiple operational trajectories, the standard score determines each regime’s unique population mean and standard deviation. This allows the simultaneous normalization of the entire dataset. On the other hand, in such cases all trajectories must be available at the same time and the normalization must be done at once. Such a scenario is rather unlikely in real life due to restrictions on data ownership and confidentiality [84]. When “multi-regime normalization” is applied in a real-world case, the standard score method must be repeated for each novel trajectory to take the altered population characteristics into account.

3.3.3 Artificial Neural Networks and Deep Learning

Artificial neural networks (ANNs) are the most common examples of modeling techniques in data-driven prognostic approaches [85]. These statistical models are directly inspired by, and partially modeled on, the behavior of the biological neural networks of the brain. Neurons, the basic units of neural networks, are capable of modeling and processing nonlinear relationships between input and output parameters in parallel [86,87,88]. The connection between inputs and outputs is achieved by exposing the network to a set of input samples, training the network and re-adapting it to minimize the errors [86].

An ANN is formed by multiplication, summation and transfer functions [89]. In the first stage, the neurons multiply the inputs by weights, and the sum of the weighted inputs is fed to the transfer function. In this process, the weights are automatically altered to increase the compliance of the model with the input data [86]. Considering a neuron with a single input vector of n individual elements:

$$\begin{aligned} x_1,x_2,x_3,\ldots x_n \end{aligned}$$

the element inputs are multiplied by the weights.

$$\begin{aligned} w_{1,1},w_{1,2},w_{1,3},\ldots w_{1,n} \end{aligned}$$

The neuron also includes a bias b that is summed with the weighted inputs to form the net input, N.

$$\begin{aligned} N=w_{1,1}x_1+w_{1,2}x_2+w_{1,3}x_3+\cdots +w_{1,n}x_n=W\times x+b \end{aligned}$$

This sum, N, is the argument of the transfer function f.

$$\begin{aligned} y=f(N) \end{aligned}$$

These equations are the key to establishing a set of interrelated functional relationships between numerous input series and desired outputs [88]. The final governing equation for a multi-layer network is denoted as:

$$\begin{aligned} y_{(t)}&=f\left[ \mathbf {x}_{(t)} \right] \nonumber \\&= f_o \left\{ b+\sum _{h=1}^{n_h}w_{h} f_h \left( b_h + \sum _{i=1}^{ n} w_{ih} x_{(i)} \right) \right\} \end{aligned}$$

where the fixed real-valued weights, \((w_i)\), are multiplied by the input data, \((x_i)\), and the bias, \(b\), is added. The neuron’s output, \(y\), is obtained through the nodes and transfer functions of the neurons [89, 90]. Figure 8 shows such a multiple-layer neural network model in which each layer is formed of its own variables.
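The governing equation above can be sketched as a forward pass with a tanh hidden transfer function \(f_h\) and a linear output \(f_o\). The layer sizes are illustrative, and the random weights stand in for values a real prognostic model would learn from training data:

```python
import numpy as np

# Sketch of the multi-layer forward pass y = f_o(b + Σ_h w_h·f_h(b_h + Σ_i w_ih·x_i)).
# Layer sizes and the (random, untrained) weights are illustrative assumptions.
rng = np.random.default_rng(10)

n_in, n_h = 4, 6
W_ih = rng.normal(0, 0.5, (n_h, n_in))   # input-to-hidden weights w_ih
b_h = rng.normal(0, 0.1, n_h)            # hidden biases b_h
w_h = rng.normal(0, 0.5, n_h)            # hidden-to-output weights w_h
b = 0.0                                  # output bias

def forward(x):
    hidden = np.tanh(W_ih @ x + b_h)     # f_h: hidden transfer function
    return float(w_h @ hidden + b)       # f_o: linear output

y = forward(rng.normal(0, 1, n_in))
print(np.isfinite(y))
```

Training would adjust W_ih, b_h, w_h and b by back-propagating the error between y and the desired output, which is the re-adaptation step described earlier.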

Fig. 8 Feed-forward neural network structure

In common ANN-based prognostics, the condition monitoring of a complex system may not produce precise data, and the desired information is not always directly linked to the input data. In such scenarios, an ANN is a convenient computational model for understanding the system behaviour without knowing the exact correlation between input and output parameters [91]. Because of this ability to model non-linear processes, along with their faster and easier calculation framework, neural networks have found many implementations in complex engineering systems.

One advanced prognostic application of these neural network models covers the deviation between healthy and deteriorated signal measurements from a complex system [92]. This network structure is formed with back-propagation-through-time gradient calculations and is designed to overcome challenges in adaptation, filtering and classification. A Multi-Layer Perceptron (MLP) neural network model first defines the difference between healthy and failed system conditions by treating the earliest samples in each time series as a healthy timeline and the latest samples as a degraded timeline. It then predicts the number of cycles remaining before failure [92].

A similar method has been proposed for the regression application in prognostics and combined with the Kalman filter method for predictions over time [81]. That work points out that data pre-processing, regime identification and exploration are the principal initial stages of the prognostic framework.

An alternative ANN prognostic model, echo state network-based prognostics, forms a supervised learning principle and architecture for recurrent neural networks [83, 93]. In this model, a large, random and fixed recurrent neural network is driven by the input signals, exciting each neuron within the network, and the desired output is obtained by combining the response signals.
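A minimal echo state network along these lines can be sketched as follows; the reservoir size, spectral radius, and the one-step sine-prediction task are illustrative assumptions:

```python
import numpy as np

def esn_fit(u, y, n_res=50, rho=0.9, ridge=1e-6, seed=0):
    """Minimal echo state network: a fixed random reservoir is driven by
    the input signal, and only the linear readout is trained (ridge
    regression on the reservoir response signals)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=n_res)
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
    x = np.zeros(n_res)
    states = []
    for u_t in u:                        # drive every reservoir neuron
        x = np.tanh(W @ x + W_in * u_t)
        states.append(x.copy())
    S = np.array(states)
    # trained readout: combine response signals into the desired output
    w_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ y)
    return S @ w_out, w_out

# hypothetical task: one-step-ahead prediction of a slow sine wave
t = np.linspace(0, 8 * np.pi, 400)
u, y = np.sin(t[:-1]), np.sin(t[1:])
pred, w_out = esn_fit(u, y)
```

Because the recurrent weights stay fixed, training reduces to a single linear solve, which is the computational appeal of the echo state approach.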

For multi-regime conditions, a further multilayer feed-forward architecture applies an error back-propagation algorithm to develop damage estimation models for different regime levels, i.e., take-off, climb, and cruise [94]. Similarly, the earlier trajectory similarity-based RUL estimation method is expanded with a Radial Basis Function Neural Network (RBFN) [6]. From this work, it is observed that the similarity-based prediction model offers substantial advantages in prediction performance over ANN-based prediction models.
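The trajectory-similarity idea can be sketched by sliding the test unit's recent window along stored run-to-failure trajectories and reading the RUL off the best match; the Euclidean distance measure and the synthetic library below are illustrative assumptions:

```python
import numpy as np

def similarity_rul(test_window, library):
    """Trajectory-similarity RUL: slide the test window along each stored
    run-to-failure trajectory; the best-matching position implies the RUL
    (cycles remaining after that position in the matched run)."""
    best_rul, best_dist = None, np.inf
    m = len(test_window)
    for traj in library:
        for s in range(len(traj) - m + 1):
            d = np.sum((traj[s:s + m] - test_window) ** 2)
            if d < best_dist:
                best_dist = d
                best_rul = len(traj) - (s + m)  # cycles left in matched run
    return best_rul

# hypothetical library: three run-to-failure degradation curves
rng = np.random.default_rng(3)
library = [np.linspace(0, 1, n) + 0.01 * rng.normal(size=n)
           for n in (120, 150, 180)]
# test unit observed for 50 cycles along a similar path of length ~150
test_window = np.linspace(0, 1, 150)[40:90]
rul = similarity_rul(test_window, library)
```

In practice several good matches would be aggregated (e.g. distance-weighted averaging) rather than taking the single nearest trajectory.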

For complex systems under dynamic regimes, the use of neural networks for RUL estimation has been extended to achieve higher levels of learning performance [95, 96]. In particular, a feature selection procedure is applied within neural network based prognostics, and it is shown that selection should be performed according to the predictability of the features [97]. For multi-step estimations, network structures such as non-linear autoregressive ANN models form dynamic filtering frameworks in which past monitoring information is used to predict future values [98]. Nevertheless, this type of network model has difficulty estimating exponential damage propagation and might require a recurrence relation model to transform the data for network training [98].
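The recursive multi-step scheme can be illustrated with a linear autoregressive stand-in for the non-linear autoregressive network; the linear model and the ramp-shaped damage index are simplifying assumptions for illustration:

```python
import numpy as np

def fit_ar(series, order=3):
    """Least-squares autoregressive model y_t = w . [y_{t-1}..y_{t-p}] + b,
    a linear stand-in for the non-linear autoregressive network."""
    X = np.array([series[i:i + order] for i in range(len(series) - order)])
    y = series[order:]
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def forecast(series, coef, steps):
    """Multi-step estimation: feed each prediction back in as an input."""
    order = len(coef) - 1
    hist = list(series[-order:])
    out = []
    for _ in range(steps):
        y_next = np.dot(coef[:-1], hist[-order:]) + coef[-1]
        hist.append(y_next)
        out.append(y_next)
    return np.array(out)

series = np.arange(50.0)          # hypothetical linearly growing damage index
coef = fit_ar(series)
pred = forecast(series, coef, 5)  # the fitted ramp continues past the data
```

The recursion is also what makes exponential degradation hard for such models: any small per-step bias is fed back and compounds over the horizon, which motivates the data-transforming recurrence relation mentioned above.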

Neural network based prognostics have been used in various complex domains with promising learning results [7, 40, 81, 85, 92, 99,100,101]. However, it may not always be possible to train the functions as desired, and the network structures at several prognostic phases may not provide the expected results, especially on time series with complex and exponential degradation patterns. In these cases, the network functions act as an autonomous system that recursively simulates the system behaviour [102, 103]. Multi-step-ahead network estimations can be particularly difficult when the previous monitoring information is limited and the failure lies in the longer term [25].

Conversely, neural filters are robust models for prognostics and provide a synthesised mapping between simulated or experimental data and a desired target [104]. These models have been used for multi-regime condition monitoring information with high prognostic performance [40, 92, 99]. Since multi-step-ahead network-based estimations are challenging [25, 105], a synthesis of neural networks with alternative prognostic methods is essential for higher prognostic performance, and various applications can be found in the literature [40, 81, 98,99,100,101, 106]. Each of these can be considered a hybrid prognostic framework.

4 Hybrid Applications

Considering prognostics for complex systems, a single application may not always perform accurately in domains with interrelated components, and therefore a hybrid prognostic method, or a fusion of models, can improve the prognostic characteristics and capability [107]. The damage progress of complex systems under superimposed operational conditions is not deterministic, and is usually multidimensional [3]. A single composed application is generally unable to handle such damage progress, and more advanced strategies should be considered [5].

The no free lunch theorem (NFLT), also known as the impossibility theorem of optimisation, states that a general-purpose universal optimisation strategy is impossible, and that a scheme can outperform another only when it concentrates on a specific case [108, 109]. In prognostics particularly, there is no framework that is ideal for all problems [110]. Therefore, prognostic models are commonly fused with alternative applications chosen in view of the system data. This hybrid application category consists of combinations of different prognostic models and has been applied in different domains [17]. A hybrid model gains more general importance when desired features are added and drawbacks are removed. It can also leverage the strengths of the individual applications, leading to improved RUL estimation results in terms of prognostic metrics.
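As a toy illustration of such fusion, RUL estimates from several models can be combined with weights reflecting each model's historical accuracy; the inverse-squared-error weighting rule and the numbers below are assumptions for illustration only, not a method prescribed by the cited works:

```python
import numpy as np

def fuse_rul(estimates, errors):
    """Hybrid fusion sketch: weight each model's RUL estimate by the
    inverse of its historical squared error, so the historically more
    accurate model dominates the fused result."""
    w = 1.0 / np.square(errors)
    return float(np.sum(w * estimates) / np.sum(w))

# hypothetical case: a data-driven model predicts 40 cycles (past RMSE 8),
# a physics-based model predicts 52 cycles (past RMSE 4)
fused = fuse_rul(np.array([40.0, 52.0]), np.array([8.0, 4.0]))
# the fused estimate lies between the two, closer to the more accurate model
```

More elaborate hybrids let the weights vary with the operating regime or the remaining life fraction, since NFLT implies no single model dominates over the whole degradation history.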

5 Conclusion

The use of prognostics for complex systems under multiple regimes has become increasingly significant in recent years, as reflected in the number of papers published across various domains. Researchers are becoming aware of the necessity of accurate RUL estimations, and they need a variety of approaches in order to realise the benefits of predictive maintenance applications. Computational methods can play a significant role in this process, in particular due to the development of new computational and cognitive paradigms.

The historical evolution of computational prognostic methods and their use in complex systems shows that sophisticated algorithms have emerged and enabled advanced analysis. In the future, these methods will play an even more significant role, owing to the huge amounts of data monitored and stored by advancing technologies. As shown in this paper, the current algorithms provide various tools that can significantly help researchers, but it is important to choose a method suited to the specific domain. Rather than selecting a single best algorithm, however, the best solution appears to be a combination of methods when solving new and complex prognostic problems.