Introduction

Fault diagnostics of practical systems is a very important problem that needs to be solved robustly to be able to make giant leaps in reliability and safety. Diagnostics is essentially an epistemological problem that requires us to make intelligent inferences based on data, which could be derived from empirical observations or computer models, and are often incomplete, noisy and uncertain. Although there is a rich and varied literature,Footnote 1 we feel that many of the diagnostic techniques in use are quite ad hoc and heuristic, resulting in lack of general applicability.

This paper presents innovative and rigorous techniques involving the nonlinear characteristics in a computational intelligence setting to diagnose changes in complex systems. Our approach consists of developing diagnostic methods using a combination of nonlinear dynamic analysis and computational intelligence techniques. In this paper, several applications are chosen with sufficient generality to demonstrate applicability to a host of disciplines.

The theoretical approaches are validated using data from fault simulators at Villanova University and Case Western University; we also validate our algorithms using experimental data from practical machinery provided by United Technologies Research Center (UTRC, USA) and Federal University of Uberlândia (Brazil).

The rest of the paper is organized as follows. Section 2 describes a family of methods that were originated and derived by our team called Phase Space Topology (PST). In Sect. 3, we present a recent development in PST using an example of bearing defect analysis. Section 4 summarizes some of the applications that were investigated to generalize the applicability of our developed methods. Finally, Sect. 5 concludes the paper.

Phase Space Topology Family of Methods

A Brief History of PST Development

The PST family of methods, which is based on characterizing the system phase space using quantitative measures, first originated in our previous work [1]. It should be noted that other studies [2,3,4,5] have also studied extraction of information from the system’s phase space for damage detection, although they have specialized it for the problem of structural health monitoring (SHM). They are indeed superficially similar to our method, but significant differences do exist. Most of these methods require exciting the structure by a chaotic input to extract a single feature and are directed toward SHM applications. In addition, the PST methods outlined below introduce original concepts such as Kernel density-based features, density approximation and orthogonal basis functions as a feature extraction technique.

We first focused on a nonlinear pendulum (shown in Fig. 1), which was diagnosed using a set of four features extracted from the phase plane trajectory of the system to characterize the nonlinear response in the periodic regime. The extracted features were related to the phase portrait eccentricity, harmonic amplitude and excursion variation. Finally, an Artificial Neural Network (ANN) was built using the numerical data for the estimation of parameters of a defective nonlinear pendulum set up with acceptable accuracy.

Fig. 1
figure 1

The nonlinear pendulum experimental setup

Kernel Density-Based Features

In Samadani et al. [6], an expansion to the previous study was developed to include the multi-periodic response regime of the pendulum which was characterized using an additional set of features extracted from the constructed density distribution of the phase space. To demonstrate how the Kernel density is estimated, let X = (\(x_{1}\), \(x_{2}\), …, \(x_{n}\)) be an independent and identically distributed sample data drawn from a distribution with an unknown density function f. The shape of this function can be estimated by its kernel density estimator (the hat, \(\hat{}\), indicates that it is an estimate, and the subscript indicates that its value can depend on h).

$$\begin{aligned} \hat{f_h}(x) = \frac{1}{nh}\sum \limits _{i=1}^n K \left( \frac{x-x_i}{h} \right) , \end{aligned}$$
(1)

where \(h>\) 0 is a smoothing parameter called the bandwidth, and K (.) is the kernel function which satisfies the following requirements:

$$\int\limits_{{ - \infty }}^{\infty } K (u){\text{d}}u = 1,\quad K( - u) = K(u)\forall u.$$
(2)

There is a range of kernel functions that can be used, including uniform, triangular, biweight, triweight, Epanechnikov and normal. Due to its conventional and convenient mathematical properties, we use the normal density function in our approach, defined as follows:

$$\begin{aligned} K(u)=\frac{1}{\sqrt{2\pi }}e^{-\frac{1}{2}u^2}. \end{aligned}$$
(3)

An example of the periodic and multi-periodic phase planes along with their corresponding density distributions of the nonlinear pendulum is shown in Fig. 2. It was shown that, depending on the geometry and shape of the phase space, the density diagram contains peaks of various heights and sharpness at multiple locations. This stems from the fact that the dynamical system occupies more time at specific regions of the space causing higher densities in those regions. The additional feature set that was extracted from the density distribution included the location, the height and the sharpness of the peaks in the density profile. The results showed that the introduced features are fully capable of characterizing the nonlinear response of the system in the periodic and multi-periodic domains with high complexities. It also demonstrated that the developed parameter estimation approach works effectively for both analytical and experimental data.

Fig. 2
figure 2

The phase plane and the corresponding density distribution for a periodic domain response and b multi-periodic domain response

Subsequently, in [7, 8], the same approach was validated using a 3-DOF nonlinear mass–spring–damper system where the diagnostics was treated as a parameter estimation problem. The results showed the effectiveness of the approach in characterizing the system behavior and capturing the dynamical changes while changing six parameters of the system simultaneously. Despite the success of this approach, the need to search for the peaks in the density diagrams made it difficult or sometimes even impractical to implement, especially for systems with noisy or more complex phase space patterns. This led to new advances continuing the development of PST, based on approximating the density distribution with an orthogonal basis, the details of which are described below.

Recent Contributions to the Development of PST

Density Distribution Approximation

A feature set is said to be ideal when it is able to reconstruct the original data. This assumption motivates the approximation of the density distribution of the phase space using a series of orthogonal basis where the coefficients of these basis will be used as features.

Let z be a state of the system and \(y_d= \hat{f_h}(z)\), its density computed using the kernel density estimator. \(y_d\) is then approximated with Legendre orthogonal polynomials. Legendre polynomials can be directly obtained from Rodrigues’ formula which is given by

$$P_{m} (z) = \frac{1}{{2^{m} m!}}\frac{{d^{m} }}{{{\text{d}}z^{m} }}\left[ {(z^{2} - 1)^{m} } \right],\quad m = 0,1,2, \ldots$$
(4)

It can also be obtained using Bonnet’s recursion formula:

$$(m+1) P_{m+1}(z) = (2m+1) z P_m(z) - m P_{m-1}(z),$$
(5)

where the first two terms are given by

$$P_{0} (z) = 1,\quad P_{1} (z) = z.$$
(6)

The coefficients of the Legendre polynomials are obtained using the least squares method assuming the following linear regression model:

$$\begin{aligned} f(z,\beta ) = \sum _{j = 1}^m \beta _j P_j(z). \end{aligned}$$
(7)

Letting

$$\begin{aligned} Z_{ij}= \frac{\partial f(z_i,{\varvec{\beta }})}{\partial \beta _j} = P_j(z_{i}), \end{aligned}$$
(8)

the estimated coefficients are given by

$$\hat{\beta } = (Z^{{\text{T}}} Z)^{{ - 1}} Z^{{\text{T}}} y_{d} .$$
(9)

The coefficients \({\hat{\beta }}\) constitute the features in our approach that can be used in classification or regression problems. The approximated density using Legendre polynomials is then calculated using the following:

$$f=Z{\hat{\beta }}.$$
(10)

Root mean square error (RMSE) and Pearson’s correlation coefficient (PCC) are calculated to compute the quality of the fit using the following equations:

$$\mathrm {RMSE}=\sqrt{\frac{1}{N}VV^T}, \quad \mathrm {PCC}=\frac{a^T b}{\sqrt{(a^Ta)(b^Tb)}},$$
(11)

where \(V=(y_d-f)\) is the residual vector, N is the number of points in the density function, \(a=\left( y_d-E\{y_d\}\right)\) and \(b=\left( f-E\{f\}\right)\), where \(E\{.\}\) is the expected value.

Artificial Neural Network (ANN)

ANNs are a set of computational models that are designed to recognize a relation or a pattern between inputs and outputs. Nowadays, they are widely used for regression, classification, prediction and optimization problems. ANN consists of an interconnected group of nodes called artificial neurons and each node has a corresponding weight. The knowledge of the system is stored as a set of ANN connection weights. The process of mapping inputs to outputs is called training, which is performed by adjusting the weights as learning proceeds. One of the most used algorithms in training ANN is the backpropagation algorithm, which optimizes the weights of the ANN by propagating an input forward through the network layers to the output layer where the calculated output is compared with the desired output. The error values of the calculated output and the desired output are then computed and propagated backwards. These errors are traced back to each associated neuron to update the weights.

In the last decade, neural networks have received considerable attention in the field of fault diagnostics. One of the main issues in ANNs is selecting an optimal feature set that can represent the system in different operating domains. The general trend in many applications is extracting a large number of features to be used. On the one hand, it works in some scenarios because the nonlinearity of systems makes it hard to define the relevant information that is needed. On the other hand, for applications where the available data are small with respect to the dimensionality of the system, extracting all features might be impossible [9]. In many cases, some of the extracted features are redundant or irrelevant and one can achieve better performance by discarding such variables [10].

Our goal has always been to extract just enough number of features that are relevant and provide rich information about the system. This can be achieved by combining our understanding of nonlinear dynamics in the developed algorithms. In this study, it is demonstrated that the selected number of features extracted by PST method provides very high level of information about the faults and the nonlinear effects on the system.

Example Application: Bearing Diagnostics

Rotating machinery are probably among the most important components in industry. Rotating machines are composed of different sub-systems interacting with each other in a nonlinear fashion; changes in any of these components can significantly affect the overall performance. In particular, rolling element bearing defects are one of the major sources of breakdown in rotating machinery. The rotating fault simulator machine shown in Fig. 3, is considered to study a variety of different machinery defects under various operating conditions, such as rotational speed, load and unbalance. It basically consists of a motor-driven shaft mounted on two bearings. Shafts and bearings with different sizes and conditions can be used. Various vibration sensors can be used such as accelerometers and proximity probes.

The PST algorithm is best explained with an example application; in this case, we choose the bearing diagnostics problem. A flowchart of the algorithm is illustrated in Fig. 4. Many traditional techniques that have been developed for bearing fault detection involve pattern recognition, which is only effective for an a priori defined operating condition set, shaft rotational speed in this case. Therefore, the classification model is required to be retrained each time the operating condition (rotational shaft speed) changes because of the dependence of the dynamic response on them. The lack of effective techniques under variable operating speeds is a well-established need, which motivates the development of this method.

Fig. 3
figure 3

Rotating fault simulator machine

Fig. 4
figure 4

Flowchart of bearing diagnostics algorithm

In this study, different rotational speed configurations between the training and testing sets of the classification model are investigated in (A) known speed domains and (B) bounded speed domains. To achieve this goal, the classification problem was initially trained on a set of speeds and then the classifier was tested on the same set of speeds. The classifier was trained and tested at the same 19 rotational speeds. In the second case, the bearing fault diagnostic approach was generalized to variable operating speeds within a given range of speeds (i.e., 300–3000 rpm). In this case, the classifier was trained on one set of speeds and then tested on another different set of speeds in which the speed test domains were completely bounded by the training speed sets as shown in Fig. 5. The analysis of both of the aforementioned procedures is described in the sequel in detail.

Fig. 5
figure 5

Speed configurations for training and testing the classification model a known speed sets and b bounded speed sets

Case (A): Known Speed Domains

To study a variety of different bearing defects under various rotational speeds (300–3000 rpm), a rotating fault simulator machine, shown in Fig. 3, was investigated. Four bearing conditions were investigated: healthy bearings (H), bearings with inner race defects (IR), outer race defects (OR) and ball defects (B). Vibration data were measured using two GE/Bently Nevada 7200 series proximity probe sensors in orthogonal directions and installed close to the bearing housing.

Legendre polynomials were used to approximate the estimated density distribution of the horizontal vibration signal for every rotational speed and bearing condition. The Legendre polynomial order was selected based on the least polynomial order that gives a reliable fit between the estimated and approximated density functions. To compute the quality of the fit, root mean square error and Pearson’s correlation coefficient were calculated. Based on these metrics, Legendre polynomials of order 20 were used to approximate the estimated density functions. Coefficients for the Legendre polynomials were computed for each of the 760 signal samples using the least squares method as shown in Eq. (9). The computed coefficients for each case, which represent the feature vector for that case, were saved in a 21-element vector (using only the horizontal vibration signal). Because of high impact of the rotation speed on the response of the dynamic behavior, it was used as an additional feature, making the total number of features equal to 22. From all available sampled data (760 samples), 50% of them (380 samples) were used to train the ANN, 25% of them (190 samples) were used to validate the training algorithm and the remaining 25% (190 samples) were used to test the classifier. The backpropagation algorithm was used for training the ANN. An ANN with ten hidden neurons was considered for this problem.

The performance of the selected classification model is presented using confusion matrices. A confusion matrix compares between the results of the predicted and the actual classes. Each row of the matrix shows the prediction results for the corresponding class at that row while each column shows the actual class. The main diagonal elements of the matrix show the correct classified prediction for each corresponding class, which are known as true positives. For each row of the matrix, all elements except the main diagonal elements are the misclassified prediction for the corresponding class, which are known as false positives. For each specific class, the summation of elements on its corresponding column excluding the main diagonal elements are called false-positive elements. The performance of the classifier can be evaluated using certain evaluation matrices derived from the confusion matrix, such as sensitivity, precision and overall accuracy. Table 1 shows the predictions for test data using the neural network classifier for case (A).

Table 1 Prediction of various defect classes for case (A)

As can be observed, the classifier has predicted all defects with no misclassification and with 100% sensitivity, 100% precision and 100% overall accuracy. These results are remarkable for a variety of reasons. First, it demonstrates that combination of the proximity sensor data and the PST features can address the challenges in fault identification at low rotational speeds (below 10 Hz). Second, no a priori knowledge of the system was integrated in extracted features. Therefore, the PST approach can be applied to diverse dynamical systems in an automated fashion with minimal adaptation or dependence on expertise. In conventional bearing diagnostic techniques, specific characteristics of the system such as ball pass frequencies are searched for; however, the proposed methodology does not require any additional analysis and functioned well without it. We should be cautious to note that other operating conditions may need additional feature combinations. Finally, no feature selection or ranking algorithm [11] was used here to select the optimal feature set. Since the coefficients decrease by increasing the number of orthogonal functions, the calculated coefficients are ranked naturally.

Case B: Bounded Speed Domains

For this case, density distributions were constructed using both horizontal and vertical vibration data for every speed and bearing condition. Then using Legendre polynomials of order 20, the estimated density functions were approximated. The feature set was constructed using the first 15 Legendre polynomial coefficients in each direction in addition to the rotational speed of the shaft making the total number of features equal to 31. The feature vector was then used as an input for training the ANN classifier. The classifier was modeled with ten neurons in the hidden layer and trained using backpropagation algorithm. For training, the classification model, four rotational speeds were selected including the machine operating range boundaries (300 and 3000 rpm) and two middle speeds (1200 and 2400 rpm). For these speeds, the available vibration data of 160 total samples at different bearing conditions were used for training and the remaining 600 samples obtained at the other speeds (e.g., 420, 600, \(\ldots\), 2820 rpm) were used for testing the trained classifier.

The classification results for the test data are represented as a confusion matrix in Table 2. As can be observed, the classifier overall accuracy is 96.7% with 20 misclassifications out of 600 predictions, which indicates a high prediction rate of classifying the four bearing conditions. Most of the misclassifications corresponded to bearings with ball defects. To evaluate effectively, the precision and sensitivity parameters were calculated and presented in Table 2. In summary, the classifier had both high precision and sensitivity for the majority of the bearing conditions. The classification model worked virtually perfectly in distinguishing between healthy and defective bearings since there were zero misclassifications.

Table 2 Prediction of various defect classes for test data, Case B

Summary of Applications

This section presents a summary of some of the applications that we investigated including bearing fault diagnostics, gear fault diagnostics and crack shaft diagnostics. In the following subsections, a brief introduction to each system along with the main contributions are described. Table 3 summarizes the main contributions that were achieved.

Table 3 Summary of other investigations

Other Studies in Bearing Diagnostics

We have performed various investigations on the bearing set up (shown in Fig. 3) to develop robust techniques to diagnose bearings. In [11, 22,23,24,25], we have applied conventional methods such as fast Fourier transform, envelope spectrum, and discrete wavelet transform over a span of rotational speeds as well as nonlinear physics-based modeling [26,27,28,29]. Accelerometer data were used to extract features to diagnose bearings with inner race, outer race and ball defects. Mutual information was then used as a ranking technique, and the optimal feature subset corresponding to the highest classification accuracy was determined. An overall accuracy of 97.0% was achieved using this procedure.

In [16, 30], we introduced the mapped density method to discriminate simultaneous bearing faults under various rotational speeds. In this work, we studied the use of the information provided by proximity probe sensors. The method had significant success in fault discrimination for a single and two bearing fault configurations (accuracy of 97% for a single bearing fault and 92% for two bearing faults). Moreover, the results indicate that this method has high performance in distinguishing between different bearing conditions signatures (accuracy of 88%).

In [13,14,15, 31], the PST method was introduced. In this work, we performed bearing diagnostics on different rotational speeds domains and obtained very good results (overall accuracy of \(96.7\%\)). We have also applied other nonlinear techniques such as recurrence plots [32], Gottwald and Melbourne’s 0-1 test and the Higuchi fractal dimension [33].

Gear Fault Diagnostics

Gear fault diagnostics is still a challenging task because of the highly nonlinear characteristics of faults and its complex nonstationary dynamics. Our work investigated gear fault diagnostics using vibration data of a helicopter gearbox mock-up system provided by UTRC. The experimental setup is shown in Fig. 6. This work studied multiple test gears with different health conditions such as healthy gears (H) and defective gears with root crack on one tooth (SCD), multiple cracks on five teeth (MCD) and missing tooth (MTD). The vibrational signals were recorded using a triaxial accelerometer installed on the test gearbox.

In [19, 34], we presented the application of recurrence plots (RPs) and recurrence quantification analysis (RQA) in the diagnostics of various faults in a gear-train system. It also applied mutual information to rank the extracted features order to obtain an optimal feature set. Results indicated that RQA parameters provide valuable information in characterizing the dynamics of various gear faults to discriminate the healthy gear condition from defective conditions. Also, outstanding performance was achieved using RQA parameters to identify various gear conditions with 100% accuracy, 100% recall and 100% precision in detecting multiple cracks and missing tooth conditions.

In [18], the PST method was applied to detect anomalous behavior and to diagnose various gear defects. Results indicates 99% accuracy in classifying between different gear conditions.

Crack Detection

Experiments were conducted on a crack propagation simulator test rig, shown in Fig. 7, in collaboration with Federal University of Uberlândia, Brazil. This machine consists of a flexible steel shaft mounted on two roller bearings. Two orthogonal proximity probes oriented in the horizontal and vertical directions were used to measure the vibration response of the shaft. The crack propagator was used over a period of 24 h to produce a fatigue crack, which is the first damage condition. Then the second fatigue crack was produced using the crack propagator for another 24 h period.

In [20], the PST method integrated with mutual information was applied to detect cracks and to identify the level of degradation. Mutual information was used to only select the most relevant PST-extracted features. Results show 100% performance using this algorithm; it is notable that only three features were necessary to detect cracks and identify the crack level.

Fig. 6
figure 6

Gear-train experimental setup

Fig. 7
figure 7

Crack experimental setup

Conclusion

This paper summarizes the development of a novel family of methods that originated and developed at VCADS. It discusses briefly some of the applications such as nonlinear pendulum, gear-train, cracked shafts and electro-hydraulic servo actuators [21]. It also presents a recent investigation of PST for bearing defect analysis. These methods are based on the hypothesis that the nonlinear response of dynamical systems contains valuable information about the system, given the intrinsic nonlinear nature of real systems. We conclude with a strong message that the key to robust and accurate diagnostics is algorithms that are able to zoom in on these nonlinear aspects of the response and can characterize them effectively.