Basic research on machinery fault diagnostics: Past, present, and future trends

Machinery fault diagnosis has progressed over the past decades with the evolution of machineries in terms of complexity and scale. High-value machineries require condition monitoring and fault diagnosis to guarantee their designed functions and performance throughout their lifetime. Research on machinery fault diagnostics has grown rapidly in recent years. This paper attempts to summarize and review the recent R&D trends in the basic research field of machinery fault diagnosis in terms of four main aspects: fault mechanism, sensor technique and signal acquisition, signal processing, and intelligent diagnostics. The review discusses the special contributions of Chinese scholars to machinery fault diagnostics. On the basis of the review of basic theory of machinery fault diagnosis and its practical applications in engineering, the paper concludes with a brief discussion on the future trends and challenges in machinery fault diagnosis.


Introduction
Machinery fault diagnosis is becoming increasingly important in the field of process monitoring due to greater demands for mechanical systems that provide higher performance, safety, and reliability. Advancements in science and technology have led to the development of mechanical systems, such as those found in wind turbines, aircraft, high-speed trains, and machine tools (Fig. 1).
Meanwhile, engineers must find ways to guarantee the performance of these systems so that they can execute the required functions under the stated conditions for a specified period of time. Some of these functions include monitoring the operation conditions of machines, identifying whether an abnormal condition or fault arises in machines or components, determining the original cause of abnormal conditions or faults, assessing their level of severity, and predicting the remaining useful life or trends of abnormal conditions. Machinery fault diagnosis is one of the key techniques for continuous maintenance (Fig. 1 [1,2]), which can help avoid abnormal event progression, reduce offline time, forecast residual life, and reduce productivity loss. In turn, these can help avoid major system breakdowns and catastrophes.
The key components of mechanical equipment would inevitably generate different faults of varying degrees because of complex and severe conditions, such as heavy load, high temperature, and high speed.  display some possible fault modes in various machine components in transmission systems, including helicopters, a hot strip milling production line, and wind turbines. Machine faults also occur and lead to serious outcomes even in sophisticated machine systems. One such example is the "bad rub" incident of the F135 engine, which resulted in the F-35A catching fire during take-off [7][8][9].
Machinery fault diagnosis techniques involve observing a mechanical system over a period of time using periodically sampled measurements from an array of sensors, extracting fault-sensitive features from these measurements, conducting statistical analysis of these features to determine the current health state of the system, and predicting the remaining useful life and trend of the fault. For example, engine health management (EHM) is a collection of capabilities to create customized designs that best meet the needs of individual users. An EHM system in the F135 engine is designed to provide real-time data to maintainers on the ground, reducing troubleshooting and replacement time by as much as 94% relative to legacy engines. Such a system consists of both engine-hosted and ground-based elements. The engine-hosted element generates data from on-board sensors and performs basic fault isolation and prediction, thus supporting on-wing maintenance. In comparison, the ground-based element supports long-term degradation trending, thus providing planning information that can be used by aircraft fleet managers. The health usage and monitoring system (HUMS) monitors the condition of critical components and systems in helicopters, especially the drivetrain, so that progressive defects or faults can be detected in time and maintenance can be performed before such defects have an immediate effect on operational safety. HUMS records vibration measurements taken at various critical components using different sensors, and then stores these measurement data in a removable memory for further diagnostics and prognostics. HUMS is also widely used for other mechanical systems, such as wind turbines and machine tools, especially for systems with high operational reliability requirements.
As can be gleaned from the information above, fault diagnosis is a major research topic that has attracted considerable interest from industrial practitioners and academic researchers.
The available literature ranges from analytical methods to artificial intelligence and statistical approaches, and covers four basic research directions of machinery fault diagnosis (Fig. 1). These four directions correspond to the four key procedures involved in machinery fault diagnosis. First, before any diagnostic method is applied, the root cause of fault generation is analyzed from the viewpoint of the fault mechanism. For example, in diagnosing a bearing fault, the characteristic fault frequencies must be calculated. The second step is signal acquisition through various sensors. Subsequently, signal processing along with feature extraction is implemented to reduce the dimension of the raw data and to obtain useful information representing faults. The performance of mechanical fault diagnosis largely depends on the selected signal processing techniques and on appropriate feature extraction. Finally, a mapping relation between the fault-sensitive features and specific faults can be established by intelligent methods for fault detection and identification.
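As a minimal sketch of the bearing example mentioned above, the widely used kinematic formulas for the characteristic defect frequencies can be evaluated as follows. The sketch assumes a stationary outer race and pure rolling (no slip); the function name and the geometry values are illustrative assumptions, not data from the reviewed works.

```python
import math

def bearing_fault_frequencies(shaft_hz, n_balls, ball_d, pitch_d, contact_deg=0.0):
    """Kinematic defect frequencies in Hz (stationary outer race, no slip assumed)."""
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    ftf = 0.5 * shaft_hz * (1.0 - ratio)             # cage (fundamental train) frequency
    bpfo = n_balls * ftf                             # ball pass frequency, outer race
    bpfi = 0.5 * n_balls * shaft_hz * (1.0 + ratio)  # ball pass frequency, inner race
    bsf = 0.5 * shaft_hz * (pitch_d / ball_d) * (1.0 - ratio ** 2)  # ball spin frequency
    return {"FTF": ftf, "BPFO": bpfo, "BPFI": bpfi, "BSF": bsf}

# Hypothetical geometry (mm) and shaft speed (Hz), chosen only for illustration.
freqs = bearing_fault_frequencies(shaft_hz=30.0, n_balls=9, ball_d=7.94, pitch_d=39.04)
for name, value in freqs.items():
    print(f"{name}: {value:.2f} Hz")
```

In practice, slip makes the measured defect frequencies deviate slightly from these kinematic values, so they are usually treated as search targets in the spectrum rather than exact line positions.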
The remaining part of this paper is organized as follows. Section 2 reviews the modeling methods for the typical fault of the rotating machinery and composite structure. Section 3 briefly describes the data acquisition step using different sensors. Section 4 reviews the signal processing methods for fault feature extraction. Section 5 reviews the ideas and methodologies employed to carry out fault intelligent decision making. Section 6 highlights the development of machinery fault diagnosis in China. Section 7 lists the possible future development trends and challenges of machinery fault diagnosis, pointing out some existing challenges in diagnostics. Section 8 provides

Fault mechanism
Understanding the mechanism of fault generation and propagation is the foundation of mechanical fault diagnosis. The model-based approaches generally use physics-specific and explicit mathematical models of the machines being monitored. Most model-based approaches are represented by complete mass, stiffness, and damping matrices of the system based on input-output and state-space models, where mechanical faults are represented by introducing external forces into the equations. Currently, various model-based diagnostic approaches, including the analytical method, the finite element (FE) method, and the combined analytical-FE approach, have been applied to conduct fault diagnosis of a variety of rotating machineries and composite structures, such as gearboxes, bearings, rotors, and cutting tools.
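The equivalent-force idea described above can be written compactly. The notation below is an assumption introduced here for illustration and is not taken from the cited works: M, C, and K are the mass, damping, and stiffness matrices, x(t) is the vector of generalized displacements, F(t) is the nominal excitation, and F_f(t) is the equivalent force system that represents the fault.

```latex
\mathbf{M}\,\ddot{\mathbf{x}}(t) + \mathbf{C}\,\dot{\mathbf{x}}(t) + \mathbf{K}\,\mathbf{x}(t)
  = \mathbf{F}(t) + \mathbf{F}_{f}(t)
```

Under this form, diagnosis amounts to estimating F_f(t), or the parameters that define it, from the difference between the measured response and the response predicted for the healthy system.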

Rotating machineries
Rotating machineries play an important role in industrial and economic development and have become increasingly complex due to the rapid advancements in the industry. Notably, the reliability and robustness of rotating machineries have also undergone significant improvement. Yet, some occasional failure events often lead to unexpected downtime with huge economic losses. Model-based approaches in rotor dynamics consider the model of a fully assembled machine, which consists of the sub-models of the rotor, bearing, gear, and foundation. In addition, fault diagnosis on the component level is also very important.

Component level
The three basic rotating components (i.e., bearing, gear, and rotor) play an important role in industrial rotating and transport machinery applications. However, these components are prone to breakdown. Among all the mechanical parts, rolling element bearings are widely used and easily damaged. Bearing defects serve as warnings for other possible faults in rotating machineries. For example, misalignment or imbalance can lead to bearing defects [10]. The dynamic modeling of rolling element bearings is helpful in understanding the mechanism of vibration generations in a faulty rolling element bearing and to improve the efficiency of the vibration-based condition monitoring and fault diagnosis [11]. The time-varying and transient dynamic behaviors of bearing components under high-speed and other complex operating conditions make the vibration of a faulty bearing rather complicated. Such complications, in turn, make it more difficult to carry out vibration-based diagnosis techniques. Therefore, a dynamic model considering transient and time-varying motions of bearing components is urgently needed to predict the dynamic behavior of faulty rolling element bearings.
One of the most comprehensive dynamic models of rolling element bearings was proposed by Gupta [12][13][14][15][16][17]. Gupta's model considered the high-speed effect, the transient impact between bearing components, and the lubrication and cage effects, thus providing a powerful tool to investigate the transient dynamics of rolling element bearings. Based on Gupta's model, the dynamics and vibration responses of ball [18,19] and cylindrical roller bearings [19] with localized surface defects on raceways have been investigated. The finite size of a rolling element, an important factor in this case, largely affects the impact between a rolling element and a raceway defect [20]. Therefore, when modeling the localized defects, the finite size of a rolling element has also been considered in Ref. [21]. Moreover, the lubrication effect and impacts at cage/ball and cage/guiding ring contacts have been considered in Ref. [21]. As a result, the model proposed in Ref. [21] can capture the time-varying orbital speed of a ball, which largely affects the defect frequencies of raceways. Patel et al. [22] developed a dynamic model of a deep groove ball bearing in the presence of defects on either race under steady and dynamic loading conditions. They concluded that the amplitude of vibration velocity with multiple defects is higher than that with a single defect on either race.
Meanwhile, gears are the most important elements in the gearbox. They are subjected to wear and fatigue even under normal operating conditions, which means that they are often subjected to premature failure. Local faults of gears are more dangerous because they tend to develop rapidly once initiated and often have significant effects on power transmission. If the most important local faults are not detected in a timely manner, dramatic consequences might occur, such as tooth breakage, pitting, and scoring. Ma et al. [23] developed an FE model of a geared rotor system with tooth root crack, while considering the effects of the extended tooth contact and tooth root crack on the time-varying mesh stiffness. Furthermore, considering the accurate transition curve, misalignment of gear root circle, and base circle, they developed an improved model for obtaining the mesh stiffness of the perfect and cracked gear pair, and then validated the model by the FE method [24].
In the last three decades, several scholars have focused on the diagnosis of rotor cracks in rotating machineries. The excellent review papers by Dimarogonas [25], Wauer [26], and Gasch [27] cover many aspects of this area and present valuable information and knowledge in this field. In 2008, as the guest editors, Bachschmid and Pennacchi [28] edited an issue of Mechanical Systems and Signal Processing about cracked rotors. The challenge of modeling a crack is one of the most significant issues in this area. In relation to this, the dynamic behavior of rotors with a transverse crack has been studied by many authors [25,27]. Pennacchi et al. [29] proposed a model-based transverse crack identification method in the frequency domain to investigate the dynamic behavior of cracked horizontal rotors, which they validated by experimental results obtained on a large test rig. Papadopoulos [30] reviewed the strain energy release approach (SERP) for modeling cracks in rotors and presented some extensions and limitations of SERP. The review noted that when more than one crack appears in a structure, the dynamic response becomes more complex depending on the relative positions and depths of these cracks. Sekhar summarized the different studies on double/multi-cracks and noted the identification methods in vibration structures, such as beams, rotors, and pipes [31]. Gasch [32] studied the dynamic behavior of the Laval rotor with a transverse crack. Despite these studies, explicit mathematical modeling may not be feasible for complex systems, because it would be very difficult or even impossible to build mathematical models for such systems.

System level
Modeling the whole assembled systems is more significant compared with modeling the individual components. The model-based fault diagnosis in rotor systems is essentially a multiple-input and multiple-output inverse problem. The typical faults in rotating machineries, including rotor bow, rigid coupling misalignment, transverse crack, and axial asymmetry, have been modeled as equivalent forces by modal representation [33]. The M-estimate technique is more robust and accurate than the traditional least squares method and has been applied to identify unbalances of rotor in a gas turbogenerator of a power plant [34]. Past studies [35,36] presented a model-based method exploiting analytical redundancy for detecting faults in a gas turbine process; the authors then tested the model on a single-shaft industrial gas turbine prototype model. In order to monitor a powerful 20-cylinder diesel engine, Desbazeille et al. [11] modeled the angular speed variations at the crankshaft free end, including the crank shaft dynamic behavior and excitation torques. Then, they optimized the mechanical and combustion parameters of the model by actual data and employed neural networks to identify healthy and faulty conditions. Hou et al. [37] considered the maneuver load of a climbing-diving flight and modeled an aircraft rotor-ball bearing system and analyzed the nonlinear dynamic behaviors of cracked rotors in flight maneuvers. Subsequently, they studied the nonlinear responses of a cracked rotor-ball bearing system by considering the breathing mechanism of the transverse crack and the maneuver load of a climbing-diving flight [37]. Lu et al. [38] analyzed the nonlinear dynamic characteristics of a rotor system supported by ball bearings with pedestal looseness.
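The robustness of M-estimation relative to least squares, noted above for unbalance identification, can be illustrated with a deliberately simplified scalar problem. The sketch below is not the method of Ref. [34]: it assumes a hypothetical model y = h * u + noise, where u stands in for an unknown unbalance and h for known influence coefficients, and it uses a Huber M-estimator solved by iteratively reweighted least squares. All names and data values are illustrative.

```python
def least_squares(h, y):
    """Ordinary least-squares estimate of u in the scalar model y = h * u + noise."""
    return sum(hi * yi for hi, yi in zip(h, y)) / sum(hi * hi for hi in h)

def huber_m_estimate(h, y, k=1.345, iters=50):
    """Huber M-estimate of u via iteratively reweighted least squares (IRLS)."""
    u = least_squares(h, y)  # start from the ordinary least-squares solution
    for _ in range(iters):
        r = [yi - hi * u for hi, yi in zip(h, y)]
        # Robust residual scale from the median absolute residual.
        scale = max(sorted(abs(ri) for ri in r)[len(r) // 2] / 0.6745, 1e-12)
        # Huber weights: 1 inside the threshold, shrinking as 1/|r| outside it.
        w = [1.0 if abs(ri) <= k * scale else k * scale / abs(ri) for ri in r]
        u = (sum(wi * hi * yi for wi, hi, yi in zip(w, h, y))
             / sum(wi * hi * hi for wi, hi in zip(w, h)))
    return u

# Hypothetical data: ten measurements, one of them grossly corrupted.
u_true = 2.0
h = [float(i) for i in range(1, 11)]
noise = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, 0.04, -0.01, -0.05]
y = [hi * u_true + ni for hi, ni in zip(h, noise)]
y[8] += 15.0  # single outlier, e.g. a faulty sensor reading

u_ls = least_squares(h, y)
u_huber = huber_m_estimate(h, y)
print(f"least squares: {u_ls:.3f}, Huber M-estimate: {u_huber:.3f}, true: {u_true}")
```

The single outlier pulls the least-squares estimate well away from the true value, whereas the Huber weights progressively discount it. A real rotor application would involve complex-valued, multi-plane unbalance vectors rather than one scalar.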
Liang et al. [39] developed a dynamic model to simulate the vibration source of a planetary gearbox and investigated the vibration properties of healthy and cracked tooth conditions. Ma et al. [40] established a rubbing model between the rotating blade and elastic casing based on the law of conservation of energy. Recently, Ma et al. [41] reviewed the dynamics of cracked gear systems mainly from three aspects, namely, crack propagation prediction, time-varying mesh stiffness calculation, and vibration response calculation. In a later study, Ma et al. [42] established an FE model of a rotational shaft-disk-blade system and simulated the rubbing between the blade tip and casing using contact dynamics theory. Furthermore, they also investigated the dynamic behaviors of a perforated gear system by considering the effects of the gear crack propagation paths [43]. Hu et al. [44] proposed an FE node dynamic model of the gear-rotor-bearing system with different lengths of crack considering time-varying mesh stiffness, backlash, transmission error excitation, flexible shaft, and supporting bearing. Rolling element bearings are often at the heart of rotating machineries and tend to suffer from faults more frequently. Gui et al. [45] established a gear-bearing coupling dynamics model of planetary gear trains based on a nonlinear bearing dynamics model with two degrees of freedom and a bending-torsion coupling dynamics model to study fault diagnosis of localized bearing defects of planetary gear systems. Tadina and Boltežar [46] considered the centrifugal load effect and radial clearance and developed an improved bearing model to investigate the vibrations of a ball bearing during the run-up, which introduced various surface defects due to local deformation [45].

Reciprocating machineries
The structure of reciprocating machineries is much more complex because it contains both rotating and back-and-forth moving parts. These machines usually work in harsh environments and bear heavy loads during operation. Therefore, the classical fault diagnosis methods that are used for rotating machineries with steady rotating speed may be ineffective when evaluating reciprocating machineries (e.g., reciprocating compressors, gas engines, and diesel engines) because the signals measured in the reciprocating machineries often contain strong noise components even if they are in the normal state. The typical characteristics of the reciprocating engine vibration include impact excitations, time-varying transfer properties, and non-stationary random response [47]. Sudden breakdown of reciprocating machineries and decreased machinery service performance often occur because faults in some parts of the reciprocating machineries are difficult to detect. Although many studies have been carried out to achieve fault diagnosis of reciprocating machineries, the diagnosis and isolation of the faults of reciprocating machineries remain very challenging problems in this field.
To help solve the problem, Wang and Chen [48] developed a fault diagnosis method using the adaptive filtering technique and a fuzzy neural network with the aim of diagnosing faults of a rolling bearing used in reciprocating machineries. In another study, Wang and Chen [49] proposed a feature extraction based on information theory for reciprocating machineries. Lee and White [50] presented an enhancement scheme to aid the measurement and characterization of impulsive sounds and vibration signals for fault detection in reciprocating machineries. Shen et al. [51] developed rough sets theory to diagnose the valve fault for a multi-cylinder diesel engine, while considering the complex structure of the engine and the presence of multi-excite sources. Wang and Hu [52] investigated the use of basic fuzzy logic principle as a fault diagnostic technique for five-plunge pump used in oil field. El-Ghamry et al. [53] proposed the automated pattern recognition and statistical feature isolation techniques for the diagnosis of reciprocating machinery faults using acoustic emission (AE). They found that the nonuniform cylinder-wise torque contribution increased torsional vibration levels of the crankshaft and stress of mechanical components in reciprocating engines. Östman and Toivonen [54] developed a method for reducing the torsional vibration of the crankshaft system. Schultheis et al. [55] investigated the risk-based decision making for condition monitoring of reciprocating compressors. Goodwin et al. [56] provided an extensive review on theoretical and experimental work undertaken on the design and performance assessment of bearings in reciprocating machineries. Geng et al. [47] presented a systemic and detailed review of impacting excitations, time-varying vibration characteristics, and applicable analysis and diagnosis strategies for reciprocating engines.

Composite materials and structures
Composites consist of two or more distinct phases of constituent materials, which can provide enhanced properties that would be impossible with any of the monolithic materials alone. Therefore, especially in the past decade, composite materials have received considerable attention in many modern industries, such as aerospace engineering and wind power energy engineering. The use of composite materials contributes to the development of analysis techniques that are capable of determining homogenized properties for composite materials with various microstructures and material constituents. These techniques accelerate the development cycle of a material system with the desired mechanical and physical properties by circumventing the traditional trial-and-error approach based on actual fabrication and laboratory testing. However, the anisotropic nature and hard-to-access property of the composite make the overall and local responses notoriously complicated. Hence, a good understanding and predictive capability of their stress-strain and failure behaviors is critical in the effective utilization of these materials.
New approaches are continuously being developed and proposed, but the majority of these are based on the FE method. The FE method can be easily used to solve physical and mechanical problems of composite materials because of the popularity of commercial FE software with convenient graphical interfaces. Furthermore, with the facility of standard explicit FE code, various types of constitutive theories for composite analysis can be easily accommodated into the FE framework. Pituba et al. [57] developed a contact FE in order to capture the effects of phase debonding, interface crack closure or opening, and the cracking process inside the matrix of fiber-reinforced composites. Zuo et al. [58] developed the wavelet FE method adopting B-spline wavelet on the interval to investigate static and free vibration problems of laminated composite plates. Nonetheless, the FE approach often requires very complex boundary conditions, which makes applying different loading combinations quite difficult. In addition, the FE method is sensitive to mesh discretization. Substantially refined meshes are needed for solving nonlinear and crack problems, which generally generate large stress and deformation gradients.
The finite-volume theory has been proven to be an attractive alternative to the well-established FE method [59]. Initially developed to help solve fluid mechanics problems, finite-volume theory has rapidly evolved during the past 20 years in the solid mechanics area. The contributions of Pindera and his colleagues [60][61][62] have spurred the extensive use of finite-volume theory in predicting the stress-strain behaviors and fracture phenomena in a wide range of fiber-reinforced composite materials. The accuracies of homogenized and local responses have been shown to be comparable to those of the FE method but with even greater efficiency. Recently, Chen et al. [63] proposed a new multiscale method based on finite-volume theory and the classical lamination theory to investigate the effects of thermal residual stresses and loading rate on the global and local responses of laminated polymer composites that are widely used in wind turbine blades. The finite-volume theory has also been further extended to 3D domains by Chen et al. [64] to investigate the deformation behavior of composites with discontinuous reinforcements.
Monitoring the performance of a composite structure and damage prognosis is very important due to the current emphasis on sustainability and efficiency in modern structural designs. Utilization of composites in aerospace engineering and wind power engineering has entailed intensive research and development of nondestructive evaluation techniques in the past 30 years. The traditional nondestructive evaluation equipment is unable to provide efficient access to appropriate sections of the structures in real time. Therefore, new nondestructive evaluation approaches are continuously being developed and proposed so as to achieve real-time damage detection. In practice, acoustic emission monitoring and strain monitoring are two available structural health monitoring (SHM) methods that can potentially achieve continuous online monitoring. Joosse et al. [64] employed AE monitoring during their testing of fiber composite blades to detect the source of damage events and assess the blade condition. Schroeder et al. [66] utilized the fiber Bragg gratings to monitor loads in horizontal-axis 4.5 MW wind turbines. Tian et al. [67] proposed a damage detection method based on static strain responses using fiber Bragg gratings in a 220-kW wind turbine blade.
Another popular SHM technique for the composite material is the ultrasonic guided wave technique [68], which is widely acknowledged as one of the most useful tools for quantitative identification of damages incurred by plate- and pipe-like structures [68,69]. A typical sensor configuration consists of a sparse array of fixed or embedded piezoelectric disks. Response signals are recorded from the sensor array after certain excitation signals are transmitted. Given the high sensitivity of guided waves to various types of damage, damage can be located and quantified with some signal processing techniques. The time-reversal (TR) method of guided wave, as a spatial and temporal self-focusing technique, can improve the detectability of damage in composite plate-like structures [70]. Park and Sohn [70,71] investigated the TR process in a quasi-isotropic composite plate and developed a reference-free damage diagnosis technique based on TR to identify defects. Lin et al. [72,73] investigated different parameters affecting the guided-wave inspection resolution and developed the pulse compression method for carbon fiber reinforced plastic laminates. Hall et al. [74,75] proposed the minimum variance ultrasonic imaging method, which adaptively determines the weighting coefficients at each pixel based on traditional delay-and-sum imaging, and achieved better imaging performance in a composite plate, with fewer artifacts and greater robustness to multiple wave modes. Levine and Michaels [76] proposed a Lamb wave propagation model-based imaging method via sparse reconstruction to locate damage position. The method exploits prior knowledge of the sparsity of structural damage and significantly improves the accuracy and precision of the identified damage location. Li et al. [77] developed a crack growth sparse pursuit method for composite wind turbine blades based on the model-based imaging method, and achieved accurate crack detection with correct locations and extension length.
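The traditional delay-and-sum imaging mentioned above, which serves as the baseline for the adaptive and sparse methods, can be sketched with synthetic data. This is a simplified illustration only, not the minimum variance or sparse reconstruction method: it assumes a non-dispersive wave with a single known group velocity, ideal envelope signals that contain only the damage-scattered wave packet, and a hypothetical four-sensor array. All constants and names are assumptions.

```python
import math

C = 3000.0       # assumed non-dispersive group velocity, m/s
FS = 2.0e6       # sampling rate, Hz
N = 1000         # samples per record
SIGMA = 5.0e-6   # width of the synthetic scattered wave packet envelope, s
SENSORS = [(0.0, 0.0), (0.4, 0.0), (0.4, 0.4), (0.0, 0.4)]  # sparse array, m
DAMAGE = (0.25, 0.15)                                        # hypothetical scatterer, m
PAIRS = [(i, j) for i in range(4) for j in range(i + 1, 4)]

def travel_time(p, i, j):
    """Time of flight from transmitter i to point p to receiver j."""
    di = math.hypot(p[0] - SENSORS[i][0], p[1] - SENSORS[i][1])
    dj = math.hypot(p[0] - SENSORS[j][0], p[1] - SENSORS[j][1])
    return (di + dj) / C

def synthetic_envelope(i, j):
    """Gaussian envelope arriving at the damage-scattered time of flight."""
    t0 = travel_time(DAMAGE, i, j)
    return [math.exp(-((n / FS - t0) / SIGMA) ** 2) for n in range(N)]

signals = {pair: synthetic_envelope(*pair) for pair in PAIRS}

def delay_and_sum(step=0.01, n_pix=41):
    """Sum each pair's envelope sampled at every pixel's time of flight."""
    image = {}
    for ix in range(n_pix):
        for iy in range(n_pix):
            p = (ix * step, iy * step)
            total = 0.0
            for (i, j), s in signals.items():
                idx = round(travel_time(p, i, j) * FS)
                if idx < N:
                    total += s[idx]
            image[(ix, iy)] = total
    return image

image = delay_and_sum()
peak = max(image, key=image.get)  # pixel indices of the brightest point
print("estimated damage location (m):", (peak[0] * 0.01, peak[1] * 0.01))
```

With measured data, the envelopes would instead come from baseline-subtracted and demodulated sensor signals, and dispersion and multiple wave modes would blur the image, which is the motivation for the adaptive weighting and sparse reconstruction approaches cited above.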

Sensor techniques and signal acquisition
Data acquisition is the process of sampling and storing signals (information) that measure real-world physical conditions for condition-based maintenance. In practice, condition monitoring data, such as vibration [78,79], sound [80,81], temperature [82], and pressure [83], are versatile. Sensors are devices that convert physical parameters into their corresponding electrical or optical signals, which should be designed to have a small effect on what they measure. Basically, a good sensor must have the following capacities and features: Sensitivity to the measured properties; Insensitivity to any other properties that are likely to be encountered during application; and Resistance against the influence of measured properties.
The past 50 years have witnessed the rapid development of sensor techniques. On the basis of the physical phenomenon or physical properties to be measured, various sensors have been designed and used to collect different types of data, as summarized in Table 1. For example, AE sensors can detect transient elastic waves produced by sudden redistribution of stress in a material due to damage/crack initiation or propagation [84], making them very efficient tools for monitoring damage expansion. Accelerometers are some of the most frequently used sensors and are often used to measure global information, such as frequency and mode shape [85]. Strain gauges not only provide localized measurement, they are also good at capturing static or dynamic measurands at a relatively low variation rate [86]. Fiber-optic sensors measure local strain, are immune to electromagnetic interference, and are suitable for long-distance data transportation [87]. A comprehensive review of various types of sensors for composite materials can be found in Ref. [88].
Nowadays, cables are still the most widely used tools in sensor data communication. Delivering data through cables is a very stable technique, and users need not worry about bandwidth and data packet dropout. However, many difficulties and potential troubles are encountered in cable displacements, switches, and replacements, which in turn, restrict the use of cables especially in an industrial environment. Numerous cables are needed when placing transmission networks of traditional equipment, which increases the installation and maintenance costs. Moreover, once the cables are damaged, the process of replacing these is often very complicated and may even be impossible in some cases. With technological progress, wireless sensor networks (WSNs), as a new signal collection and transmission technique, can provide an alternative solution to cost-efficient data communication in the fault diagnosis of mechanical equipment [89,90]. The advantages of WSNs are well known: Support from fixed networks is not necessary, countless wireless sensors can be arranged flexibly, and sensor positions in remote locations would be easier to install and maintain.

Signal processing
It is difficult or even impossible to make sense of the information buried in a raw signal directly, because signals obtained from an instrument measuring a vibration response always contain noise. In relation to this, signal processing is a technique that uses various algorithms to analyze and transform raw signals into a meaningful representation of the information contained in the raw signal while suppressing the effects of noise. Accordingly, condition signals must be analyzed using signal processing methods to generate fault-related characteristic features that facilitate decision making. Signal processing methods, such as wavelet and wavelet packet methods, empirical mode decomposition (EMD), time-frequency distributions, minimum entropy deconvolution, spectral kurtosis (SK), and envelope analysis, are widely used in mechanical fault diagnosis [91]. These methods can be categorized into three groups: time domain, frequency domain, and time-frequency domain. These methods are not totally independent, and in many cases, are complementary to one another. The choice of such approaches and characteristic features depends on the nature of the signal and the required information [92].
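As a small illustration of the time-domain category, the sketch below computes three common scalar features (root mean square, kurtosis, and crest factor) for two synthetic signals. The signals and the function name are assumptions made for this example; real vibration data would also contain noise and other components.

```python
import math

def time_domain_features(x):
    """Return RMS, kurtosis (non-excess), and crest factor of a sampled signal."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in x) / n   # fourth central moment
    rms = math.sqrt(sum(v * v for v in x) / n)
    crest = max(abs(v) for v in x) / rms
    return {"rms": rms, "kurtosis": m4 / m2 ** 2, "crest": crest}

# Synthetic "healthy" response: a pure sine over ten full periods.
healthy = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
# Synthetic "faulty" response: the same sine plus one sharp impact per period.
faulty = [v + (5.0 if n % 100 == 0 else 0.0) for n, v in enumerate(healthy)]

f_healthy = time_domain_features(healthy)
f_faulty = time_domain_features(faulty)
print("healthy:", f_healthy)
print("faulty: ", f_faulty)
```

Kurtosis and crest factor respond strongly to the periodic impacts while RMS changes only moderately, which is one reason impulsiveness indicators are popular for localized faults.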

Spectral kurtosis
Narrowband filtering is a common method for fault detection of rotating machines, although it needs extra frequency band information. Moreover, the oscillation frequency and time duration of the impulse response are related to the dynamic parameters of the mechanical system, which are difficult to estimate dynamically for condition monitoring. Thus, classical filter-based methods require historical data or a priori knowledge to determine the filter parameters. Compared with the classical approach, SK can automatically indicate the optimal frequency band in which to perform amplitude envelope demodulation to obtain an envelope signal without requiring historical data or a priori knowledge. Thus, SK has become a powerful technique for vibration signal analysis, especially for extracting periodic impulses induced by localized faults in rotating machine components, such as bearings [93][94][95][96] and gears [97,98].
Early research on SK can be traced back to 1983, and it  [101]. The application of SK to machinery fault diagnosis was first outlined by Antoni and Randall [102,103], who conducted a very thorough study of the definition and calculation of SK for this purpose. Their work served as the cornerstone of SK theory and its subsequent application in machinery fault diagnosis. In the following years, the application of SK to the fault diagnosis of rotating machines has attracted a considerable amount of attention.
The main purpose of SK techniques in machinery fault diagnosis is to generate filters to extract the periodic impulses (or impulse responses) from background noise or other interactions. The early research on SK in machinery fault diagnosis was based on STFT. The map formed by the STFT-based SK as a function of frequency and window length is called a kurtogram. However, all possible window widths should be enumerated to optimize the filter, which is computationally expensive and can hamper the practical application. On the basis of the multi-rate filter-bank and quasi-analytic filters, the fast kurtogram was further developed to carry out the computation quickly [104]. Since then, the fast kurtogram has been widely applied in machinery fault diagnosis because of its effective computation.
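The STFT-based SK described above can be illustrated with a minimal sketch. It assumes the commonly used estimator SK(f) = <|X(t,f)|^4> / <|X(t,f)|^2>^2 - 2, a single fixed window length (so it yields one row of a kurtogram rather than the full map), and a synthetic test signal; the function names, the window length of 32, and all signal parameters are illustrative assumptions, not values taken from the cited works.

```python
import cmath
import math
import random


def stft_power(x, win_len, hop):
    """Return |X(t, f)|^2 per frame for bins 0..win_len//2 (Hann window, direct DFT)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    n_bins = win_len // 2 + 1
    twiddle = [[cmath.exp(-2j * math.pi * k * n / win_len) for n in range(win_len)]
               for k in range(n_bins)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(win_len)]
        frames.append([abs(sum(s * w for s, w in zip(seg, twiddle[k]))) ** 2
                       for k in range(n_bins)])
    return frames


def spectral_kurtosis(x, win_len=32, hop=16):
    """SK per frequency bin: fourth moment of |X| over squared second moment, minus 2."""
    power = stft_power(x, win_len, hop)
    n_frames = len(power)
    sk = []
    for k in range(len(power[0])):
        m2 = sum(frame[k] for frame in power) / n_frames
        m4 = sum(frame[k] ** 2 for frame in power) / n_frames
        sk.append(m4 / m2 ** 2 - 2.0)
    return sk


def demo_signal(n_samples=4096, seed=0):
    """Assumed test signal: Gaussian noise, a steady tone, and periodic damped impulses."""
    rng = random.Random(seed)
    # Noise plus a stationary sinusoid at 0.0625 cycles/sample (bin 2 of 32)
    x = [rng.gauss(0.0, 1.0) + 3.0 * math.sin(2 * math.pi * 0.0625 * n)
         for n in range(n_samples)]
    # Impulse responses every 256 samples, ringing at 0.25 cycles/sample (bin 8 of 32)
    for start in range(100, n_samples - 64, 256):
        for m in range(64):
            x[start + m] += 20.0 * math.exp(-m / 8.0) * math.sin(2 * math.pi * 0.25 * m)
    return x


if __name__ == "__main__":
    sk = spectral_kurtosis(demo_signal())
    best_bin = max(range(len(sk)), key=lambda k: sk[k])
    print("bin with the highest SK:", best_bin)
```

In this sketch the band containing the repetitive transients receives a large positive SK, whereas the band dominated by the steady sinusoid receives a negative value, which is the property that allows SK to select a demodulation band. A full kurtogram would repeat the computation over several window lengths.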
Benefiting from the development of the time-frequency analysis (TFA) techniques, some interesting SK methods have been studied over the years. One example is the wavelet transform (WT), which is used as an alternative for time-frequency decomposition and can be an equivalent of the kurtogram [105]. The Morlet WT has also been investigated as a filter bank to construct an adaptive SK filtering technique in order to extract the signal transients [106]. Considering that wavelet packet transform (WPT) could process nonstationary transient signals more efficiently than STFT, Lei et al. [107] replaced STFT with WPT to improve the original kurtogram. Chen et al. [5] presented a type of quasi-analytic wavelet tight frame (QAWTF), which is generated from dual-tree complex WT, to replace the multi-rate filter-bank or STFT to map a new kurtogram. The main merit of the QAWTF is that it can achieve finer frequency resolutions and more comprehensive frequency partition, while offering a good approximation of the four fundamental requirements for a feasible detection filter [104]. Various efforts to improve the performance of adaptive SK have also been presented in Ref. [108]. Most time-frequency decomposition-based SK techniques share the same idea that the optimal combination of center frequency and bandwidth can maximize the kurtosis of filtered signals, thereby generating filters to extract the most impulsive signals from background noise.
The original framework of SK is effective under some conditions. However, its performance degrades in the presence of strong noise or non-Gaussian noise, especially fault-unrelated sporadic impulses, which cause an incorrect selection of the optimal filter [109][110][111]. Strategies to solve the problem can be generally divided into two categories: preprocessing and SK indicator improvement strategies. The former uses other signal processing techniques to reduce the interference and thus improve the performance of the SK. An early attempt to use an autoregressive (AR) model to prewhiten the signal in order to increase impulsiveness was carried out by Randall [105]. Similarly, the AR model has been used as a preprocessing technique to remove the disturbance caused by discrete frequency noise, such as rotating frequency components [112]. In addition, the minimum entropy deconvolution (MED) technique has been used to deconvolve the effect of the transmission path and sharpen the impulses, thus enhancing the surveillance capability of SK and helping to overcome overlapping impulse responses [113]. He et al. [114] studied a similar idea for multi-fault diagnosis. The latter category constructs an enhanced kurtosis or an improved filter procedure that comprehensively considers the composition of the complex vibration signal, thus reducing the effect of interference when filtering periodic impulses. To achieve this goal, Chen et al. [5] proposed an enhanced signal impulsiveness indicator called "spatial-spectral ensemble kurtosis," which simultaneously considers the Gaussian noise, harmonics, and sporadic impulses. Numerical validations, experimental tests, and engineering applications demonstrated that the proposed ensemble SK indicator is more robust than the original SK indicator. Smith et al. [111] used the knowledge of the bearing parameters to set the bandwidth, and then selected the optimal center frequency through a stepping process to overcome the electromagnetic interference. The method specified the bandwidth to be as narrow as possible so that the signal-to-noise ratio is maximized in the presence of wideband interference; the electromagnetic interference could then be reduced, thus improving the impulse detection.
Inspired by the benefits of SK in impulse detection, the SK technique has also been combined with other advanced signal processing techniques to improve their performance. A kurtosis-guided adaptive demodulation technique for bearing fault detection based on tunable Q-factor WT has been presented in Ref. [115]. Patel et al. [116] dealt with the detection of local defects existing on races of deep groove ball bearing in the presence of external vibrations using envelope analysis and Duffing oscillator, in which they selected the key parameter of envelope analysis (i.e., the center frequency) using SK for filter lengths 32 and 64. Based on SK and cross-correlation, Tian et al. [117] presented a fault feature index using principal component analysis and a semi-supervised k-nearest neighbor distance measure for bearing fault detection and monitoring in electric motors. Moreover, apart from vibration signal analysis, the SK technique has also been investigated in other measurement methods, such as current signals [118][119][120] and AE signals [121,122]. Beyond the machinery fault diagnosis, SK has also been considered in other research fields [123][124][125]. Furthermore, the statistical properties of the SK estimator were also investigated [126,127].
In the current work, we used a practical application case to demonstrate the effectiveness of the SK method for machinery fault diagnosis [5]. To investigate the potential transient vibration features hidden in the vibration signal (Fig. 5(a)), which was measured from a machine tool (detailed information in Ref. [5]), a dual-tree wavelet decomposition combined with the classical kurtosis indicator (evaluated in the time domain) was applied to process this signal. The corresponding optimal sub-band was selected as [5867,6400] Hz (Fig. 5(c)). However, by inspecting the associated time-domain signal, only a record of high-frequency noise was detected. "Spatial-spectral ensemble kurtosis" was then introduced to enhance the processing result. The resulting kurtosis distribution is shown in Fig. 5(d). By retrieving the associated time-domain signal concentrated in the frequency band of [2400,2800] Hz (Fig. 5(e)) and its envelope spectrum (Fig. 5(f)), we can find repetitive single-side damping components located with a constant interval of 0.075 s (13.33 Hz). This periodicity is exactly the same as the rotating frequency of the worm shaft according to the drive-chain parameters. Hence, the incipient fault features caused by a fault on the worm shaft are successfully extracted using an improved kurtosis method.

Sparse decomposition analysis
In the past 20 years, sparse theory has received considerable attention and made remarkable achievements in the field of signal and image processing. Recently, sparse decomposition (also called sparse representation or sparse regularization) has been widely used in fault diagnosis of rotating machineries. In particular, sparse decomposition is a powerful tool for extracting the impulsive component of bearing signals. For example, when a fault occurs in a bearing, periodic or quasi-periodic impulses appear in the time domain of the vibration signal; meanwhile, the corresponding bearing characteristic frequencies (BCFs) and their harmonics emerge in the frequency domain [128]. However, in the early stage of bearing failures, the BCFs usually carry very little energy and are often hidden by severe noise and higher-level vibrations.
Fig. 5 (a) The time domain signal and (b) the frequency spectrum of the collected vibration signal; (c) kurtogram distribution using the original concept of kurtosis; (d) kurtogram distribution using an improved "spatial-spectral ensemble kurtosis"; (e) the retrieved fault features and (f) its envelope spectrum
Cui et al. [129] developed an adaptive matching pursuit algorithm for fault diagnosis of rolling element bearings, which established the dictionary according to the characteristics of rolling bearing faults. He et al. [130] proposed a new approach for fault diagnosis of rolling element bearings based on sparse representation, which constructed the dictionary by using the unit impulse response function of the damped second-order system derived from the fault signal. For capturing the underlying structure of a machinery fault signal, Tang et al. [131] proposed a sparse representation-based latent component decomposition method for weak fault detection of rolling bearings and gears, which eventually generated the dictionary learning scheme. Zhang et al. [132] proposed an algorithmic framework based on nonlocal self-similarity for feature extraction of aero-engine bearings. Subsequently, Zhang et al. [133] proposed a weighted sparse model with a convex optimization framework for bearing fault diagnosis. He and Ding [134] proposed a local time-frequency template matching method for bearing transient feature extraction. Wang et al. [135,136] used the sparse representation method with a wavelet dictionary for extracting the transient feature in a faulty gearbox, in which the wavelet was selected by correlation filtering. A comparison study demonstrated that the proposed sparse representation method outperformed the EMD in transient feature extraction [137]. Li et al. [77] proposed a sparse pursuit algorithm for pursuing the extension of the crack in a wind turbine blade. Qiao et al. [138] proposed a novel force identification method based on sparse deconvolution, which proved to be more accurate and efficient than the common Tikhonov regularization method, considering the sparse nature of impact-force in the time domain. Subsequently, Qiao et al. [139] proposed a sparse representation frame for identifying force on mechanical structures; they used Dirac, Db6, Sym4, and B-spline dictionaries to represent the impact force and the discrete cosine dictionary to represent the harmonic force. Lin et al. [140] proposed a novel blade tip-timing method based on sparse representation for reconstructing unknown multi-mode blade. He et al. [141][142][143] introduced the periodic group sparse model for bearing fault diagnosis, which used the nonconvex penalty to explore sparser solutions.
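The idea of representing fault impulses with a dictionary of impulse-response atoms can be sketched with plain matching pursuit. This is a generic illustration, not the adaptive algorithm of Ref. [129] or the dictionary of Ref. [130]: the atom shape (a unit-norm damped sinusoid), the frequency and decay values, and the synthetic signal are all assumptions chosen for the example.

```python
import math
import random


def impulse_atom(length, shift, freq, decay):
    """Unit-norm damped sinusoid that starts at sample `shift` (zero before it)."""
    atom = [0.0] * length
    for n in range(shift, length):
        m = n - shift
        atom[n] = math.exp(-decay * m) * math.sin(2 * math.pi * freq * m)
    norm = math.sqrt(sum(v * v for v in atom))
    return [v / norm for v in atom]


def matching_pursuit(signal, dictionary, n_iter):
    """Greedy sparse decomposition: pick the best-correlated atom, subtract, repeat."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        corr = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(corr[i]))
        picks.append((best, corr[best]))
        residual = [r - corr[best] * a for r, a in zip(residual, dictionary[best])]
    return picks, residual


# Assumed dictionary: one atom per candidate impulse onset (shifts 0..111)
length = 128
dictionary = [impulse_atom(length, s, freq=0.2, decay=0.15) for s in range(length - 16)]

# Assumed measurement: two impulses 64 samples apart, buried in weak noise
rng = random.Random(1)
signal = [3.0 * a + 2.0 * b + rng.gauss(0.0, 0.05)
          for a, b in zip(dictionary[20], dictionary[84])]

picks, residual = matching_pursuit(signal, dictionary, n_iter=2)
for index, coefficient in picks:
    print("onset sample:", index, "coefficient:", round(coefficient, 2))
```

Two atoms suffice here because the signal was built from two atoms; real vibration data would need a stopping rule (for example, a residual-energy threshold) and a richer dictionary.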
Compressed sensing (CS), a method first proposed by Donoho, helps overcome the traditional Nyquist rate and enables the unique solution of the under-determined equations [144]. Tang et al. [145] developed a sparse classification strategy based on CS theory for rotating machinery faults, which constructed a learning dictionary to represent the vibration signal. Chen et al. [146] proposed a new method based on CS for extracting impulse components in a faulty gearbox; this method effectively learned the sparse dictionary from the noisy signal. On the basis of the union of redundant dictionaries for wind turbine gearbox fault diagnosis, Du et al. [147] proposed a sparse feature identification method, which identified multiple faults in the wind turbine gearbox. Chen et al. [148] also proposed a sparsity-enabled signal decomposition method for fault localization of automatic tool changers. Wang et al. [149] proposed a CS-based sparse time-frequency representation (TFR) method for remote machine health condition monitoring, which proved to be useful in bearing and gear fault diagnosis. By considering the joint sparsity nature of impact-force in the temporal and spatial domain, Qiao et al. [150] proposed the compressed sensing frame for impact-force identification, which simultaneously identified multiple impact locations and force history from highly incomplete and inaccurate measurements.
In the current work, we used a practical application case (i.e., the transmission system in a wind turbine) to illustrate the effectiveness of the sparse decomposition method for machinery fault diagnosis [147]. Figures 6 and 7 show a typical result of using sparse decomposition methods to decompose a gearbox vibration signal into the harmonic component, impulsive components, and random components. The gearbox fault is shown in Fig. 4

Time-frequency analysis and wavelet transform
Compared with SK and sparse representation, TFA and WT are classical signal processing techniques in conducting machinery fault diagnosis; these are derived from inner product theory and are proven effective in nonstationary signal processing. Feng et al. [151] provided a review and summarized the development and applications of TFA in machinery fault diagnosis over the past year. Peng et al. [152] and Yan et al. [153] presented review papers in 2004 and 2014, respectively, on using wavelets as a powerful tool for signal analysis in fault diagnosis in rotary machines. Therefore, in this subsection, the current review mainly discusses the progress made over the past several years.
Synchrosqueezing transform (SST) is related to the time-frequency reassignment family and can effectively improve the readability of the TFR of nonstationary signals [154]. The WT-based SST was proposed by Daubechies et al. [155] in the context of audio signal analysis and was further studied as an alternative theoretical way to understand EMD with a convenient mathematical framework. Although standard time-frequency reassignment methods (STFRM) provide a direct and powerful TFR of nonstationary signals, signal reconstruction techniques using STFRM remain lacking [154]. By contrast, the SST improves the time-frequency energy concentration in a similar manner to STFRM, but most importantly, it remains invertible, thereby enabling mode reconstruction as in EMD. Li and Liang [156,157] first introduced the SST in gearbox fault diagnosis, in which they proposed a generalized SST for representing the time-frequency pattern of vibration signals to improve the blurred TFR caused by nonstationary operating conditions. Later, similar SST methods were used for fault diagnosis of wind turbine planetary gearbox [158] and bearing [159].
To overcome the shortcoming of the WT-based SST in the higher frequency region, Cao et al. [160] proposed a zoom SST to generate both excellent time and frequency resolution in a specific frequency region, thereby improving the instantaneous frequency (IF) estimation. The effectiveness of the zoom SST has been validated by rub-impact fault diagnosis. Variable operating conditions of machines always induce a vibration signal with fast varying IF, especially during significant speed changes. Even when rotating machines work in a stationary condition, some mechanical faults in rotating machines cause time-varying stiffness, thereby resulting in fast oscillation of the IF of the vibration signals [161,162]. To address this issue, the matching synchrosqueezing wavelet transform (MSWT) has been proposed, in which a chirp-rate estimation is introduced into a comprehensive IF estimation to match the time-frequency (TF) structure of the signals with fast varying IF, thereby attaining a TFR as highly concentrated as that of the standard TF reassignment methods. Most importantly, the MSWT retains the reconstruction benefits of the SST. The effectiveness of MSWT has been verified by a case study of a dual-rotor turbofan engine for aero-engine vibration monitoring [163] (Fig. 8). In weak signal detection, a special TFA method called nonlinear squeezing time-frequency transform (NSTFT) has been proposed [164,165]. Compared with the SST using a reassignment strategy, the NSTFT combines two TF representations to emphasize the coefficient at the IF and to squeeze the coefficient around the IF. Moreover, the NSTFT is only relevant to the signal phase and is independent of the signal amplitude; thus, it can be used for weak signal detection and weak fault diagnosis.
A further indication of progress in the TFA over the past several years is the emergence of parametric TFA methods, which are applied in mechanical fault diagnosis. Peng et al. [166,167] systematically studied the parametric TFA, and first proposed the polynomial chirplet transform and the spline-kernelled chirplet transform [168]. They then proposed the generalized warblet transform [169], and later parameterized TFA [170,171]. Benefiting from the advantage of the parametric TFA in improving TFR, Yang et al. used the method for wind turbine condition monitoring [172], and later used it for dispersion analysis of broadband guided waves [173] and system identification [174]. All parametric TFA methods use the parametric time-frequency basis function to approximate the analyzed signals; the more precise the approximation by parametric basis functions, the better the resulting TFRs. As opposed to parametric TFA methods, matching demodulation transform (MDT) does not have to devise ad-hoc parametric time-frequency basis functions, and can generate TFRs with satisfactory energy concentration with an iterative algorithm, gradually matching the true IF of the signal [175]. The effectiveness of the MDT has been verified by its application in rub-impact fault diagnosis [161].
As a special TFA method, WT is widely researched in the field of mechanical fault diagnosis. Recently, classical WT techniques have also been widely applied [176][177][178], and some new WT methods for mechanical fault diagnosis have been studied and introduced. Inspired by the systematic research on WT by Selesnick et al. [179][180][181], overcomplete rational dilation discrete WT and tunable Q-factor WT have been studied and applied for bearing and gearbox fault diagnosis [115][182][183][184]. As opposed to the classical WT using a single wavelet function to capture fault-related features, the multi-wavelet concept offers multiple wavelet functions, thus matching one or more faults for diagnosis [185][186][187][188].
Although the techniques of TFA and WT for mechanical fault diagnosis have been researched for more than two decades, some challenges remain in using TFA and WT for mechanical fault diagnosis. For example, the essence of the TFA and WT is a type of inner product between the signal to be analyzed and time-frequency atoms or wavelet functions. The more similar the signal is to the time-frequency atom or the wavelet function, the better the defect-related features that can be extracted. The essential similarity between the impulse response caused by localized faults and the time-frequency atoms or wavelet functions guarantees superiority in fault feature extraction. Therefore, given that the TFA and WT have become increasingly mature and new theoretical contributions are being made, they will continue to be the most appealing techniques to dominate the field of mechanical fault diagnosis. TFA and WT are also considered powerful tools in SK and sparse representation.
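The inner-product view described above can be made concrete with a small sketch: the magnitude of the inner product between a simulated impulse response and a set of Gaussian-windowed complex exponentials (Morlet-like atoms) is largest for the atom whose frequency and position match the transient. The atom definition, the envelope width, the candidate grid, and the test signal are assumptions made for illustration only.

```python
import cmath
import math


def morlet_atom(length, center, freq, sigma=8.0):
    """Unit-norm Morlet-like atom: Gaussian envelope times a complex exponential."""
    atom = [math.exp(-((n - center) ** 2) / (2 * sigma ** 2))
            * cmath.exp(2j * math.pi * freq * (n - center)) for n in range(length)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in atom))
    return [a / norm for a in atom]


def inner_product(signal, atom):
    """Inner product <signal, atom>, with the atom conjugated."""
    return sum(s * a.conjugate() for s, a in zip(signal, atom))


def best_matching_atom(signal, freqs, centers):
    """Scan the atom grid and return (score, freq, center) of the best match."""
    best = None
    for f in freqs:
        for c in centers:
            score = abs(inner_product(signal, morlet_atom(len(signal), c, f)))
            if best is None or score > best[0]:
                best = (score, f, c)
    return best


# Assumed impulse response: onset at sample 40, ringing at 0.1 cycles/sample
signal = [0.0] * 128
for n in range(40, 128):
    signal[n] = math.exp(-0.05 * (n - 40)) * math.sin(2 * math.pi * 0.1 * (n - 40))

score, freq, center = best_matching_atom(signal, [0.05, 0.1, 0.2, 0.4], range(0, 128, 4))
print("best atom frequency:", freq, "center sample:", center)
```

Scanning the grid of frequencies and centers amounts to a coarse wavelet or short-time analysis; the better an atom resembles the fault-induced transient, the more energy a single coefficient captures.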

EMD, LMD, and VMD
EMD is one of the most powerful signal processing techniques, particularly in nonlinear and non-stationary signal processing. At present, EMD and Hilbert-Huang transform (HHT) have been widely used in fault diagnosis of rotating machineries. Lei et al. [189] surveyed and summarized the recent research, development, and application of EMD in terms of key components, such as rolling element bearings, gears, and rotors. Babu et al. [190] applied HHT to detect the transverse breathing crack from the time response of the cracked rotor passing through its critical speed. Lin and Chu [191] applied HHT to AE feature extraction of natural fatigue cracks induced on rotating shafts, and demonstrated that HHT is a better tool for conducting natural fatigue crack characterization compared with fast Fourier transform (FFT) and continuous WT (CWT). A past study investigated the start-up transient response of a rotor with a propagating transverse crack via EMD; the authors extracted the one-, two-, and three-times rotating frequency vibration components during the startup process [192]. Given that HHT has the capability of processing nonlinear vibration signals, Zhang and Yan [193] proposed an HHT-based signal processing method to obtain the natural frequency of the multi-crack cantilever beam with a higher resolution. In Ref. [194], three signal processing tools, namely, STFT, CWT, and HHT, are compared to evaluate their detection performance and computational time in a rotor bearing system. Xu [195] proposed a methodology based on translation-invariant denoising and HHT to detect rolling element bearing faults against strong background noise. Li and Wang [196] summarized the development and application of HHT for solving the problem of rolling bearing fault diagnosis from several aspects.
Fig. 8 MSWT representation of vibration signal of the dual-rotor turbofan engine. The LPR and HPR represent the "low-pressure rotor" and "high-pressure rotor", respectively. The reconstructed signal shows the evidence for vibration jumping fault in the engine, as indicated by the arrow T1 [163]
Lei et al. [197] introduced the ensemble empirical mode decomposition (EEMD) for fault diagnosis of rotating machineries, in which the problem of mode mixing is partially solved by adding white noise to the original signal. Feng et al. [198] proposed a new method based on EEMD and the Teager energy operator to extract the characteristic frequency of bearing faults, which demonstrated better performance than the traditional spectral analysis and the squared envelope spectral analysis methods. Meanwhile, Ricci and Pennacchi [91] introduced a merit index for the automatic selection of the intrinsic mode functions used to obtain the HHT spectrum, which they verified by using a spiral bevel gearbox with high contact ratio. Wu et al. [199] utilized the instantaneous dimensionless frequency normalization and HHT to characterize the different gear faults, including worn tooth, broken tooth, and gear unbalance, under variable rotating speed levels. Furthermore, the support vector machine (SVM) has been used to classify the different gear faults.
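The Teager energy operator mentioned above in connection with Ref. [198] has a simple discrete form, psi[x(n)] = x(n)^2 - x(n-1) x(n+1). The sketch below shows only this standard operator, not the EEMD-based method of Ref. [198]; the test tone and its parameters are assumptions. For a pure tone A cos(Omega n + phi), the operator returns the constant A^2 sin^2(Omega), which is why it tracks both amplitude and frequency changes caused by impacts.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator, defined for the interior samples of x."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Assumed test tone: amplitude 2, angular frequency 0.3 rad/sample, phase 0.7 rad
tone = [2.0 * math.cos(0.3 * n + 0.7) for n in range(200)]
energy = teager_energy(tone)

# Analytical value for a pure tone: A^2 * sin^2(Omega)
expected = 4.0 * math.sin(0.3) ** 2
print(round(energy[0], 6), round(expected, 6))
```

In a diagnosis setting the operator would typically be applied to a selected intrinsic mode function, and the spectrum of its output inspected for the bearing characteristic frequencies.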
Another adaptive time-frequency method, namely, local mean decomposition (LMD), has been applied to decompose the non-stationary signal into a number of product functions. LMD was developed by Smith [200] in 2005 and was originally used as a TFA tool for electroencephalogram signals. The LMD method is similar to the EMD method, but the former is actually better than the latter in certain aspects. In Ref. [201], LMD has been proposed for rub-impact fault diagnosis, which can extract the transient fluctuations of the IF of the fundamental harmonic component. In Ref. [202], the authors applied the LMD method to gear and roller bearing fault diagnosis and showed that LMD has better performance compared with EMD. Feng et al. [203] proposed a joint amplitude and frequency demodulation method based on LMD for fault diagnosis of planetary gearboxes, whereas Liu and Han [204] used LMD to decompose the non-linear and non-stationary faulty bearing signals into a series of product functions for feature extraction.
Variational mode decomposition (VMD) is a newly developed technique for adaptive signal decomposition, and can non-recursively decompose a multi-component signal into a number of quasi-orthogonal intrinsic mode functions. Wang et al. [205] proposed a novel method for the rub-impact fault diagnosis of the rotor system based on VMD, and showed that multiple features can be better extracted with the VMD than with empirical WT (EWT), EEMD, and EMD.

Intelligent diagnostics
Traditionally, fault diagnosis requires expertise in the specifics of the diagnostic application. Thus, highly trained and skilled personnel are needed. Various artificial intelligence (AI) techniques have emerged in the field of fault diagnosis. Intelligent fault diagnostics simulate the inference process of human thinking. Thus, by capturing, transferring, and processing the diagnosis information, the operation condition and faults of the monitored machine can be decided intelligently.
Intelligent fault diagnostics also enable the learning and automatic capture of the diagnosis information for providing real-time diagnostics. The intelligent fault diagnostics technologies and practical diagnosis systems in the assessment of complex mechanical equipment are crucial in the conduct of mechanical fault diagnostics. Numerous intelligent system approaches for fault diagnosis have been developed, such as artificial neural network (ANN), support vector machines (SVMs), particle swarm optimization (PSO), deep learning, and Bayesian networks [206]. In the following section, different intelligent fault diagnostic approaches are discussed, with emphasis on various AI and statistical approaches.

AI approaches
AI approaches have been increasingly applied to mechanical fault diagnosis and have improved system performance over conventional approaches [207]. Numerous studies have been conducted on intelligent diagnosis of rotating machineries. Among these studies, ANNs are one of the most commonly used methods; these studies employ signal processing techniques for extracting fault features and then input the features to ANNs for classifying faults [208]. Various neural network (NN) models are available. The feedforward neural network (FFNN) structure is the most widely used NN structure in mechanical fault diagnosis [209][210][211]. Gebraeel and Lawley [212] proposed a NN-based degradation model that uses real-time signals to estimate the failure time of partially degraded components, which they then validated on rolling element bearings. Vyas and Satishkumar [213] used an ANN with a back-propagation learning algorithm to detect unbalance, misalignment, and roller bearing looseness in a small-scale test-rig. Jack and Nandi [214] compared NNs and support vector machines in condition monitoring applications. Saravanan et al. [215] attempted fault diagnosis of a spur bevel gearbox by extracting features using WT, which they then used as NN inputs for classification purposes. The results showed that the developed method can reliably diagnose different conditions of the gearbox. Nguyen et al. [216] applied a genetic algorithm (GA) for optimal feature selection in mechanical fault detection of induction motors. Based on specific distance criteria, they introduced GA to reduce the dimension of features. Another study used the decision tree and multi-class support vector machine to illustrate the potential and efficiency of the classification method. Spoerre [217] applied the cascade correlation neural network (CCNN) to bearing fault classification, and found that CCNN can apply the minimum network structure for fault diagnosis with satisfactory accuracy. Other NN models applied in fault diagnostics are back-propagation neural networks (BPNNs) [218], recurrent NN [219], and counter propagation NN [220]. The above ANN models usually employ supervised learning algorithms that require external inputs, such as prior knowledge about the target or desired output. Nyanteh [221] developed a novel approach to short-circuit fault detection in a permanent magnet synchronous machine using ANNs, in which the PSO algorithm is applied to speed up the convergence of the ANN weights. Samanta [222] extracted time-domain features and used three optimized NNs to detect pump faults. Wang and Too [223] applied unsupervised NNs, self-organizing map (SOM), and learning vector quantization in rotating machinery fault diagnosis. Wang et al. [225] proposed a method of fault diagnosis for non-stationary fault signals of rotating machineries, which used EEMD and a SOM NN to extract features and classify them, respectively.
Support vector machine is a relatively new computational learning method that is based on statistical learning theory and has been widely used in mechanical fault diagnosis. Widodo and Yang [206] surveyed the application of SVM in mechanical fault diagnosis including rolling element bearings, induction motors, machine tools, pumps, compressors, valves, turbines, and so on. Yang et al. [225] applied the artificial bee colony algorithm for SVM parameter optimization in gearbox fault diagnosis, and found that the accuracy of the artificial bee colony algorithm is higher compared with GA and PSO. Widodo et al. [226] studied the incipient fault diagnosis of low-speed bearings using multi-class relevance vector machine and SVM. Another study [227] employed the Hilbert transform-based envelope spectrum analysis to extract faulty bearing features, and then used the improved SVM to classify the faulty rolling bearings into ball fault, inner race fault, and outer race fault. Liu et al. [228] proposed a novel model for fault diagnosis based on EMD and multiclass transductive SVM, which they applied to diagnose the faults of the gearbox. Moreover, Samanta and Nataraj [229] used time-domain features to characterize the bearing health conditions and then used ANNs and SVM for bearing fault diagnosis. Meanwhile, Seera et al. [230] proposed an ensemble of hybrid intelligent models for condition monitoring of induction motors; the model consisted of the fuzzy min-max NN and the random forest model, which comprises an ensemble of classification and regression trees. Shen et al. [231] proposed a new intelligent fault diagnosis scheme based on the extraction of statistical parameters from a wavelet packet transform, a distance evaluation technique, and a support vector regression-based generic multi-class solver. Another study [232] applied wavelet packet decomposition to clean the noisy signals, and then extracted the informative feature vectors by using EEMD. Finally, the states of the bearings were classified by SVM. Rajeswari et al. [233] applied EEMD for signal processing and feature extraction, hybrid binary bat algorithm for feature selection, and machine learning algorithms for classification purposes in gear fault diagnosis.
In practice, applying AI approaches in mechanical fault diagnosis is not easy due to the lack of efficient procedures for obtaining the training data and specific knowledge, which are required to train the models. At present, most of the applications in the literature simply use experimental data for model training [229]. Although these methods work well in intelligent fault diagnosis, they retain two deficiencies: (1) The features are manually extracted depending on considerable prior knowledge about signal processing techniques and diagnostic expertise, and (2) the ANNs adopted in these methods have shallow architectures, thereby limiting the capacity of ANNs to learn the complex non-linear relationships in fault diagnosis issues [234].
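The manually extracted time-domain features referred to above can be illustrated with a short sketch. The feature set (RMS, peak, crest factor, kurtosis) is a common choice rather than the exact set used in any cited study, and the two synthetic signals are assumptions. The resulting feature vector is what a shallow classifier such as an ANN or SVM would take as input.

```python
import math


def time_domain_features(x):
    """Return a few classical condition indicators of a vibration record."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    variance = sum(c * c for c in centered) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    kurtosis = (sum(c ** 4 for c in centered) / n) / variance ** 2
    return {"rms": rms, "peak": peak, "crest_factor": peak / rms, "kurtosis": kurtosis}


# Assumed "healthy" record: a pure sinusoid over 25 full periods
healthy = [math.sin(2 * math.pi * n / 40) for n in range(1000)]

# Assumed "faulty" record: the same sinusoid with an impact every 100 samples
faulty = [v + (5.0 if n % 100 == 0 else 0.0) for n, v in enumerate(healthy)]

healthy_features = time_domain_features(healthy)
faulty_features = time_domain_features(faulty)
print(healthy_features)
print(faulty_features)
```

The sinusoid has a kurtosis of 1.5, whereas the record with periodic impacts has a much larger value, so even this small feature vector separates the two conditions. Choosing such features requires the prior knowledge that deficiency (1) above refers to.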

Deep learning
As a breakthrough in AI, deep learning holds the potential to overcome the aforementioned deficiencies and can automatically map input samples into hierarchical feature representations. Fault diagnosis methods based on deep architectures make manual fault feature extraction unnecessary. Certain deep learning methods, such as the deep belief network (DBN) and the deep convolutional neural network, have been developed to conduct machinery fault diagnosis.
Recently, the deep neural network (DNN) has become a popular approach in machine learning for its promised advantages, such as fast inference and the ability to encode higher-order network structures. Although ANNs require supervised learning, DNNs work well with the help of unsupervised learning. DNNs with deep architectures can adaptively capture the representative information from raw signals via multiple nonlinear transformations and approximate complex nonlinear functions with a low error [208]. DBN uses a hierarchical structure with multiple stacked restricted Boltzmann machines and works by a layer-by-layer successive learning process [23]. Ma et al. [235] applied DNN to a bearing accelerated life test, which used the time-domain and frequency-domain features as raw inputs. Tao et al. [236] proposed DBN for bearing fault diagnosis by using multi-sensor information, in which time-domain statistical features from three sensors served as the inputs. Chen et al. [237] applied DBN-based DNN for gearbox fault diagnosis, in which a feature vector consisting of load and speed measures, time-domain features, and frequency-domain features served as inputs. Shao et al. [238] proposed DBN for induction motor fault diagnosis, which directly selected raw vibration signals as inputs. Tran et al. [234] proposed an approach to fault diagnosis of reciprocating compressor valves based on the Teager-Kaiser energy operator and DBN. Meanwhile, Tamilselvan and Wang [239] presented a multi-sensor fault diagnosis method for health state classification via DBN. Aircraft engine health diagnosis and electric power transformer fault diagnosis have been used to demonstrate the advantages of the proposed approach over SVM, back-propagation neural network, SOM, and Mahalanobis distance. Jia et al. [208] presented a DNN-based intelligent method for diagnosing the faults of rotating machineries, the performance of which they verified in fault classification on five datasets from rolling element bearings and planetary gearboxes. Gan et al. [240] proposed a novel hierarchical diagnosis network based on deep learning for the fault pattern recognition of rolling element bearings. The experiment demonstrated that the proposed method achieved better fault classification than BPNN and SVM. Li et al. [241] proposed a deep statistical feature learning method for detecting faults and fault patterns of rotating machineries, which achieved better fault classification than SVM. Guo et al. [242] presented an automatic denoising and feature extraction method based on deep learning. Rolling bearing fault and gearbox fault experiments demonstrated that the proposed deep fault recognizer method had higher accuracy than DBN without denoising. Ahmed et al. [243] applied DNN frameworks with two and three hidden layers based on a sparse autoencoder for automatic fault detection and classification of bearings.

Statistical approaches
Uncertainties, such as measurement noise, environment fluctuation, operational variability, and other factors from feature estimation algorithms, are inevitable in mechanical fault diagnosis. In this case, probabilistic models can be established. The hidden Markov model (HMM) is an effective pattern recognition method that has been widely used in speech recognition, visual recognition, and fault diagnosis. HMM is a joint probabilistic model of a set of random variables representing the hidden states as state variables given the observation sequence. Xin et al. [244] studied the rolling element bearing diagnostics by using HMM and validated its performance via numerical experiments. Bunks et al. [245] applied HMM to analyze the Westland helicopter data, including gearbox fault class information and vibration response with different faults. Another study treated the fault classes and measured vibration signal as states in the hidden Markov chain and as realizations of the observation process, respectively. Dong and He [246] proposed a more general model, hidden semi-Markov model, for analyzing pump experimental data in pump diagnostics. Xu and Ge [247] presented an intelligent fault diagnosis system based on an HMM. Ye et al. [248] considered the application of two-dimensional HMM based on TFA for fault diagnosis. Zhou et al. [249] proposed a new fault diagnosis model for rolling element bearing based on shift-invariant dictionary learning and HMM. The method has been proven to have better performance than the k-nearest neighbor and BPNN in terms of feature extraction or classifiers.
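The way an HMM is used for pattern recognition can be illustrated with a minimal sketch. It assumes one discrete HMM per machine condition and a vibration level quantized into two symbols (0 = low, 1 = high); the two models, their probabilities, and the names `MODELS` and `classify` are hypothetical and are not taken from any of the cited studies. The scaled forward algorithm gives the log-likelihood of an observation sequence under each model, and the condition with the highest likelihood is selected.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[i]    : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of emitting symbol k in state i
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate the normalized state belief one step, then weight
            # it by the emission probability of the new symbol.
            alpha = [
                emit[j][obs[t]]
                * sum(alpha[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        scale = sum(alpha)  # P(obs[t] | obs[0..t-1])
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-state models: (start, trans, emit). Illustrative values only.
MODELS = {
    "healthy": ([0.9, 0.1],
                [[0.95, 0.05], [0.5, 0.5]],
                [[0.9, 0.1], [0.6, 0.4]]),
    "faulty": ([0.5, 0.5],
               [[0.6, 0.4], [0.2, 0.8]],
               [[0.5, 0.5], [0.1, 0.9]]),
}


def classify(obs, models=MODELS):
    """Return the name of the model under which obs is most likely."""
    scores = {name: forward_log_likelihood(obs, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    sequence = [1, 1, 0, 1, 1, 1, 0, 1]  # quantized vibration levels
    print(classify(sequence))
```

In practice the model parameters would be estimated from labeled training sequences (for example with the Baum-Welch algorithm) rather than set by hand as in this sketch.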
Baydar et al. [250] investigated the use of a multivariate statistical technique, known as principal component analysis (PCA), for analyzing the time waveform signals in gear fault diagnosis. González and Fassois [251] proposed a novel supervised PCA-type statistical methodology for damage detection, by using data records from the healthy and damaged states of a scale wind turbine blade under various conditions. Mao and Todd [252] presented a statistical model for quantifying the uncertainty of transmissibility (output-to-output relationship) magnitude estimation. Song et al. [253] proposed an intelligent condition diagnosis method for rotating machineries using the probability density analysis and the canonical discriminant analysis. Lei et al. [254] presented a new intelligent fault diagnosis approach based on statistics analysis, an improved distance evaluation technique and adaptive neuro-fuzzy inference system, which they then applied in fault diagnosis of rolling element bearings.
Wang et al. [255] proposed a Bayesian network for diagnosing the faults in a gear train system, in which six time-domain features are selected as the input to the Bayesian network. Mao and Todd [256] proposed a Bayesian recursive framework for ball-bearing damage classification, and selected the frequency response function as the main features. Wang et al. [257] proposed a Bayesian approach to extract bearing fault features, which represented a joint posterior probability density function of wavelet parameters using a set of random particles. Subsequently, Wang et al. [258] proposed a Gauss-Hermite integration based Bayesian inference method for estimating the posterior distribution of wavelet parameters. Bearing fault experiments demonstrated that the proposed method has better visual inspection performance than the fast kurtogram.
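The general idea behind recursive Bayesian classification can be sketched as follows. This is a generic illustration, not the framework of any cited study: it assumes a single scalar feature per measurement, Gaussian class-conditional likelihoods, and hypothetical class names and parameters (`CLASS_MODELS`). Each new measurement turns the current posterior into the prior for the next update.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_update(prior, likelihoods):
    """One Bayes step: posterior is proportional to likelihood times prior."""
    unnormalized = {c: prior[c] * likelihoods[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical (mean, std) of a scalar feature for each condition.
CLASS_MODELS = {
    "healthy": (1.0, 0.3),
    "inner_race": (2.0, 0.5),
    "outer_race": (3.0, 0.5),
}


def classify_recursively(measurements, models=CLASS_MODELS):
    """Start from a uniform prior and fold in one measurement at a time."""
    posterior = {c: 1.0 / len(models) for c in models}
    for x in measurements:
        likelihoods = {c: gaussian_pdf(x, *models[c]) for c in models}
        posterior = bayes_update(posterior, likelihoods)
    return posterior


if __name__ == "__main__":
    result = classify_recursively([2.9, 3.1, 3.0])
    print(max(result, key=result.get), result)
```

The recursion keeps only the current posterior, so the belief can be updated online as each measurement arrives without storing the measurement history.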

Machinery fault diagnosis in China
Compared with developed countries, China started relatively late in the research and application of machinery fault diagnosis technologies [90]. However, many research universities and institutions in the country have devoted considerable effort to machinery fault diagnosis. The early research on machinery fault diagnosis in China began at Xi'an Jiaotong University, Tsinghua University, Shanghai Jiao Tong University, Huazhong University of Science and Technology, Harbin Institute of Technology, Northwestern Polytechnical University, Northeastern University, and Dong Fang Turbine Co., Ltd., among others. Exploratory development of fault diagnosis methods has been carried out since the 1960s. Notable progress has been made in the sub-fields of reliable signal acquisition and advanced sensing technology, failure mechanism, fault feature extraction, intelligent diagnosis of complex mechanical equipment, and the R&D of practical diagnostic systems. Diagnostic technologies continue to improve, and new technologies are emerging. Although not all the details can be covered, this section attempts to briefly summarize the contributions of Chinese researchers to machinery fault diagnosis.
Qu and co-investigators [259][260][261], applying the multi-sensor information fusion technique in an integrated manner to the field of rotor balancing, proposed a new analysis method based on the field balancing method in 1989. This method uses FFT spectra and combines the ordinary spectra of rotor vibration in both the horizontal and vertical directions. Unlike the traditional FFT spectra used in rotor vibration monitoring, their proposed method jointly uses frequency, amplitude, and phase information. Therefore, they called it "holospectrum" because it makes full use of the rotor precession information. The holospectrum was widely used for diagnosis in many oil refineries and chemical plants in China in the 1990s. Qu's group was awarded the 2nd Prize in the National Award for Technological Invention in 2003 by the Chinese government for their ongoing work. This award is one of the highest state-initiated science and technology awards.
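How amplitude and phase from the horizontal and vertical directions can be combined at one frequency may be sketched as follows. The sketch assumes the common representation in which each frequency component of the rotor orbit is an ellipse, with x(t) = A_x cos(ωt + φ_x) and y(t) = A_y cos(ωt + φ_y); the function name `precession_ellipse` and this sign convention are assumptions for illustration, not the original authors' implementation. The orbit is split into a counterclockwise and a clockwise rotating circle, whose radii give the ellipse axes and the precession direction.

```python
import cmath
import math


def precession_ellipse(amp_x, phase_x, amp_y, phase_y):
    """Ellipse traced by one frequency component of the rotor orbit.

    Assumes x(t) = amp_x*cos(w*t + phase_x), y(t) = amp_y*cos(w*t + phase_y).
    Writing z = x + i*y gives z = ccw*exp(i*w*t) + cw*exp(-i*w*t).
    """
    X = cmath.rect(amp_x, phase_x)  # complex amplitude, horizontal
    Y = cmath.rect(amp_y, phase_y)  # complex amplitude, vertical
    ccw = (X + 1j * Y) / 2                          # counterclockwise circle
    cw = (X.conjugate() + 1j * Y.conjugate()) / 2   # clockwise circle
    r_ccw, r_cw = abs(ccw), abs(cw)
    tol = 1e-12 * max(r_ccw, r_cw, 1.0)
    if abs(r_ccw - r_cw) <= tol:
        direction = "line"  # the ellipse degenerates to a straight line
    elif r_ccw > r_cw:
        direction = "counterclockwise"
    else:
        direction = "clockwise"
    return {
        "semi_major": r_ccw + r_cw,
        "semi_minor": abs(r_ccw - r_cw),
        "direction": direction,
    }


if __name__ == "__main__":
    # x = 2*cos(wt), y = sin(wt): an ellipse with semi-axes 2 and 1.
    print(precession_ellipse(2.0, 0.0, 1.0, -math.pi / 2))
```

Two sensors with the same amplitude spectra can thus correspond to very different orbits depending on their relative phase, which is the information a single-channel amplitude spectrum discards.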
To overcome the difficulties of the traditional finite element method (FEM) in solving crack singularity problems, He and co-investigators [262][263][264][265] derived wavelet finite element methods (WFEM). Compared with traditional FEM, WFEM has several advantages for the modal analysis of crack problems. One attractive feature is that WFEM can accurately represent general functions with a small number of wavelet coefficients and characterize the smoothness of such functions from the numerical behavior of these coefficients. Furthermore, given that the condition number of WFEM is independent of mesh size, WFEM can avoid the numerical instability that traditional FEM encounters in the analysis of crack problems. In addition, when orthogonal Daubechies wavelet functions with compact support are used as interpolation functions, the stiffness matrices generated by WFEM are sparse, thereby making the computational time considerably shorter. He's group was awarded the 2nd Prize in the National Award for Technological Invention in 2009 for contributions related to the identification of cracks in a rotor system based on WFEM.
To prevent and eliminate machine faults by engineering means, Gao and co-investigators [266][267][268] proposed the fault self-recovery theory based on systems theory and the idea of the "self-recuperation" therapeutic method in modern medical science. In their investigations, machines can heal themselves when malfunctions occur, as human beings and animals can. Thus, mechanical faults can be controlled and eliminated during the machine's runtime, thereby shortening the downtime of machines. The fault self-recovery theory can provide a theoretical basis for developing a new generation of machines that have the self-recovery ability.
Wen and co-investigators [269][270][271][272][273] constructed the concept and the theoretical framework of vibration utilization engineering following a long research period. The utilization of vibration and wave is regarded as one of the most valuable technological applications and has been rapidly developing in recent years. In their work, they developed and studied several new craft theories and techniques, and the results have been widely used in engineering. Their work on vibration utilization engineering has been summarized in six books and over 400 research papers.
The machinery fault diagnosis technique has created huge social and economic benefits because it is closely related to the industry. We end this section by presenting the number of sponsored programs and the total number of awards given since 2006 (Fig. 9) [274]. The key programs for machinery fault diagnosis sponsored by the National Natural Science Foundation of China (NSFC) since 2011 are also listed in Table 2 [274]. As can be seen, the number of sponsored programs and the number of awards given have increased greatly since 2011.

Research trends and challenges
At present, the problems related to the basic research on machinery fault diagnosis can be summarized in "eight more and eight less" as follows: More to study fault behavior, less to failure mechanism; more to study rotating machineries, less to reciprocating machineries; more to study general machineries, less to specialized machineries; more to study single method, less to comprehensive diagnosis; more to study component-level fault, less to system-level fault; more to study obvious fault, less to weak fault; and more to study simulation data, less to engineering data. Therefore, breakthroughs related to the basic research of machinery fault diagnosis in these five directions must be realized: Breakthroughs from behavioral research to mechanism study, from qualitative to quantitative research, from single to group fault research, from severe to weak fault research, and from component-level to system-level fault research.

From behavioral research to mechanism study
Research based on the idea of "what you see is what you get" yields only sparse knowledge for interpreting and diagnosing mechanical faults. The failure mechanism is the root cause of how a fault manifests itself. Therefore, further scientific research on failure mechanisms is needed. Given the lack of previous samples, mechanical faults of new equipment may be missed by traditional diagnosis methods.
With the rapid development of science and technology, many novel, large-scale, and high-speed types of mechanical equipment are being developed and widely applied in practice, such as wind power equipment, industrial gas turbines, railroad locomotives, aircraft power transmissions, and shield tunneling machines. For the mechanical, electrical, and hydraulic systems in this novel rotary and reciprocating mechanical equipment, the fault mechanism and evolutionary dynamics under special operational conditions must still be analyzed and researched. For example, for typical misalignment faults, we need to build mathematical and mechanical models and an experimental platform, as well as to study the failure symptoms and frequency spectrum characteristics. Building on the research results for clearance mechanism dynamics, we must study the frequency spectrum characteristics that correspond to different clearance sizes, as well as establish the quantitative relationship between clearance size and signal features to guide the fault diagnosis of clearance mechanisms. Therefore, future fault diagnosis will certainly focus more on mechanism research.

From qualitative to quantitative research
The procedure of fault diagnosis has four levels: First, we identify whether a fault exists; second, we locate the fault; third, we evaluate the damage degree of the failure; and finally, we predict residual life and assess reliability. The first two levels are called qualitative research, and the last two levels are called quantitative research. The former is the basis of the latter. The third and fourth levels are closely linked because residual life prediction and reliability assessment can never be achieved without precise damage degree evaluation.
The quantitative research of faults requires the recognition of fault locations, types, and degrees, as well as the discovery of the laws governing fault occurrence, development, and evolution. It can thereby provide the fundamental basis for the safety analysis, reliability assessment, and residual life prediction of mechanical equipment. For the classical structures of major equipment, such as aero engine rotors, large aircraft frames, large wind turbine gearboxes, and classical composite constructions, we should first carry out dynamic online diagnosis of crack damage. Then, based on the quantitative diagnosis of crack damage, we should study state degradation recognition and residual life prediction. Therefore, the focus of fault diagnosis research is expected to shift from qualitative research to quantitative research.

From single to group fault research
The diagnosis of a single fault is mainly based on signal processing methods, through which the vibration signal features and the frequency spectrum of other interference elements can be easily separated. Therefore, the diagnosis of a single fault can be easily implemented. However, its low accuracy and poor generalization limit its application in the field of industrial engineering. Furthermore, failures may be due to several causes, especially the failures of rotating machineries. Therefore, diagnosing only a single fault of mechanical equipment can lead to false diagnosis or misjudgment. Such failures as abrasion, peeling off, and cracking of the core parts of mechanical equipment often occur simultaneously or successively. The vibration signals always appear as the mutual coupling of fault characteristic signals instead of the simple superposition of multiple single faults. The generation of group faults brings many more difficulties to fault diagnosis and, hence, group fault diagnosis is expected to be a main development direction of future fault diagnosis. In fact, group fault diagnosis is a problem of multi-fault pattern recognition, and we need to study one-time separation and diagnosis methods for the coupled features of group faults.

From severe to weak fault research
A severe fault is a mechanical fault that has developed to a late stage with obvious fault features and degraded performance, which leads to a major accident if it is not dealt with in time. At this stage, fault features can be extracted easily, and the fault conditions can be easily recognized. Major accidents can be avoided if the underlying causes are diagnosed in a timely manner. However, the purpose of mechanical fault diagnosis is to provide "treatment protocols" instead of "death certificates." Severe fault diagnosis (late-stage diagnosis) essentially amounts to a "death certificate" for the mechanical equipment. Therefore, engineers and managers must master the degradation process of the equipment and the dynamic evolutionary process of the failure, catch faults at the outset, and take the corresponding remedial actions for different fault conditions. In other words, we must move from severe fault research to weak fault research.
A weak fault is a fault in the early stage or a potential fault, whose symptoms are not obvious and whose feature information is weak. Weak features may also occur when the mechanical fault is at a later stage but the fault information is submerged in noise, which weakens the fault features and makes such faults difficult to recognize. Therefore, future weak fault diagnosis must study effective methods for enhancing weak fault features and for extracting features under strong noise. To extract weak faults accurately, the mapping between the fault evolution process and its signs should be studied to ensure the precision and effectiveness of weak feature extraction.
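As one illustration of enhancing a weak periodic feature submerged in noise, the sketch below uses time synchronous averaging, a common technique that is swapped in here for illustration and is not prescribed by this section. It assumes that the fault signature repeats with a known, constant period in samples (for example, from a tachometer reference); the signal amplitude, noise level, and function names are hypothetical. Averaging N periods leaves the periodic component unchanged while reducing uncorrelated noise by roughly a factor of the square root of N.

```python
import math
import random


def synchronous_average(signal, period):
    """Average the signal over consecutive blocks of `period` samples."""
    n_periods = len(signal) // period
    return [
        sum(signal[k * period + i] for k in range(n_periods)) / n_periods
        for i in range(period)
    ]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def demo(seed=0, period=50, n_periods=200, noise_std=1.0):
    """Return (error of one raw period, error after averaging)."""
    rng = random.Random(seed)
    # Weak periodic signature: amplitude 0.3 against unit-variance noise.
    template = [0.3 * math.sin(2 * math.pi * i / period) for i in range(period)]
    noisy = [
        template[i % period] + rng.gauss(0.0, noise_std)
        for i in range(period * n_periods)
    ]
    averaged = synchronous_average(noisy, period)
    return rms_error(noisy[:period], template), rms_error(averaged, template)


if __name__ == "__main__":
    raw_err, avg_err = demo()
    print(f"raw period error: {raw_err:.3f}, averaged error: {avg_err:.3f}")
```

The approach depends on accurate knowledge of the period; under speed fluctuation the signal would first need to be resampled to a constant angular increment, which this sketch does not attempt.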

From component-level to system-level fault research
Component-level fault research on mechanical equipment mainly focuses on monitoring and diagnosing faults in key components, such as gears, bearings, and rotors. However, the interaction between mechanical systems is often the root cause of failure. Component-level fault diagnosis can only find the induced failure and cannot completely cure the hidden problems of mechanical systems. Therefore, future research should regard the mechanical equipment as a multilayered, nonlinear, complex whole. First, a complex multidimensional, multiparameter system model is built. Then, starting from the integrity and interrelations of the system, the dynamic characteristics, interrelations, and dependencies of the different parts are studied to obtain preliminary results on component faults. Finally, the root cause of the system failure and the primary failure are determined, so that the hidden troubles of the mechanical system can be completely cured.
Given the growing popularity of condition monitoring, prognostics, remote fault, the Internet of Things, Industry 4.0, and cloud computing, the volume of data available for fault diagnosis has significantly increased. This large volume of relevant data is now referred to as "big data" [1]. At present, quantitative studies are lacking to understand the essential characteristics of the complexity of big data. The traditional signal processing methods are not effective in executing big data processing. Hence, the key challenges in handling this high volume of data are as follows: Diversity in data types (variety), uncertainties in the data (veracity), and in some cases the speed of data collection and decision making (velocity) for fault diagnosis purposes [1]. The increasing amount of data collected requires the development of new fault diagnosis models. Compared with the conventional data-driven methods that are unable to handle large-scale data, deep learning is suitable for processing large-scale data. Popular initiatives worldwide have focused on mechanical fault diagnosis. In this case, renowned research groups have made changes based on the new situations emerging in the industry. For example, the Center for Intelligent Maintenance Systems in the US, as a National Science Foundation Industry/University Cooperative Research Center (I/UCRC), is a leader in the discovery of new methods to assess machine degradation and predict the health of industrial systems including e-manufacturing, e-maintenance, cyber machine systems, cloud-based machine monitoring and manufacturing, intelligent cyber machine systems, and so on [275].

Concluding remarks
Machinery fault diagnosis is currently far from being considered a complete subject. Fundamental research on machinery fault diagnosis and breakthroughs in the relevant technologies are motivations for promoting its development. In the near future, the basic research on machinery fault diagnosis should be based on engineering applications, various related research, the proposed solutions for scientific problems, and independent innovations. Furthermore, planning and establishing a standard database of fault diagnosis should be encouraged. The repetition of construction and research can be avoided by sharing typical engineering cases as well as standard experimental data, algorithms, and verification models. Finally, many key issues can be examined through basic research that adopts the discoveries made in the fields of mathematics, information, mechanics, and materials science, thereby leading to a deeper extension of current research on machinery fault diagnosis.