A Feature Inherited Hierarchical Convolutional Neural Network (FI-HCNN) for Motor Fault Severity Estimation Using Stator Current Signals

Motors, which are among the most widely used machines in manufacturing, play a key role in precision machining. It is therefore important to accurately estimate the health state of a motor, because it affects the quality of the product. The research outlined in this paper aims to improve motor fault severity estimation by proposing a novel deep learning method, the feature inherited hierarchical convolutional neural network (FI-HCNN). FI-HCNN consists of a fault diagnosis part and a severity estimation part, arranged hierarchically. The main novelty of FI-HCNN is its inherited structure between the levels of the hierarchy: the severity estimation part uses the latent features of the fault diagnosis part to exploit the fault-related representations learned in the fault diagnosis task. FI-HCNN improves the accuracy of fault severity estimation because the latent features support the level-specific abstraction. FI-HCNN is also easy to apply in practice because it uses stator current signals, which are usually acquired for control purposes. Experimental studies of mechanical motor faults, including eccentricity, broken rotor bars, and unbalance, corroborate the high performance of FI-HCNN compared with both conventional methods and other hierarchical deep learning methods.


Introduction
Motors are widely used in manufacturing applications that require a rotating force because of their low cost and high reliability. Despite this reliability, motors are subject to mechanical and electrical faults because they are exposed to unexpected stresses, such as in-use damage and environmental conditions. Motor degradation can lead to deterioration in product quality; therefore, it is crucial to diagnose the motor state and evaluate the fault severity [1].
To cope with these problems, motor current signature analysis (MCSA) has been studied for fault diagnosis (FD) and severity estimation (SE) because it is easy to implement [2]. In particular, SE is crucial for making proper maintenance decisions before the system fails. For condition-based maintenance, SE can be readily extended to fault prediction by estimating how the severity grows [3]. Fault severity is usually defined by the size of the fault; thus, the degradation behavior of a feature is analyzed for SE [4,5]. Most MCSA techniques for FD and SE have been developed based on domain knowledge; they can be categorized into physics-based and data-driven approaches. Physics-based MCSA mainly derives spectral features that identify a particular fault using motor-specific parameters. For example, mathematical models have been formulated to investigate inter-turn shorts in stator windings [6], and the stator current spectrum has been analyzed for SE of unbalance, eccentricity, and bearing faults [7]. These methods can be applied to generic motor systems; however, their real-world application is limited because they require specific motor expertise, which is not easily obtained. Studies on data-driven MCSA, on the other hand, attempt to extract fault-sensitive features and apply a suitable learning method. Many signal processing methods, including wavelet decomposition [8], discrete wavelet transform [9], and empirical mode decomposition [10], have been used to design informative features. Several artificial intelligence methods have been applied to learn from these manual features, for example, genetic algorithm [11], support vector machine (SVM) [12], and artificial neural network [13]. These methods do not require motor-specific expertise; however, they do require complicated, labor-intensive signal processing to create meaningful handcrafted features.
Therefore, a lack of domain knowledge hinders both MCSA approaches and results in suboptimal features that struggle to discriminate fault modes and evaluate fault severity. The analysis becomes even more complicated when multiple fault modes with various severity levels are present. Thus, new FD and SE methods are needed that perform well with minimal domain knowledge.
With this in mind, deep learning (DL), which belongs to the data-driven approaches, can ease the problem of limited domain knowledge. In particular, the automatic feature extraction of DL has produced strong results in many machinery health monitoring applications [14][15][16][17][18][19]. The convolutional neural network (CNN), one of the most effective DL models, has shown powerful performance with vibration signals from rotating systems such as bearings [20,21] and gearboxes [22,23]. In the case of motors, several previous studies have mainly focused on devising efficient input data for training CNN models using vibration signals [24][25][26]. For stator current signals, however, relatively few studies exist. For example, Ince et al. [27] used a 1-D CNN architecture to detect a motor bearing cage fault. In [28], SincNet was adopted to classify multiple faults, including broken rotor bars and bearing faults.
In this paper, a DL-based SE method for mechanical motor faults is proposed using stator current signals. We call the new method the feature inherited hierarchical CNN (FI-HCNN). The proposed model uses a hierarchical CNN (HCNN) that performs SE after FD, and it enhances SE performance through a novel connecting architecture. Each SE module of FI-HCNN learns the severity-level characteristics of a particular fault mode from the latent features, which are representations formed in the FD module. This is possible because, in FI-HCNN, the latent features of the FD module are used as inputs to the corresponding SE modules. Moreover, FI-HCNN accounts for the continuity of fault severity by training the SE modules with regression. Severity could also be computed from a classifier by interpolating the probabilities of the discrete severity levels, but that approach presumes that all fault severities are linearly related. Two main contributions are made in this research. (1) To the best of our knowledge, this is the first time a stator current signal has been applied to a hierarchical DL structure to diagnose motor faults and evaluate their severity. While existing studies on DL-based motor FD use vibration signals, we propose a new DL method for FD and SE of induction motors based on the stator current signal. (2) A special connecting structure is proposed, and its benefits for SE are explored. Through this connection, the latent feature spaces of the prior module are transferred to the subsequent modules and support each SE module in capturing the sophisticated level-specific features of a particular fault. FI-HCNN is evaluated by comparing its performance with that of both conventional MCSA methods and other HCNN methods.
The remainder of this paper is organized as follows. The related work that is necessary to understand the proposed FI-HCNN method is detailed in Sect. 2. Section 3 describes the developed FI-HCNN method. The effectiveness of the developed method is discussed in Sect. 4, including comparisons with several existing methods and experimental validation. The conclusions are presented in Sect. 5.

Convolutional Neural Networks (CNN)
A CNN consists of three types of layers: the convolution layer, the pooling layer, and the fully connected (FC) layer. These layers operate differently from a multilayer perceptron (MLP), which is formed from a stack of FC layers. A CNN has two distinctive properties that improve its memory and statistical efficiency: (1) sparse connectivity and (2) parameter sharing [29]. Sparse connectivity means that each node in a layer is connected to only a limited number of nodes in the previous layer through a filter (also called a kernel) that is smaller than the input. Because the filters in a layer are connected only to the nodes in their receptive fields, rather than to all inputs, the filters in the early layers concentrate on low-level features, which are combined into high-level features as layers are stacked. Parameter sharing means that the same filter weights are applied at every position of the layer's input. These two properties are illustrated in Fig. 1. Through them, a CNN learns the filters best suited to the objective of the model and builds the corresponding sets of feature maps.
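To make the two properties concrete, the following minimal NumPy sketch (an illustration, not the model used in this paper) applies one size-nine filter to a 240-point input and rewrites the same operation as a fully connected weight matrix. The function names are hypothetical. Each row of the equivalent matrix has only nine nonzero entries (sparse connectivity), and every row repeats the same nine weights (parameter sharing).

```python
import numpy as np

def conv1d_valid(x, w):
    """Single-channel 1-D convolution with 'valid' padding (cross-correlation, as in CNNs)."""
    k = len(w)
    # Output i depends only on x[i:i+k] (sparse connectivity),
    # and every output reuses the same k weights (parameter sharing).
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def as_dense_matrix(n_in, w):
    """The same layer written as a fully connected weight matrix."""
    k = len(w)
    n_out = n_in - k + 1
    W = np.zeros((n_out, n_in))
    for i in range(n_out):
        W[i, i:i + k] = w  # identical band of weights on every row, zeros elsewhere
    return W

rng = np.random.default_rng(0)
x = rng.normal(size=240)  # e.g., one 240-point input sample
w = rng.normal(size=9)    # a filter of size nine
y = conv1d_valid(x, w)
W = as_dense_matrix(len(x), w)

print(y.shape)                 # (232,)
print(w.size, W.size)          # 9 55680
print(np.count_nonzero(W[0]))  # 9
```

The dense form needs 232 × 240 entries to express what the convolution does with nine shared weights, which is the memory and statistical advantage described above.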

Hierarchical Networks
A hierarchical network consists of a parent module and two or more child modules. In image classification, several hierarchical models have proven effective by identifying the superclass in the parent module and classifying the fine classes in the child modules. Figure 2 depicts the schematic of a hierarchical network, in which a total of $N_1 + N_2 + \cdots + N_k$ classes is grouped into $k$ superclasses. For example, if the superclasses are "animal" and "building," the fine classes could include "cat" and "dog" for the former and "school" and "hospital" for the latter. In computer vision, several studies have developed algorithms that construct appropriate superclasses and classify images. In [30], tree-based priors encouraged knowledge transfer among related classes. The hierarchy and exclusion graphs in [31] were used to classify large-scale object categories with a theoretical interpretation. In [32], the algorithm pretrained the fine classes independently by combining shared low-level features with additional input.
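As a minimal sketch of hierarchical inference, assuming each module outputs class probabilities, the parent picks the superclass and only that superclass's child module chooses the fine class. The taxonomy and function name below are hypothetical and simply mirror the animal/building example.

```python
import numpy as np

# Hypothetical two-level taxonomy from the example above: k = 2 superclasses.
TAXONOMY = {"animal": ["cat", "dog"], "building": ["school", "hospital"]}

def hierarchical_predict(parent_probs, child_probs):
    """parent_probs: superclass -> probability from the parent module.
    child_probs: superclass -> fine-class probabilities from that superclass's child module."""
    superclass = max(parent_probs, key=parent_probs.get)  # parent decision
    fine_idx = int(np.argmax(child_probs[superclass]))    # child decision
    return superclass, TAXONOMY[superclass][fine_idx]

total_classes = sum(len(v) for v in TAXONOMY.values())  # N_1 + N_2 = 4

parent = {"animal": 0.8, "building": 0.2}
children = {"animal": np.array([0.3, 0.7]), "building": np.array([0.6, 0.4])}
print(hierarchical_predict(parent, children))  # ('animal', 'dog')
```

In practice only the selected child module needs to be evaluated, which is why the hierarchy can grow to many fine classes without running every classifier on every input.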

The Proposed Feature Inherited Hierarchical CNN (FI-HCNN) Method Using Stator Current Signals
This section details the proposed FI-HCNN method. First, the special connecting architecture of the hierarchical learning model is explained; then, the overall hierarchical structure, which consists of an FD module and several SE modules, is described.

Feature Inheritance Architecture
When the hierarchical network explained in Sect. 2.2 is applied to machinery health monitoring, the parent module can be matched to FD and the child modules to SE for each fault mode. Deployed in this hierarchy, the FD module and the SE modules serve two different objectives: first, classifying a particular fault mode and then estimating its severity. In contrast to the ordinary hierarchical architecture, the proposed FI-HCNN delivers the latent features $\hat{\mathbf{z}}$ from the FD module to the SE module. This concept is called feature inheritance. As shown in Fig. 3, the input data $\mathbf{x}$ evolve in the FD module into learned representations that contain rich characteristics of the particular fault mode $C_k$; these representations are the latent features $\hat{\mathbf{z}}$. They are used as the input to the SE module of $C_k$, which learns to regress them to the severity of $C_k$, $S_{C_k}$. Specifically, $\hat{\mathbf{z}}$ are the values computed by the last pooling layer of the FD module. Because the filters of the FD module are trained to highlight the fault characteristics in the input data, $\hat{\mathbf{z}}$, having passed through these filters, are expected to contain significant and concentrated abstractions of the particular fault mode. By building on $\hat{\mathbf{z}}$ instead of discarding them, the SE modules can focus on capturing higher-level features that support the regression of fault severity. Therefore, the transmission of $\hat{\mathbf{z}}$ helps the network learn the degree of a specific fault and leads to enhanced SE performance.
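The following NumPy forward pass is a sketch of feature inheritance under stated assumptions: "same" zero padding, hypothetical filter counts (8, 16, and 32 in the FD module; 64 and 64 in the SE module), random untrained weights, and a pooling size of 2 so that a 240-point toy input still leaves a non-empty map after five pooling stages (the paper uses a pooling size of four, and the padding that reconciles this with Fig. 5 is not specified in the text). The sigmoid output that keeps the severity in [0, 1] is also an assumption. What the sketch shows is the data path: the SE module consumes `z_hat`, the output of the last FD pooling layer, rather than the raw sample `x`, which is what a repetitive HCNN would reuse.

```python
import numpy as np

rng = np.random.default_rng(0)

def elu(z, alpha=1.0):
    # Keeps a bounded negative response instead of zeroing it (as ReLU would).
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))

def conv1d_same(x, w, b):
    """x: (C_in, L); w: (C_out, C_in, k) with odd k; zero 'same' padding -> (C_out, L)."""
    k = w.shape[2]
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    windows = np.stack([xp[:, i:i + k] for i in range(x.shape[1])])  # (L, C_in, k)
    return np.einsum("lck,ock->ol", windows, w) + b[:, None]

def maxpool1d(x, p):
    # Non-overlapping max pooling; trailing samples that do not fill a window are dropped.
    n = x.shape[1] // p
    return x[:, :n * p].reshape(x.shape[0], n, p).max(axis=2)

def make_conv(c_in, c_out, k=9):
    # Filter size nine, as in the text; random, untrained weights.
    return rng.normal(0.0, 0.1, (c_out, c_in, k)), np.zeros(c_out)

def conv_stack(x, layers, pool):
    for w, b in layers:
        x = maxpool1d(elu(conv1d_same(x, w, b)), pool)
    return x

POOL = 2  # toy value (see lead-in); the paper states 4
fd_layers = [make_conv(1, 8), make_conv(8, 16), make_conv(16, 32)]  # hypothetical widths
se_layers = [make_conv(32, 64), make_conv(64, 64)]                  # more filters than FD

x = rng.normal(size=(1, 240))           # one preprocessed 240-point sample
z_hat = conv_stack(x, fd_layers, POOL)  # latent features: output of the last FD pooling layer

# FD head: flatten -> FC -> softmax over four fault modes (NOR, ECC, ROTOR, UNB).
w_fd = rng.normal(0.0, 0.1, (4, z_hat.size))
logits = w_fd @ z_hat.ravel()
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Feature inheritance: the SE module starts from z_hat, not from x.
h = conv_stack(z_hat, se_layers, POOL)
w_se = rng.normal(0.0, 0.1, h.size)
severity = 1.0 / (1.0 + np.exp(-(w_se @ h.ravel())))  # assumed sigmoid output in [0, 1]
print(z_hat.shape, h.shape)  # (32, 30) (64, 7)
```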

A Hierarchical Structure for Fault Diagnosis (FD) and Severity Estimation (SE)
Using feature inheritance, the key idea of FI-HCNN, the overall hierarchical structure is configured as shown in Fig. 4. The proposed FI-HCNN method consists of three parts: (1) preprocessing, (2) FD, and (3) SE. Each fault mode has its own SE module, whereas the normal state does not pass through any additional module. In Fig. 4, $\mathbf{x}$ denotes the preprocessed current data, $\hat{\mathbf{z}}$ the latent features, $W_{FD}$ and $W_{SE}$ the weight matrices of the FD and SE modules, respectively, $C$ the fault mode, and $S$ the severity of each fault mode. The severity ranges from 0 to 1.

Part 1. Preprocessing
Before the hierarchical network starts learning, four preprocessing steps (resampling, augmentation, normalization, and scaling) are applied to the raw current signals. First, resampling interpolates all data so that, under the same operating conditions, each revolution is represented by the same number of points. Second, data augmentation is performed by extracting samples that overlap by one revolution. This augmentation preserves the periodic characteristic of the current signal; it not only improves performance but also helps the filters in the model learn the relevant features. Third, normalization, which subtracts the signal's own mean and divides by the total standard deviation, is used to homogenize the data of each experiment. Finally, the amplitudes of the current signals are scaled to the range from −1 to 1. This scaling makes the method extendable to signals from motors of different sizes and reduces the uncertain effects of the load torque condition.
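The four preprocessing steps can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, linear interpolation via `np.interp` is assumed for resampling, "own mean" is read as the mean of one experiment's samples, and scaling is assumed to divide by the largest absolute value in the data set. The 120 points per revolution, two-revolution samples, and one-revolution overlap follow the values reported in the experimental study.

```python
import numpy as np

SAMPLES_PER_REV = 120  # resampled points per revolution (value used in the experiments)
WINDOW_REVS = 2        # one sample spans two revolutions -> 240 points
STRIDE_REVS = 1        # consecutive samples overlap by one revolution

def resample_per_revolution(signal, fs, rpm, spr=SAMPLES_PER_REV):
    """Linear interpolation so that every revolution contains exactly `spr` points."""
    rev_hz = rpm / 60.0
    n_rev = int(np.floor(len(signal) * rev_hz / fs + 1e-9))  # keep whole revolutions only
    t_raw = np.arange(len(signal)) / fs
    t_new = np.arange(n_rev * spr) / (spr * rev_hz)
    return np.interp(t_new, t_raw, signal)

def overlapping_windows(x, spr=SAMPLES_PER_REV):
    """Augmentation: slide a two-revolution window forward by one revolution at a time."""
    win, step = WINDOW_REVS * spr, STRIDE_REVS * spr
    n = (len(x) - win) // step + 1
    return np.stack([x[i * step:i * step + win] for i in range(n)])

def normalize(windows, total_std):
    """Subtract this experiment's own mean; divide by a std shared across all experiments."""
    return (windows - windows.mean()) / total_std

def scale_to_unit_range(samples):
    """Assumed scaling: divide by the data set's largest |value| so everything lies in [-1, 1]."""
    return samples / np.max(np.abs(samples))

# Synthetic 60 Hz "current": 2000 points at 12 kHz = 10 revolutions at 3600 RPM.
fs, rpm = 12_000, 3600
t = np.arange(2000) / fs
raw = 10.0 * np.sin(2 * np.pi * 60 * t) + 1.5
x = resample_per_revolution(raw, fs, rpm)
windows = overlapping_windows(x)
samples = scale_to_unit_range(normalize(windows, total_std=np.std(x)))
print(x.shape, samples.shape)  # (1200,) (9, 240)
```

With several experiments, `scale_to_unit_range` would be applied once to the stacked samples of all experiments so that relative amplitude differences between health states are preserved.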

Part 2. Fault diagnosis (FD)
After preprocessing, the refined current data enter the FD module, which consists of three convolution layers with max-pooling layers and one FC layer. Through the three convolution and max-pooling layers, the input data are transformed into features that reveal the fault characteristics, and the FC layer learns to classify these features into fault modes. The task of the FD module can be written as $p(\hat{C} \mid \mathbf{x}, W_{FD}) \approx p(C \mid \mathbf{x})$. The optimum is obtained by minimizing the loss of the FD module, $L_{FD}$, given as

$$L_{FD} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k} y_{i,k}\log \hat{p}_{i,k} + \lambda_1 \lVert W_{FD} \rVert_2^2 \qquad (1)$$

where $N$ is the number of training samples, $y_{i,k}$ equals 1 if sample $i$ belongs to fault mode $C_k$ and 0 otherwise, $\hat{p}_{i,k}$ is the predicted probability of $C_k$ for sample $i$, and $\lambda_1$ is the coefficient of the L2 regularization term. The loss is computed via cross-entropy because the FD module tackles a discrete classification problem. While the dimensions of the features decrease as they pass through the pooling layers, the number of features increases because deeper layers have more filters. In addition, ELU activation is used in all convolutional layers to preserve information below 0; it is defined as

$$f(z) = \begin{cases} z, & z > 0 \\ \alpha\,(e^{z} - 1), & z \le 0 \end{cases}$$

where $\alpha > 0$ sets the saturation value for negative inputs. Both the ELU activation and the growth in the number of filters with layer depth can compensate for information that may be lost through the stacked layers. As the filter weights are updated by minimizing Eq. (1), the input data passing through the updated filters form features that distinguish the fault modes. The features just before flattening, denoted $\hat{\mathbf{z}}$ in Fig. 4, are then transferred to the subsequent SE module.
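A minimal NumPy sketch of the FD loss (cross-entropy plus an L2 penalty weighted by a coefficient `lam1`) and of the ELU is given below, assuming integer class labels and a list of weight arrays for the penalty. The function names are hypothetical; a deep learning framework would normally provide both pieces.

```python
import numpy as np

def elu(z, alpha=1.0):
    """ELU: identity for z > 0, alpha * (exp(z) - 1) otherwise (negative values are kept)."""
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))

def fd_loss(logits, labels, weights, lam1):
    """Mean cross-entropy over the batch plus the L2 penalty lam1 * sum of squared weights."""
    shifted = logits - logits.max(axis=1, keepdims=True)                      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))  # log-softmax
    ce = -np.mean(log_probs[np.arange(len(labels)), labels])
    l2 = sum(np.sum(w ** 2) for w in weights)
    return ce + lam1 * l2

# Uniform predictions over 4 fault modes give a cross-entropy of log(4).
loss = fd_loss(np.zeros((2, 4)), np.array([0, 1]), [np.ones(3)], lam1=0.1)
```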

Part 3. Severity Estimation (SE)
An SE module for each fault mode learns the severity of that fault mode. Each SE module consists of two convolution layers with max-pooling layers and one FC layer. The two convolutional layers of the SE module, which have more filters than those of the preceding FD module, extract more sophisticated features associated with the fault severity. These features are flattened, passed through the FC layer, and regressed to the fault severity. The task of an SE module can be written as $p(\hat{S} \mid \hat{\mathbf{z}}, W_{SE}) \approx p(S \mid \mathbf{x})$. Once $L_{FD}$ has decreased sufficiently, the latent features $\hat{\mathbf{z}}$ provide significant information about the corresponding fault mode, and they are transferred to the SE module of that fault mode to learn its weights $W_{SE}$. Delivering $\hat{\mathbf{z}}$ is expected to let each SE module concentrate on learning the specific characteristics that determine the severity of its fault, by minimizing the loss of the SE module, $L_{SE}$:

$$L_{SE} = \frac{1}{N}\sum_{i=1}^{N}\left(S_i - f_2(\hat{\mathbf{z}}_i; W_{SE})\right)^2 + \lambda_2 \lVert W_{SE} \rVert_2^2$$

where $N$ is the number of training samples, $S_i$ is the true severity of sample $i$, $f_2(\hat{\mathbf{z}}_i; W_{SE})$ is the severity estimated from the latent features $\hat{\mathbf{z}}_i$ and the weights $W_{SE}$ of the SE module, and $\lambda_2$ is the coefficient of the L2 regularization term. Because fault severity is a continuous variable, the loss is computed as the mean squared error (MSE).
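Along the same lines, the SE loss (MSE plus an L2 penalty) can be sketched as below. The helper `samples_for_mode` encodes an assumption not stated explicitly in the text, namely that the SE module of fault mode C_k is trained only on samples whose fault mode is C_k; all names and toy data are hypothetical.

```python
import numpy as np

def se_loss(pred, target, weights, lam2):
    """Mean squared error between estimated and true severity plus lam2 * sum of squared weights."""
    mse = np.mean((pred - target) ** 2)
    return mse + lam2 * sum(np.sum(w ** 2) for w in weights)

def samples_for_mode(latent, modes, severities, mode):
    """Assumption: the SE module for `mode` sees only samples whose fault mode is `mode`."""
    mask = modes == mode
    return latent[mask], severities[mask]

latent = np.arange(12.0).reshape(4, 3)  # four toy latent vectors
modes = np.array(["ECC", "UNB", "ECC", "NOR"])
severities = np.array([0.10, 0.40, 0.30, 0.0])
z_ecc, s_ecc = samples_for_mode(latent, modes, severities, "ECC")
print(s_ecc)  # [0.1 0.3]
print(se_loss(np.array([0.12, 0.28]), s_ecc, [np.zeros(2)], lam2=0.01))
```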
The hierarchical structure of the proposed method is illustrated in detail in Fig. 5: the FD module, which identifies the fault mode, and three SE modules, which assess the fault severity, are hierarchically connected. The numbers in square brackets indicate the dimensions of the data passed through each layer, and the numbers in parentheses give the number of filters. The filter size is set to nine for all convolution layers, and the pooling size is set to four for all pooling layers. The structure of FI-HCNN is designed for the motor current signals described in this study, but it can be applied generally with minor adjustments for the amount and type of data. In Fig. 5, the blue arrow represents an example flow of a test sample: when the sample is classified as Fault 1, its latent features are transferred to the SE module of Fault 1 and developed into features that indicate the severity of Fault 1.
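Inference through the hierarchy, as traced by the blue arrow in Fig. 5, can be sketched as a routing function. The modules below are hypothetical stand-ins passed in as callables; the class order and the choice to return `None` as the severity of the normal state (which has no SE module) are assumptions.

```python
import numpy as np

FAULT_MODES = ("NOR", "ECC", "ROTOR", "UNB")  # assumed order of the FD outputs

def fi_hcnn_predict(x, fd_trunk, fd_head, se_modules):
    """fd_trunk: x -> latent features; fd_head: latent -> class probabilities;
    se_modules: fault mode -> callable(latent) -> estimated severity."""
    z_hat = fd_trunk(x)                                # computed once, reused by the SE stage
    mode = FAULT_MODES[int(np.argmax(fd_head(z_hat)))]
    if mode == "NOR":
        return mode, None                              # normal state: no SE module
    return mode, float(se_modules[mode](z_hat))        # only the matching SE module runs

# Hypothetical stand-ins for trained modules.
def toy_trunk(x):
    return 2.0 * x

def toy_head(z):
    return np.array([0.1, 0.7, 0.1, 0.1])              # "ECC" is most probable

toy_se = {"ECC": lambda z: 0.3, "ROTOR": lambda z: 1.0, "UNB": lambda z: 0.4}
print(fi_hcnn_predict(np.zeros(240), toy_trunk, toy_head, toy_se))  # ('ECC', 0.3)
```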

Experimental Study
In this section, the proposed FI-HCNN method is validated using experimental data. After the data are described, the results of FI-HCNN are discussed. Then, the performance of FI-HCNN is compared with that of traditional MCSA methods and of other DL methods whose structures are related to FI-HCNN.

Description of the Experimental Data
A dataset from a 160 kW, 2-pole induction motor was used to analyze the performance of the proposed method. In the experiment, one phase of the stator current signal was acquired at 3600 revolutions per minute (RPM) with no load. Three mechanical faults, each with multiple severity levels, were introduced: eccentricity, unbalance, and a broken rotor bar. These faults and severity levels are illustrated in Fig. 6. The severity was defined by the magnitude of the experimental setting used to induce each fault condition; a higher severity level means a more deteriorated health state. Eccentricity, an uneven air gap between the rotor and stator, was introduced at three levels by displacing the rotor from the center by 10%, 30%, and 50% of the original air-gap length, and its severity was denoted as 10%, 30%, and 50%, respectively. For example, 30% eccentricity is depicted in Fig. 6a. The broken rotor bar, emulated by drilling rotor slots to create a half break and a whole break, had severities of 50% and 100%, respectively. Unbalance was created by attaching weights to the rotor. According to ISO 21940-11, balance quality grade G2.5 (2.5 mm/s) corresponds to the balanced state, and G40 is treated as a failure. The unbalance severity was set to the ratio of the applied grade to G40, giving 16% for G6.3 (6.3/40 ≈ 0.16) and 40% for G16 (16/40 = 0.40). All specific conditions, including the severity levels and the abbreviations used, are summarized in Table 1. Figure 7 shows an example of the raw stator current signals for each health state. The magnitudes at 60 Hz and its harmonics were large in the frequency domain (see Fig. 7b) because the supply frequency and the rotating frequency were the same.
The current signals of the broken rotor bar (ROTOR) stood out because a broken rotor bar affects the motor more strongly than the other fault modes; however, the current signals of the other health states were not readily distinguishable in either the time or the frequency domain.
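The health states and severity labels described above can be collected in a small lookup table. The abbreviations below are hypothetical (Table 1 lists the ones actually used), and the unbalance entries simply reproduce the ratio of the applied balance grade to G40. The count check confirms that eight health states with 472 samples each give the 3776 samples reported for the data set.

```python
# Hypothetical labels for the eight health states; severities as fractions of 1.
HEALTH_STATES = {
    "NOR": ("NOR", None),  # normal state: no severity estimation
    "ECC10": ("ECC", 0.10), "ECC30": ("ECC", 0.30), "ECC50": ("ECC", 0.50),
    "ROTOR50": ("ROTOR", 0.50), "ROTOR100": ("ROTOR", 1.00),
    "UNB16": ("UNB", 6.3 / 40), "UNB40": ("UNB", 16 / 40),  # grade relative to G40
}

SAMPLES_PER_STATE = 472
print(len(HEALTH_STATES) * SAMPLES_PER_STATE)  # 3776
print(round(100 * HEALTH_STATES["UNB16"][1]), round(100 * HEALTH_STATES["UNB40"][1]))  # 16 40
```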

Result and Discussion of the FI-HCNN Method
Because FI-HCNN solves a classification problem in the FD module and a regression problem in the SE modules, its accuracy is defined separately for each module. For the FD module, the error is the number of misclassified samples divided by the total number of samples, and the accuracy is one minus this error. The accuracy of each SE module is evaluated by the root mean squared error (RMSE) between the predicted and actual fault severity. For example, an RMSE of 2% means that the estimated severity deviates from the true severity by 2 percentage points in the root-mean-square sense. All methods examined in Sect. 4 are evaluated with these metrics.
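Both metrics are straightforward to compute. A minimal NumPy sketch with hypothetical function names, reporting RMSE in percentage points of the 0 to 1 severity scale:

```python
import numpy as np

def fd_accuracy(pred_modes, true_modes):
    """1 - (number of misclassified samples / total number of samples)."""
    return 1.0 - np.mean(np.asarray(pred_modes) != np.asarray(true_modes))

def se_rmse_percent(pred_severity, true_severity):
    """RMSE on the 0-1 severity scale, reported in percentage points."""
    err = np.asarray(pred_severity) - np.asarray(true_severity)
    return 100.0 * np.sqrt(np.mean(err ** 2))

print(fd_accuracy(["ECC", "NOR", "UNB", "ECC"], ["ECC", "NOR", "UNB", "NOR"]))  # 0.75
print(se_rmse_percent([0.12, 0.28], [0.10, 0.30]))  # approximately 2
```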
For preprocessing, all raw stator current signals were resampled to 120 points per revolution; each sample was defined to span two revolutions, and samples were augmented with a one-revolution overlap, so the length of each sample was 240. Normalization and scaling were then conducted in sequence. The data set contained 3776 samples in total, 472 per class. The network was trained on 75% of the data set and tested on the remaining 25%, using fourfold cross-validation. The entire training and test procedure was repeated 10 times with randomly selected data sets, and repeatability was assessed through a 95% confidence interval. The hyper-parameters, which were tuned to train the modules on the given data sets, are detailed in […] understandable because, compared with other fault modes such as a broken rotor bar, the influence of an ECC-related fault on the current signal can be weak at first [33]. Thus, an ECC may be classified as normal at its incipient stage, because the effect of an incipient ECC on the current signal is small. The FD performance can be confirmed by examining the latent feature space, as shown in Fig. 9, where the test data set was used to visualize the latent feature space of each pooling layer in the FD module. The latent space of POOL3 (Fig. 9c), which holds the latent features passed to the SE modules, had more condensed clusters than those of POOL1 and POOL2 (Fig. 9a and b, respectively). In Fig. 9c, NOR formed a single cluster, and the other health states appeared more distinguishable.
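The evaluation protocol can be sketched as follows, assuming a single random permutation split into four folds and a Student-t interval over the 10 repeated runs. The text does not say whether a t- or normal-based interval was used, so the critical value 2.262 (t at 0.975 with 9 degrees of freedom) is an assumption. Function names are hypothetical.

```python
import numpy as np

def kfold_splits(n_samples, k=4, seed=0):
    """Shuffle once and cut into k folds; each fold is the test set once (75% / 25% for k = 4)."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

def ci95(run_scores, t_crit=2.262):
    """Mean and half-width of a 95% t-interval; the default t_crit assumes 10 runs (9 dof)."""
    scores = np.asarray(run_scores, dtype=float)
    half = t_crit * scores.std(ddof=1) / np.sqrt(len(scores))
    return scores.mean(), half

for train, test in kfold_splits(3776):
    print(len(train), len(test))  # 2832 944, four times
```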
The performance of each SE module was evaluated using RMSE: 0.61 ± 0.05% for ECC, 0.54 ± 0.05% for ROTOR, and 0.65 ± 0.04% for UNB (Table 3). The learning feasibility of the SE module was confirmed by analyzing how the estimation results changed with the loss. For example, Fig. 10 shows the trend of the loss and the corresponding SE results for UNB. Whereas the bias and variance errors of FI-HCNN decreased in the final output, these errors remained in the common hierarchical model in which the input was reused. Moreover, in the early stage of SE, the RMSE of FI-HCNN was smaller than that of the comparative HCNN that reuses the raw data, which shows the effect of the latent features on SE. These RMSE results are discussed further in the following subsections through comparisons with other methods.

Comparison with Conventional MCSA Methods
This section investigates the performance of FI-HCNN compared with existing methods. Two studies were conducted to represent conventional MCSA methods: one based on physics-based spectral features, and the other based on data-driven features computed by principal component analysis (PCA) of the fast Fourier transform (FFT) magnitudes. The spectral features, which were derived separately for each fault mode through theoretical analysis, are summarized in Table 4 based on [34,35]. In Table 4, $n_b$ is the number of rotor bars, $s$ is the slip, $p$ is the number of pole pairs, $f_s$ is the supply frequency, $f_r$ is the rotating frequency, $k$ and $\lambda$ are positive integers, and $\mu$ is an arbitrary odd number. For the data-driven features, the principal components (PCs) of the FFT magnitudes were calculated. Instead of selecting specific FFT magnitudes based on domain knowledge, PCA reduced the original set of FFT magnitudes (120 data points) to 22 PCs explaining 99% of the variance. The features extracted by both the physics-based and data-driven methods were fed into an SVM for FD and into support vector regression (SVR) for SE; both SVM and SVR use quadratic polynomial kernels. The results of these two methods are summarized in Table 3. FI-HCNN shows about 2% higher FD accuracy than the other methods. As shown in Fig. 8, both conventional methods produced more false alarms, that is, normal samples classified as faulty. In addition, the RMSEs of SE with the conventional methods were about 10 times larger than those of FI-HCNN, as shown in Fig. 11. The low performance of both conventional methods can be explained by the extracted features, which do not show clear trends of fault deterioration. In fact, the spectral features overlapped depending on the parameters, even though they were defined separately.
For example, similarities between eccentricity and other mechanical faults, such as a bearing inner race fault and a broken rotor bar, have been reported [36,37]. Therefore, it is hard to claim that one spectral feature reflects only the effect of a particular fault mode, because the three fault modes (ECC, ROTOR, and UNB) share related mechanical characteristics and affect each other. Figure 12 shows the fault characteristic frequencies under a constant speed of 3600 RPM. Most frequencies overlapped because the supply frequency equaled the rotating frequency and there was no slip at the constant-speed condition. Figure 13 shows some spectral features labeled by fault mode; the number next to each fault mode indicates the fault severity. The fault modes and their severities are difficult to discriminate because many of the feature values overlap. Also, the two main PCs of the FFT magnitudes are plotted in Fig. 14.
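Both conventional feature sets can be sketched briefly in Python. The frequency helpers use two widely cited textbook expressions, $f_s(1 \pm 2ks)$ for broken-rotor-bar sidebands and $|f_s \pm k f_r|$ for mixed-eccentricity sidebands (unbalance is often associated with the same $f_s \pm f_r$ components); these are assumptions and may differ from the exact expressions in Table 4. With a 60 Hz supply, one pole pair, and zero slip, every component collapses onto 0 Hz or a multiple of 60 Hz, which reproduces the overlap seen in Fig. 12. The PCA helper keeps the smallest number of components reaching 99% explained variance; keeping the first 120 one-sided FFT bins of a 240-point sample is an assumption. All function names are hypothetical.

```python
import numpy as np

def rotor_frequency_hz(f_s, s, p):
    """Mechanical rotating frequency f_r = f_s (1 - s) / p."""
    return f_s * (1.0 - s) / p

def broken_bar_sidebands(f_s, s, k_max=3):
    """Assumed textbook form f_s (1 +/- 2ks), k = 1..k_max."""
    return sorted({f_s * (1 + sign * 2 * k * s) for k in range(1, k_max + 1) for sign in (-1, 1)})

def mixed_eccentricity_sidebands(f_s, f_r, k_max=3):
    """Assumed textbook form |f_s +/- k f_r|, k = 1..k_max."""
    return sorted({abs(f_s + sign * k * f_r) for k in range(1, k_max + 1) for sign in (-1, 1)})

def fft_magnitude_features(samples, n_bins=120):
    """One-sided FFT magnitudes per sample; keeping the first n_bins bins is an assumption."""
    return np.abs(np.fft.rfft(samples, axis=1))[:, :n_bins]

def pca_fit(X, var_ratio=0.99):
    """Mean and the fewest principal axes whose cumulative explained variance reaches var_ratio."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    n_pc = int(np.searchsorted(np.cumsum(explained), var_ratio)) + 1
    return mean, vt[:n_pc]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

# 2-pole motor (p = 1) on a 60 Hz supply at zero slip, as in the 3600 RPM no-load test.
f_s, s, p = 60.0, 0.0, 1
f_r = rotor_frequency_hz(f_s, s, p)
print(broken_bar_sidebands(f_s, s))            # [60.0]
print(mixed_eccentricity_sidebands(f_s, f_r))  # [0.0, 60.0, 120.0, 180.0, 240.0]

# Toy FFT-magnitude matrix with two orthogonal directions of variation.
X = np.zeros((4, 120))
X[:, 0] = [1, -1, 1, -1]
X[:, 1] = [1, 1, -1, -1]
mean, components = pca_fit(X)
print(pca_transform(X, mean, components).shape)  # (4, 2)
```

The PC scores would then be fed to an SVM and an SVR with degree-2 polynomial kernels, for instance scikit-learn's `SVC(kernel="poly", degree=2)` and `SVR(kernel="poly", degree=2)`; that step is not reproduced here.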

Comparison with Other Hierarchical CNN Methods
This section aims to confirm the effectiveness of the feature inherited structure in HCNN. Based on previous research [38][39][40], two concept models were constructed as HCNNs with a repetitive hierarchical structure, in which the input data are reused in the child modules. The structures of all comparative HCNN models are described in Fig. 16. Figure 16a shows the proposed FI-HCNN. Figure 16b shows one repetitive HCNN (Rep-HCNN1), described in [38], whose child modules are modified from the parent module; the child modules of Rep-HCNN1 have the same structure as those of FI-HCNN. Figure 16c shows the other repetitive HCNN (Rep-HCNN2), in which the parent and child modules have identical structures, as outlined in [39,40]. The notations in Fig. 16 are the same as those in Fig. 4. The hyper-parameters (e.g., learning rate, batch size, …) […]. The SE results of all comparative methods are summarized in Table 5; FI-HCNN showed the best performance. The RMSEs of FI-HCNN for all fault conditions were about half of those of the other HCNN methods, as shown in Fig. 17. FI-HCNN also had a lower variance error than both Rep-HCNN1 and Rep-HCNN2, as shown in Fig. 18.
Specifically, the superior results of FI-HCNN compared with Rep-HCNN1 support the idea that propagating the latent features is effective in enhancing SE. Rep-HCNN1 and Rep-HCNN2, which both receive the raw input data, have different filter designs: Rep-HCNN1 extracts many features at the beginning, whereas Rep-HCNN2 extracts an increasing number of features through stacked layers. A possible reason for the slight improvement of Rep-HCNN2 over Rep-HCNN1 is that gradual learning through stacked layers is more effective for training on the raw input data. These comparative studies confirm that the pretrained latent features, which capture the characteristics of the fault mode, have positive effects on the SE modules. There is also considerable room for further progress by examining additional data under various fault conditions.

Conclusion
In this study, a new method, FI-HCNN, was proposed to identify the faults of induction motors and estimate their severity. FI-HCNN is composed hierarchically of an FD module that learns the fault types and SE modules that estimate their severity. The proposed method estimated fault severity more accurately than conventional methods because the latent features, which contain representations of the fault modes, are propagated from the FD module to the SE modules to support the learning of severity. First, the performance of FI-HCNN was confirmed by comparison with conventional MCSA methods, in which spectral features and PCs of FFT magnitudes from stator current signals were used with SVM for FD and with SVR for SE. In addition, two conventional HCNN models whose structures are similar to that of FI-HCNN were examined to confirm the superiority of the feature inherited structure. The experimental studies showed that FI-HCNN provides enhanced features that are more suitable for accurate estimation of fault severity, without the need for significant domain knowledge. FI-HCNN could learn more robust features through extended training from the pretrained weights when data from additional fault modes are included; the latent features generated by the more sophisticated FD module could then be used to improve SE performance. In future work, the training step can be enhanced by improving the loss function of FI-HCNN. Moreover, FI-HCNN can be studied further in the presence of unknown faults.