Introduction

Permanent magnet synchronous motors (PMSMs) are used in various industries, such as aerospace, autonomous electric vehicles, and robotics, owing to their many advantages, including a wide constant-power speed range, good dynamic performance, and ease of maintenance [1,2,3,4]. Monitoring faults in PMSMs is necessary to improve their performance, increase their lifespan, and lower costs.

In general, PMSMs are subject to three fault types: electrical, magnetic, and mechanical. According to the literature, electrical faults account for 36% of total faults [5] and mechanical faults account for approximately 50%, with the latter subclassified into eccentricity (10%) and bearing (40%) damage [6]; other faults account for the remaining 14%. Bearing damage is therefore the most likely fault in these motors. However, the unbalanced stress, vibration, and pulsating torque generated by eccentricity and demagnetization faults are the main causes of bearing damage [7,8,9,10]. In addition, demagnetization faults lower the output torque, which may further induce inter-turn short circuits because of the increased temperature caused by copper loss in the stator coils. Overall, the permanent magnet underpins the performance of the PMSM while also coupling the effects of the various fault types. Therefore, it is important to diagnose incipient demagnetization and eccentricity faults to reduce the occurrence of other faults.

Fault diagnosis methods are classified into traditional and intelligent methods. Among traditional methods, PMSM fault diagnosis based on motor stator current analysis (MSCA) is popular because of its advantages of high reliability, low cost, and noninvasive diagnosis. In MSCA-based methods, fault features are extracted with time–frequency analysis techniques, such as the fast Fourier transform and wavelet transform, or from other motor signal estimations, such as the back electromotive force, the magnetic flux in the air gap, and vibration [11,12,13,14]. However, traditional MSCA-based methods are incapable of diagnosing multitype faults. Researchers have suggested that the fault features of a PMSM can instead be extracted using newly developed deep learning methods. For instance, a local connection network structured with a normalized sparse autoencoder was used to extract fault features from the vibrations of a rotating machine [15]. A dislocated time series CNN was employed to extract the fault features of an induction motor from vibration signals [16]. Bearing damage was detected with a derivative CNN based on an analysis of the vibration signals of the PMSM [17, 18]. However, methods based on vibration analysis suffer from a notable shortcoming: they cannot be applied to systems subject to external vibration, such as electric vehicles, because the sensors measuring the vibration of the motor are influenced by other vibration sources. In addition, few diagnostic methods analyze the motor's stator current with CNNs because, although CNNs are powerful feature extractors in image recognition, they are not well suited to processing high-frequency signals directly.

To overcome the aforementioned shortcomings, this paper proposes an intelligent method using a deep CNN and image recognition to diagnose eccentricity and demagnetization, enabling the detection of multitype faults in an interior permanent magnet synchronous motor (IPMSM). To allow the CNN to extract fault features from the IPMSM's stator current, a gray image transformation algorithm is proposed that converts the current data into gray images using the autocorrelation matrix of the raw data, thereby leveraging the advantages of CNNs. In addition, the proposed image transformation method was tested with three residual networks of different designs, Resnet-9, Resnet-15, and Pyramidal Resnet-9, all of which obtain high accuracy.

In summary, the main contributions of this paper are as follows:

  1. This paper proposes a gray image transformation method that makes fault features accessible to CNNs and offers several advantages. First, the proposed transformation allows the CNN model to converge easily, even without normalization operations in the CNN structure, which is supported by both theory and practice. Second, the entire transformation process follows rigorous mapping logic: the gray images are obtained from the autocorrelation matrix of the current signal via a one-to-one mapping, so they contain no redundant pixels that would increase the computational cost. Finally, the proposed image transformation method requires relatively little raw data, thereby allowing more timely data processing.

  2. This paper proposes a multitype fault diagnostic method for IPMSMs that uses only time-domain analysis; it retains the reliability and noninvasiveness of MSCA-based methods, is immune to load and speed variations, and is robust to factors linked to the drive system. In addition, the proposed diagnostic method allows for visualization-based motor fault diagnosis.

The remainder of this paper is organized as follows. In "Related works", studies related to eccentricity and demagnetization are discussed. Finite element modeling and CNNs are introduced in "Finite element modeling and convolutional neural network". The co-simulation analysis of the fault motor models is presented in "FEM models and co-simulation analysis". In "Proposed diagnosis method and research for current data transformation to images", the diagnosis method is described, and the proposed gray image transformation method based on the autocorrelation matrix is elaborated in detail and analyzed theoretically. In "Comparative analysis of the effectiveness of the proposed image transformation method with CNN", the advantages of the proposed gray image transformation method are compared with those of existing transformation methods using a CNN. In "Model structure design and fault classification", the designed residual models and fault classification results are detailed. Finally, "Conclusion" concludes the paper.

Related works

Eccentricity is classified into static, dynamic, and mixed eccentricity (SE, DE, and ME). SE is a condition in which the position of the minimum radial air gap is fixed, generally owing to imperfections introduced during manufacturing or assembly. When DE occurs, the position of the minimum air gap rotates with the rotor; this is attributed to imperfections in the rotor, worn bearings, or a bent shaft, among others. SE and DE usually exist concurrently as ME, and an inherent level within 10% of the air gap is common. Eccentricity faults are commonly detected through time–frequency analysis. For instance, the amplitude of the sideband components (ASBC) at the characteristic frequency \({f}_{e}=\left(1\pm \frac{2k-1}{P}\right){f}_{s}\), proposed by Bashir et al. [19], is usually used to detect eccentricity faults (where P is the number of pole pairs, \({f}_{s}\) is the supply frequency, and \(k\) = ± 1, ± 2, ...). An index extracted from the wavelet coefficients of the stator current, comprising a linear combination of the energy, shape factor, peak, head angle of the peak, area below the peak, and gradient of the peak of the detail signals, has been proposed for the diagnosis of eccentricities [20]. Yonghyun et al. proposed an online detection method based on a Hall-effect field sensor implanted in the rotor to measure the flux [21].

Demagnetization is a failure unique to permanent magnet motors and includes uniform and partial demagnetization. Uniform demagnetization is attributed to high temperature caused by an abnormal ambient temperature or excessive current, which increases the coil loss and thereby the inner temperature of the motor. The characteristic frequencies \({f}_{e}=\left(1\pm \frac{k}{P}\right){f}_{s}\) are employed to diagnose demagnetization [22]. Liu et al. suggest that eccentricity and local demagnetization faults can be diagnosed using the high-frequency d-axis inductance and permanent magnet flux estimation, respectively [23].

The characteristic frequencies of eccentricity and demagnetization partially overlap, so the two faults may easily be confused in multiple-fault diagnosis based on time–frequency analysis. In addition, invasive methods require sensors or additional coils to be implanted inside the motor during manufacturing. Further, sensor-based diagnostic methods are easily disturbed by noise sources.
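As an illustration of this overlap, the following sketch (with hypothetical values for the supply frequency and pole-pair number; not taken from the paper) evaluates the two characteristic-frequency formulas above and lists the sidebands they share:

```python
# Hypothetical illustration: characteristic sidebands of eccentricity,
# f_e = (1 ± (2k-1)/P) f_s, and of demagnetization, f_d = (1 ± k/P) f_s.

def eccentricity_freqs(f_s, P, k_max=4):
    """Sideband frequencies f_e = (1 ± (2k-1)/P) f_s for k = 1..k_max."""
    freqs = set()
    for k in range(1, k_max + 1):
        freqs.add(round((1 + (2 * k - 1) / P) * f_s, 3))
        freqs.add(round((1 - (2 * k - 1) / P) * f_s, 3))
    return freqs

def demagnetization_freqs(f_s, P, k_max=4):
    """Sideband frequencies f_d = (1 ± k/P) f_s for k = 1..k_max."""
    freqs = set()
    for k in range(1, k_max + 1):
        freqs.add(round((1 + k / P) * f_s, 3))
        freqs.add(round((1 - k / P) * f_s, 3))
    return freqs

if __name__ == "__main__":
    f_s, P = 200.0, 4          # example supply frequency (Hz) and pole-pair number
    shared = eccentricity_freqs(f_s, P) & demagnetization_freqs(f_s, P)
    print("shared sidebands (Hz):", sorted(shared))
```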

In a previous study [24], demagnetization and eccentricity faults were detected via a d-axis-based index. However, this necessitated a comparison between faulty and healthy motors, which is a shortcoming inherent in conventional index-based diagnostic methods.

With the development of machine learning, certain diagnostic methods have combined machine learning techniques with hand-crafted features. Fault features have been extracted from the wavelet transform using a 1-D CNN [13]. In a previous study [17], vibration signals were processed by a 1-D CNN for fault feature extraction to detect mechanical faults in an induction motor; as with other sensor-based methods, its reliability is affected by other vibration sources. This paper addresses intelligent diagnosis of multiple fault types based on a CNN and image recognition, with particular attention to the image transformation method used to extract fault features from the stator currents.

Finite element modeling and convolutional neural network

Finite element modeling

The finite element method (FEM) is a high-precision modeling approach that is often used in the analysis of coupled multiphysics fields because it accounts for the actual parameters and the nonlinear characteristics of the electromagnetic, mechanical, and thermal fields.

The basic theory of the FEM is based on Maxwell’s electromagnetic equations, whereby the transient equations of the external circuit, motion equations, and circuit elements are combined with the magnetic field equations. The solution of the set of equations provides the stator’s phase current as the principal variable. Three-dimensional (3-D) finite element (FE) analysis is the most precise method currently available for IPMSM modeling. However, the computational cost of the model is also increased considerably, particularly in multi-fault analysis. Therefore, a two-dimensional (2-D) FEM is employed to establish IPMSM models. In a 2-D Cartesian system, the electromagnetic equation analyzed using the FEM is expressed as [19]:

$$\frac{\partial }{\partial x}\frac{1}{\mu }\left(\frac{\partial {\varvec{A}}}{\partial x}\right)+\frac{\partial }{\partial y}\frac{1}{\mu }\left(\frac{\partial {\varvec{A}}}{\partial y}\right)={{\varvec{J}}}_{0}+{{\varvec{J}}}_{e}+{{\varvec{J}}}_{v},$$
(1)

where \(A\) is the z component of the magnetic vector potential, \(\mu \) is the magnetic permeability, \({J}_{0}\) is the current density related to the applied voltage, \({J}_{e}\) is the current density related to the time variations of the magnetic flux, and \({J}_{v}\) is the current density related to the motional voltage. Equation (1) can be rewritten as:

$$\begin{aligned}\frac{\partial }{\partial x}\frac{1}{\mu }\left(\frac{\partial {\varvec{A}}}{\partial x}\right)&+\frac{\partial }{\partial y}\frac{1}{\mu }\left(\frac{\partial {\varvec{A}}}{\partial y}\right)=-\sigma \frac{{V}_{s}}{l}+\sigma \frac{\partial {\varvec{A}}}{\partial t}\\&\quad+\sigma v\times \nabla \times {\varvec{A}},\end{aligned}$$
(2)

where σ is the electrical conductivity, \(l\) is the motor length along the z axis, \({V}_{s}\) is the applied voltage, and v is the speed of the conductor against the magnetic flux density. By applying a fixed reference frame with respect to the proposed element, v is equal to zero, and the propagation equation is simplified to:

$$\frac{\partial }{\partial x}\frac{1}{\mu }\left(\frac{\partial {\varvec{A}}}{\partial x}\right)+\frac{\partial }{\partial y}\frac{1}{\mu }\left(\frac{\partial {\varvec{A}}}{\partial y}\right)=-\sigma \frac{{V}_{s}}{l}+\sigma \frac{\partial {\varvec{A}}}{\partial t}.$$
(3)

The circuit equation of the magnetic coil is expressed as:

$${V}_{s}\left(t\right)={R}_{s}{i}_{s}\left(t\right)+{L}_{ee}\frac{d{i}_{s}(t)}{dt}+emf(t),$$
(4)

where \({R}_{s}\) is the stator resistance, \({i}_{s}\left(t\right)\) is the stator current, \({L}_{ee}\) is the additional inductance in the external circuit, and \(emf(t)\) is the back electromotive force. The vector magnetic potential, stator current, and electromotive force can be solved using the time-step FEM (TSFEM) from Eqs. (3) and (4).

IPMSMs with different degrees of SE are modeled using the TSFEM, with the SE degree defined by the following equation:

$${\delta }_{s}=\frac{\left|{O}_{s}{O}_{r}\right|}{{g}_{0}},$$
(5)

where \({O}_{s}\) is the center of the stator, \({O}_{r}\) is the center of rotation, and \({g}_{0}\) is the normal air gap.

In addition, IPMSMs with different degrees of demagnetization, namely 25% and 50% severity on one pair of magnet poles, are modeled by defining three magnet materials with three B–H curves in different proportions. The IPMSM geometric models showing the flux density distribution were constructed using the FE software Flux and are shown in Fig. 1. The specifications of the proposed three-phase IPMSM are listed in Table 1. The geometrical complexities of all motor parts are considered in detail in the modeling, including the spatial distribution of the stator windings, the nonuniform air gap, the physical conditions of the stator conductors, rotor, shaft, and air gap, and the nonlinearity of the core materials.

Fig. 1. Interior permanent magnet synchronous motor (IPMSM) finite element (FE) model: (a) 50% demagnetization in one pair of poles and (b) 20% static eccentricity

Table 1 Specifications of proposed three-phase interior permanent magnet synchronous motor

CNN

Classical CNNs consist of alternating convolutional and pooling layers. Convolutional layers are the core of feature extraction: at every local region of the input, they take the inner product of a linear filter and the underlying receptive field and apply a nonlinear activation function. This operation can be expressed as [25]:

$${{\varvec{y}}}_{i}^{l}=f\left(\sum_{j=0}^{n-1}{{\varvec{W}}}_{ij}\otimes {{\varvec{x}}}_{j}^{l-1}+{{\varvec{b}}}_{i}\right),$$
(6)

where \({{\varvec{y}}}_{i}^{l}\) is the \(i\)th output of the \(l\)th convolution layer, \(f(\cdot )\) is an activation function such as the rectified linear unit, \({{\varvec{W}}}_{ij}\) are the trainable filters, \({{\varvec{x}}}_{j}^{l-1}\) are the feature maps of the previous layer (or the input data), \({{\varvec{b}}}_{i}\) are the biases, and \(\otimes \) denotes the discrete convolution operator. The model complexity is reduced by sharing the weights that produce each feature map.

Inspired by the visual neurons of animals, pooling layers output the maximum (or average) value of the receptive field at every local part of the feature maps. Several pooling variants exist, such as the global average pooling proposed by Lin et al. [26] and spatial pyramid pooling, which combines max pooling at multiple scales for object detection [27]. Pooling layers are an effective way to reduce the dimensions of the feature maps and thereby the computational cost [28]. Generally, fully connected layers are placed after the last convolutional block as linear maps for high-level reasoning, followed by a softmax activation function.
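As a minimal, hypothetical illustration (not the model used in this paper), the convolution of Eq. (6) and a pooling layer can be realized with a few tf.keras layers:

```python
# Minimal sketch (not the paper's model): a convolution block implementing
# Eq. (6) followed by max pooling, written with tf.keras layers.
import tensorflow as tf
from tensorflow.keras import layers

def conv_pool_block(x, n_filters, kernel_size=3):
    # y = f(W ⊗ x + b): trainable filters W, biases b, ReLU activation f
    x = layers.Conv2D(n_filters, kernel_size, padding="same", activation="relu")(x)
    # max pooling outputs the maximum of each local receptive field
    return layers.MaxPooling2D(pool_size=2)(x)

inputs = tf.keras.Input(shape=(64, 64, 1))        # e.g., a small gray image
features = conv_pool_block(inputs, n_filters=16)
model = tf.keras.Model(inputs, features)
model.summary()
```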

Various types of deep neural network structures have been proposed to improve the performance of network models, such as Inception [29]. The residual network, for example, employs an identity shortcut to avoid the notorious problem of vanishing or exploding gradients [30, 31]. With the improved performance of graphics processing units, deep networks have been applied in various fields such as computer vision and medical image processing.

FEM models and co-simulation analysis

Co-simulation of FE IPMSM models

In general, an IPMSM is driven with closed-loop algorithms, such as vector control, under actual working conditions. However, closed-loop control algorithms weaken the fault features in the stator currents, which limits the application of some diagnostic methods. Therefore, fault features obtained under open-loop conditions are unreliable for diagnosing faults in IPMSMs operating under real working (closed-loop) conditions.

To ensure the reliability of the proposed diagnosis method, the FE IPMSM models were embedded in a vector control system with the use of co-simulations based on the FE software Flux developed by ALTAIR and MATLAB’s Simulink. A block of the control algorithm is shown in Fig. 2. In this co-simulation, the influences of the inverter and space vector pulse width modulation algorithm were considered in the simulation analysis. The FE IPMSM models were embedded in the vector control system for the real-time simulation analysis; this incorporates the accuracy of the FE model and the performance of the vector control algorithm, to ensure that the simulation matches closely the actual conditions. The co-simulation results of the normal IPMSM at a speed of 3000 revolutions per minute (rpm) and load of 10 N·m are shown in Fig. 3.

Fig. 2. Block diagram of the vector control algorithm

Fig. 3. Co-simulation results of the normal IPMSM based on the use of vector control: (a) stator's three-phase current, (b) motor speed, and (c) electromagnetic torque

Analysis of co-simulation results

The stator currents over various periods and at different operating speeds were compared in the time domain to analyze the effect of faults on the stator currents. In this analysis, each type of fault model was simulated under the same initial conditions, including the rotor position, load, and control algorithm; only the fault zones differed between the models. The electromagnetic torques of the two types of IPMSM fault models are compared in Fig. 4. As shown in Fig. 4a, the electromagnetic torques of the demagnetization models, plotted with the yellow line for 25% demagnetization (De25) and the red line for 50% demagnetization (De50), are lower than that of the healthy model (blue line). The higher the demagnetization severity, the lower the electromagnetic torque amplitude. The results confirm that demagnetization decreases the output torque of the IPMSM. The electromagnetic torque of the eccentricity fault motor was evaluated in the same comparative manner. As shown in Fig. 4b, the three electromagnetic torque curves almost overlap, which indicates that pure SE has little effect on the stator currents of the IPMSM. Therefore, it is difficult to diagnose pure SE using traditional fault diagnostic methods, such as time–frequency analysis.

Fig. 4. Electromagnetic torque comparisons of two types of IPMSM fault models: (a) between demagnetization and healthy models, and (b) between eccentricity and healthy models

Proposed diagnosis method and research for current data transformation to images

Proposed diagnosis method

CNNs excel at feature extraction in image recognition. However, they do not adapt well to industrial applications owing to the differences between 2-D images and industrial signals. To extract fault features from stator currents, the raw current data are therefore transformed into images to suit the working mechanism of the CNN. In this paper, a method based on a deep CNN and image recognition is proposed to diagnose multiple faults in an IPMSM, with the deep CNN automatically extracting the fault features. A block diagram of the proposed method is shown in Fig. 5. The proposed method includes three stages: data transformation, feature extraction, and fault classification. In the first stage, the three-phase currents are transformed into gray images under different operating conditions. In the second stage, the transformed images are fed to a deep CNN to extract the underlying fault features. In the third stage, the various fault types with different severity degrees are separated by a classifier.

Fig. 5. Block diagram of the proposed gray image transformation method

Image transformation method based on autocorrelation matrix

Current image transformation methods can be classified into three categories, based on arraying [17, 32], coloring [33], and coordinate transformation [34]. In the array-based method, the data are segmented into vectors of equal size, which are first arrayed in a matrix and then mapped to an image. However, data cannot be processed in a timely manner with this method, because it requires a large amount of raw data to be transformed into each image. The coloring-based method is a one-to-many mapping of raw data to a chosen color field. This transformation lacks rigorous mapping logic because the color field is chosen arbitrarily. The coordinate-transformation-based method maps one type of feature to another coordinate system so that the feature distribution is more easily distinguishable with the naked eye; however, from the machine's perspective it suffers from the same lack of rigorous mapping logic. Therefore, a method based on an autocorrelation matrix is proposed to overcome these shortcomings of current image transformation methods. A block diagram of the three stages of the proposed method is shown in Fig. 6. In the first stage, the stator currents are recursively segmented into vectors of equal size and saved in an array so that the current data can be processed more efficiently; the recursive segmentation of the raw current data is shown in Fig. 7. In the second stage, the autocorrelation matrix, mean μ, and variance σ² of the raw current sample are computed. In the third stage, the autocorrelation matrix is mapped to a gray image, rather than a three-channel image, by mapping the matrix elements to gray values through a Gaussian distribution with mean μ and variance σ², which makes the image transformation more rigorous in its mapping logic. Taking a normal IPMSM as an example, gray image samples transformed from currents under various operating conditions, including stationary and nonstationary conditions, are shown in Fig. 8.
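A minimal sketch of these three stages is given below; the segment length of 500 samples follows Eq. (20), while the overlap stride and the scaling of the matrix to 8-bit gray levels are assumptions made here for illustration:

```python
# Sketch of the proposed transformation (stride and 0-255 scaling are assumptions).
import numpy as np

def segment_recursively(current, seg_len=500, stride=100):
    """Stage 1: recursive (overlapping) segmentation of the raw current record."""
    return [current[i:i + seg_len]
            for i in range(0, len(current) - seg_len + 1, stride)]

def current_to_gray_image(segment):
    """Stages 2-3: autocorrelation matrix of the Gaussian-mapped segment -> gray image."""
    x = np.asarray(segment, dtype=float)
    mu, sigma = x.mean(), x.std()                   # sample mean and standard deviation
    # Gaussian mapping of each current sample (cf. Eq. (8))
    alpha = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    A = np.outer(alpha, alpha)                      # autocorrelation matrix α^T α
    # scale the matrix to 8-bit gray levels (assumed normalization)
    return (255 * (A - A.min()) / (A.max() - A.min())).astype(np.uint8)

# usage with a placeholder phase current (simulation or measurement data in practice)
i_a = np.sin(2 * np.pi * 200 * np.arange(0, 0.1, 1e-5))
images = [current_to_gray_image(seg) for seg in segment_recursively(i_a)]
```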

Fig. 6. Block diagram of the proposed method based on the autocorrelation matrix of the signal

Fig. 7. Recursive raw data segmentation process

Fig. 8. Gray images transformed by the proposed method with currents of a normal IPMSM operating at different speeds: (a) speed increases from 1000 to 2000 revolutions per minute (rpm), (b) speed increases from 2000 to 3000 rpm, (c) variable speed with a final value of 3000 rpm (increased first and then decreased), and (d) speed set at 3000 rpm

Theoretical analysis

To demonstrate that the image transformation method based on the autocorrelation matrix is feasible, the convergence of a CNN model trained with images transformed by the proposed method was analyzed theoretically. Any CNN model can be considered as a matrix function \({f}_{CNN}\left({\varvec{A}}\right)\), which can be expressed as:

$${f}_{CNN}\left({\varvec{A}}\right)={a}_{0}{\varvec{I}}+{a}_{1}{\varvec{A}}+{a}_{2}{{\varvec{A}}}^{2}+\dots +{a}_{k}{{\varvec{A}}}^{k}+\dots ,$$
(7)

where \({\varvec{I}}\) is the identity matrix, \({\varvec{A}}\) is the image variable input to the CNN model, and \({a}_{i}\) are regarded as the kernels (or filters) of the CNN.

To discuss the superiority of the autocorrelation matrix in terms of neural network computation, the definition of convergence in the matrix function must first be introduced as follows:

The sum \({{\varvec{S}}}_{N}\left({\varvec{A}}\right)=\sum_{k=0}^{N}{a}_{k}{{\varvec{A}}}^{k}\) of the first N + 1 terms of a matrix power series \(\sum_{k=0}^{\infty }{a}_{k}{{\varvec{A}}}^{k}\) is defined as the partial sum of the series. The matrix power series \(\sum_{k=0}^{\infty }{a}_{k}{{\varvec{A}}}^{k}\) converges if the partial-sum sequence \(\left\{{{\varvec{S}}}_{N}\left({\varvec{A}}\right)\right\}\) converges; otherwise, it diverges [35].

In the proposed image transformation method based on the autocorrelation matrix, A is the autocorrelation matrix for the operating current signal, which can be denoted by \({{\boldsymbol\alpha }}^{{{T}}}{\boldsymbol\alpha }\) and is expressed as:

$${\varvec{A}}={\boldsymbol{\alpha }}^{T}\boldsymbol{\alpha }=\left[\begin{array}{c}{\alpha }_{1}\\ {\alpha }_{2}\\ \vdots \\ {\alpha }_{n}\end{array}\right]\left[{\alpha }_{1}, {\alpha }_{2}, \dots , {\alpha }_{n}\right]=\left[\begin{array}{ccc}{\alpha }_{1}^{2}& \cdots & {\alpha }_{1}{\alpha }_{n}\\ \vdots & \ddots & \vdots \\ {\alpha }_{1}{\alpha }_{n}& \cdots & {\alpha }_{n}^{2}\end{array}\right],$$
(8)

where \(\boldsymbol{\alpha }=\left\{{\alpha }_{i}\right\}, i=\mathrm{1,2},\dots , n\), \({\alpha }_{i}=\frac{1}{\sqrt{2\pi }\sigma }{e}^{-\frac{{(x-\mu )}^{2}}{2{\sigma }^{2}}}\), and \(x\) is the operating current value.

Subsequently,

$${\varvec{A}}{\boldsymbol{\alpha }}^{T}={\boldsymbol{\alpha }}^{T}\boldsymbol{\alpha }\cdot {\boldsymbol{\alpha }}^{T}=\left(\sum_{i=1}^{n}{\alpha }_{i}^{2}\right){\boldsymbol{\alpha }}^{T}.$$
(9)

Therefore, \({\boldsymbol{\alpha }}^{{\varvec{T}}}\) is an associated eigenvector of A for the eigenvalue \(\sum_{i=1}^{n}{\alpha }_{i}^{2}\).

Additionally,

$$0{\varvec{I}}-{\varvec{A}}=-{\boldsymbol{\alpha }}^{{\varvec{T}}}\boldsymbol{\alpha }=-\left[\begin{array}{ccc}{\alpha }_{1}^{2}& \cdots & {\alpha }_{1}{\alpha }_{n}\\ \vdots & \ddots & \vdots \\ {\alpha }_{1}{\alpha }_{n}& \cdots & {\alpha }_{n}^{2}\end{array}\right],$$
(10)

where \({\varvec{I}}\) is the identity matrix. In the matrix \(-{\boldsymbol{\alpha }}^{{\varvec{T}}}\boldsymbol{\alpha }\), \({{\varvec{r}}{\varvec{o}}{\varvec{w}}}_{i}=\frac{{\alpha }_{i}}{{\alpha }_{i+1}}{{\varvec{r}}{\varvec{o}}{\varvec{w}}}_{i+1}, i=1, 2, 3,\dots , n-1\), where \({{\varvec{r}}{\varvec{o}}{\varvec{w}}}_{i}\) is the \(i\)th row of the matrix \(-{\boldsymbol{\alpha }}^{{\varvec{T}}}\boldsymbol{\alpha }\).

Therefore, \(rank\left(0{\varvec{I}}-{\varvec{A}}\right)=rank\left(-{\boldsymbol{\alpha }}^{{\varvec{T}}}\boldsymbol{\alpha }\right)=1\), and 0 is an eigenvalue of A with multiplicity n − 1. Thus, the Jordan form of \({\varvec{A}}\) is expressed as:

$${{\varvec{J}}}_{{\varvec{A}}}=\left[\begin{array}{cc}{{\varvec{J}}}_{0}({\lambda }_{0})& \\ & {{\varvec{J}}}_{1}({\lambda }_{1})\end{array}\right],$$
(11)

where

$${{\varvec{J}}}_{0}\left({\lambda }_{0}\right)=\left[\sum_{i=1}^{n}{\alpha }_{i}^{2}\right]$$
(12)

and

$${{\varvec{J}}}_{1}\left({\lambda }_{1}\right)=\left[\begin{array}{ccc}\begin{array}{cc}0& 1\end{array}& & \\ & \begin{array}{cc}\ddots & \ddots \end{array}& \\ & & \begin{array}{c}1\\ 0\end{array}\end{array}\right].$$
(13)

In addition, \({\varvec{A}}\) is a symmetric matrix. Thus, there exists an invertible matrix \({\varvec{P}}=\left[{{\varvec{p}}}_{1},{{\varvec{p}}}_{2},\dots , {{\varvec{p}}}_{n}\right]\), where \({{\varvec{p}}}_{i}, i=1, 2,\dots , n\) are the eigenvectors of A, such that \({\varvec{A}}={\varvec{P}}{{\varvec{J}}}_{{\varvec{A}}}{{\varvec{P}}}^{-1}\).

Substituting \({{\varvec{J}}}_{0}\left({\lambda }_{0}\right)\) and \({{\varvec{J}}}_{1}\left({\lambda }_{1}\right)\), the partial sum \({{\varvec{S}}}_{N}\left({\varvec{A}}\right)\) of the first N + 1 terms of the matrix power series \(\sum_{k=0}^{\infty }{a}_{k}{{\varvec{A}}}^{k}\) can be expressed as:

$$\begin{aligned}{{\varvec{S}}}_{N}\left({\varvec{A}}\right)&=\sum_{k=0}^{N}{a}_{k}{{\varvec{A}}}^{k}=\sum_{k=0}^{N}{a}_{k}{\left({\varvec{P}}{{\varvec{J}}}_{{\varvec{A}}}{{\varvec{P}}}^{-1}\right)}^{k}\\&={\varvec{P}}\left[\begin{array}{cc}{{\varvec{S}}}_{N}\left({{\varvec{J}}}_{0}\right)& \\ & {{\varvec{S}}}_{N}\left({{\varvec{J}}}_{1}\right)\end{array}\right]{{\varvec{P}}}^{-1}.\end{aligned}$$
(14)

Therefore, \({{\varvec{S}}}_{N}\left({\varvec{A}}\right)\) converges if and only if the matrix sequences \(\left\{{{\varvec{S}}}_{N}({{\varvec{J}}}_{{\varvec{i}}})\right\}, i=0, 1\) converge. For the autocorrelation matrix \({\varvec{A}}\),

$${{\varvec{S}}}_{N}\left({{\varvec{J}}}_{0}\right)=\left[\sum_{k=0}^{N}{a}_{k}{\left(\sum_{i=1}^{n}{\alpha }_{i}^{2}\right)}^{k}\right]$$
(15)

and, because \({{\varvec{J}}}_{1}\) is a nilpotent Jordan block,

$${{\varvec{J}}}_{1}^{k}={\left[\begin{array}{ccc}\begin{array}{cc}0& 1\end{array}& & \\ & \begin{array}{cc}\ddots & \ddots \end{array}& \\ & & \begin{array}{c}1\\ 0\end{array}\end{array}\right]}^{k}=\boldsymbol{0}\quad \mathrm{for}\; k\ge n-1,$$
(16)

so \({{\varvec{S}}}_{N}\left({{\varvec{J}}}_{1}\right)\) is a finite sum and converges.

Therefore, if \({\left(\sum_{i=1}^{n}{\alpha }_{i}^{2}\right)}^{k}\) converges as \(k\to \infty \), then \({{\varvec{S}}}_{N}\left({\varvec{A}}\right)\) and \({f}_{CNN}\left({\varvec{A}}\right)\) converge.

In addition,

$$0<{\alpha }_{i}=\frac{1}{\sqrt{2\pi }\sigma }{e}^{-\frac{{(x-\mu )}^{2}}{2{\sigma }^{2}}}<1, \quad \mathrm{and}\quad {\int }_{-\infty }^{+\infty }\frac{1}{\sqrt{2\pi }\sigma }{e}^{-\frac{{(x-\mu )}^{2}}{2{\sigma }^{2}}}dx={\int }_{-\infty }^{+\infty }{\alpha }_{i}dx=1.$$
(17)

Then,

$$\underset{k\to \infty }{\mathrm{lim}}{\left(\sum_{i=1}^{n}{\alpha }_{i}^{2}\right)}^{k}=0.$$
(18)

Thus, \({\left(\sum_{i=1}^{n}{\alpha }_{i}^{2}\right)}^{k}\) converges as \(k\to \infty \); consequently, \({{\varvec{S}}}_{N}\left({\varvec{A}}\right)\) and \({f}_{CNN}\left({\varvec{A}}\right)\) also converge.

Therefore, the CNN acting on the autocorrelation matrix converges theoretically, regardless of the size of the model. From a statistical perspective, \({f}_{CNN}\left({\varvec{A}}\right)={\varvec{P}}\left(\sum_{k=0}^{N}{a}_{k}{\left(\sum_{i=1}^{n}{\alpha }_{i}^{2}\right)}^{k}\right){{\varvec{P}}}^{-1}\) is the weighted sum of the powers of the maximum eigenvalue \(\sum_{i=1}^{n}{\alpha }_{i}^{2}\) of the raw sample \(\left\{{\alpha }_{i}\right\}\), expressed within the eigenspace spanned by the eigenvectors of A that form P.
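The rank-one structure used in this argument is easy to check numerically. The following sketch (with synthetic data, purely for illustration) verifies Eq. (9), i.e., that \({\boldsymbol{\alpha }}^{T}\) is an eigenvector of \({\varvec{A}}\) with eigenvalue \(\sum_{i=1}^{n}{\alpha }_{i}^{2}\), and that \({\varvec{A}}\) has rank one:

```python
# Illustrative numerical check of Eq. (9) and of rank(A) = 1 (synthetic data,
# not the paper's measurements).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)                       # stand-in current segment
mu, sigma = x.mean(), x.std()
alpha = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
A = np.outer(alpha, alpha)                     # autocorrelation matrix α^T α

lam = (alpha ** 2).sum()                       # Σ α_i²
print(np.allclose(A @ alpha, lam * alpha))     # True: α^T is an eigenvector (Eq. (9))
print(np.linalg.matrix_rank(A))                # 1: 0 is an (n−1)-fold eigenvalue
```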

Dataset establishment

All the three-phase currents were transformed into gray images using the proposed method. The label of each gray image depended only on the motor model and was independent of the motor operating conditions. To generate more gray image samples, a portion of the gray images was randomly selected and rotated by 90°. A total of 50,000 gray images were collected into a dataset comprising five classes of images transformed from five types of IPMSMs. Partial details of the dataset, in relation to typical motor states, are outlined in Table 2.

Table 2 Details of the dataset
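A possible sketch of the 90° rotation augmentation described above is shown below; the fraction of images selected for rotation is an assumption, as the paper does not specify it:

```python
# Append 90°-rotated copies of a randomly selected subset of gray images
# (selection ratio is an assumed value).
import numpy as np

def augment_with_rotation(images, ratio=0.2, seed=0):
    rng = np.random.default_rng(seed)
    rotated = [np.rot90(img) for img in images if rng.random() < ratio]
    return list(images) + rotated
```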

Comparative analysis of the effectiveness of the proposed image transformation method with CNN

To validate the superiority of the proposed image transformation method with CNN, the raw current data were also transformed into a matrix and a three-channel image, using image transformation methods based on arraying and coloring. Each type of image was established similarly as a dataset for CNN training. The accuracy and loss of the training and testing processes are shown in Figs. 9 and 10, respectively.

Fig. 9. Accuracy of CNN-4 trained with the (a) raw matrix, (b) raw coloring, (c) reshaped and normalized coloring, and (d) gray image datasets

Fig. 10. Loss of CNN-4 trained with the (a) raw matrix, (b) raw coloring, (c) reshaped and normalized coloring, and (d) gray image datasets

Image transformation method based on arraying

In the image transformation method based on arraying, the raw current operating under various conditions was segmented into lists; this is expressed as

$${L}_{i}=\left\{{x}_{0},{x}_{1}\dots {x}_{749}\right\}, i=\mathrm{0,1},2,\dots 49,999.$$
(19)

Each list was further reshaped into a matrix defined as \({M}_{30\times 25}\), in which the first row contains \(\left\{{x}_{0},{x}_{1},\dots ,{x}_{24}\right\}\), the second row contains \(\left\{{x}_{25},{x}_{26},\dots ,{x}_{49}\right\}\), and so on. The matrix was considered as the "image," which was then fed into the CNN to extract features. All the matrix samples were assembled into a four-dimensional tensor with the shape (50,000, 30, 25, 1), corresponding to 50,000 images of size \(30\times 25\times 1\). Finally, the four-dimensional (4-D) tensor was validated with a simple CNN model comprising four convolutional layers and three fully connected layers, termed CNN-4. Furthermore, 80% of the randomly and evenly selected matrices were used to train CNN-4, and the remaining 20% were used as testing data to validate the trained model. The accuracy and loss of the training and testing of CNN-4 with the raw matrix dataset over 25 epochs are shown in Figs. 9a and 10a, respectively. As can be seen, the training accuracy increases with the training epochs, whereas the testing accuracy remains low; the testing loss also increases with the training epochs. These results indicate that it is difficult to extract effective features from the raw current data under closed-loop control and nonstationary conditions. These findings thus support the conclusion that a CNN is not suitable for processing high-frequency signals directly.
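For reference, a sketch of this arraying baseline is shown below; the placeholder data and sample count are assumptions used only to show the reshaping:

```python
# Array-based baseline: each 750-sample current list is reshaped to a 30×25
# "image", and all samples are stacked into an (N, 30, 25, 1) tensor.
import numpy as np

def current_to_matrix_image(segment):
    seg = np.asarray(segment, dtype=float)[:750]
    return seg.reshape(30, 25)                  # row-wise filling, as in the text

segments = [np.random.randn(750) for _ in range(8)]        # placeholder data
tensor = np.stack([current_to_matrix_image(s) for s in segments])[..., np.newaxis]
print(tensor.shape)                             # (8, 30, 25, 1)
```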

Image transformation method based on coloring

The coloring-based image transformation produces an image by filling a grid with colors that correspond to feature values. The synchrosqueezed wavelet transform (SWT), which provides a sharper time–frequency representation than the continuous wavelet transform, can be regarded as a coordinate transformation based on wavelet basis functions. In this analysis, the raw current data were segmented into samples of length 750 and processed via the SWT to obtain the wavelet transform coefficients (a time–frequency feature). The absolute values of the wavelet coefficients, represented by different colors, were then filled into the grid, with the horizontal and vertical axes representing time and frequency, respectively. An example image transformed using the coloring-based method is presented in Fig. 11.
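As a rough sketch of this coloring-style transformation, a continuous wavelet transform from PyWavelets is used below as a stand-in for the SWT; the sampling rate and the signal are placeholders, not the paper's data:

```python
# Coloring-style image from a time-frequency map (CWT used as a stand-in for SWT).
import numpy as np
import pywt
import matplotlib.pyplot as plt

fs = 10_000                                     # assumed sampling rate (Hz)
t = np.arange(750) / fs                         # one 750-sample current segment
segment = np.sin(2 * np.pi * 200 * t)           # placeholder current data

scales = np.arange(1, 64)
coeffs, freqs = pywt.cwt(segment, scales, "morl", sampling_period=1 / fs)

plt.pcolormesh(t, freqs, np.abs(coeffs), shading="auto")   # color = |coefficient|
plt.xlabel("time (s)"); plt.ylabel("frequency (Hz)")
plt.savefig("coloring_image.png")               # the image fed to the CNN
```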

Fig. 11. Example image transformed via the image transformation method based on coloring

Furthermore, the axes and the color bar were removed to assess the capability of the coloring images as feature representations. Similarly, all the coloring images transformed from currents under various operating conditions were assembled into a dataset, 80% of which was used as the training subset and 20% as the testing subset. The accuracy and loss of the training and testing of CNN-4 with the coloring image dataset over 25 epochs are shown in Figs. 9b and 10b, respectively.

The model trained with the dataset built using the coloring-based image transformation exhibits extremely slow convergence, or even diverges, when the coloring images are fed into CNN-4 without any preprocessing. The slow convergence of the model may stem from one of the following two aspects:

  1. the feature representation of the coloring image was invalid, or

  2. the scale of the model was too small to converge easily with the coloring images.

Measures were adopted to accelerate the convergence of the CNN model and thereby verify that the time–frequency feature was recognizable. A common preprocessing approach was used to accelerate the convergence of the network: the coloring images were reshaped to \(500\times 500\) pixels and normalized by dividing by 255 before being fed to the CNN model for training.

The accuracy and loss results for the training and testing of the model trained using the reshaped and normalized coloring dataset are shown in Figs. 9c and 10c, respectively. As can be seen, the training and testing accuracies increased rapidly, indicating that the time–frequency feature extracted using the SWT can be recognized by the CNN model. However, the CNN-4 model trained with coloring images could not converge without image preprocessing. In addition, the coloring images contained redundant regions because each feature value was represented by a grid cell comprising multiple pixels. Finally, the three-channel images, even without further processing, increased the computational load on the network.

Proposed image transformation method

The gray image dataset built using the proposed image transformation method was also fed to CNN-4 to train the model. Similar to the previous cases, 80% of the randomly and evenly selected samples from the entire dataset were considered as the training dataset, while the remaining 20% were used as the testing dataset to validate the trained model. The accuracy and loss results for the training and testing of the model trained with the gray image dataset built using the proposed method are shown in Figs. 9d and 10d, respectively. The training and testing accuracies increase, whereas the losses decrease with increasing epochs.

Analysis of results

The comparison results for the models trained with the three datasets are presented in Table 3. Evidently, it is difficult to extract effective features for recognizing the fault type from the raw current data. The model trained with raw coloring images diverges without any preprocessing, and the model trained with reshaped and normalized coloring images shows low, saturated accuracy. Conversely, the model trained with the gray image dataset converges easily, with high testing accuracy and a low computational cost.

Table 3 Test results of three datasets for CNN-4

The results indicate that the gray image transformed using the proposed method is an excellent feature representation for motor current signals. More importantly, the image transformed using the proposed method is a gray image that maintains rigorous logicality and reduces the computation required for model training. By comparing the three models trained with each image dataset, the superiority of the proposed image transformation method could be concluded based on the following:

  1. Fewer raw data are required, thus allowing for the timely processing of signals and a reduction in the computational cost of model training;

  2. Enhancement of the feature representation of PMSM current signals;

  3. Recursiveness and rigorous logic in the data process;

  4. Excellent convergence.

Model structure design and fault classification

Neural network technology has been successfully applied in various fields. However, there is no available rule for designing neural network structures. In theory, the deeper the network model, the better is the network performance. However, certain challenges arise as the depth of the network increases, such as overfitting, saturation in performance [31], and computational bottlenecks [30].

Several approaches have been used for designing network models. Dropout layers [26] are added to avoid overfitting by randomly dropping units in the fully connected layers during training. Furthermore, small kernels, such as a "1 × 1" convolutional kernel, are used to build sparse architectures, which reduces computational bottlenecks [30]. Residual networks are also a popular module for building deep networks, as they facilitate convergence [31]. In practice, a straightforward architecture is deemed optimal, particularly for small-scale network models.

Resnet-9

Based on the aforementioned experience, the proposed Resnet-9 model was designed to extract fault features; it consists of three residual modules and three fully connected layers. Each residual module includes two consecutive convolutional layers and an identity shortcut. Figure 12 shows the detailed structure of Resnet-9; no normalization layers are included, so as to validate the convergence advantage of the proposed gray image transformation method.
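A minimal sketch of such a model, written with tf.keras, is shown below; the filter counts, kernel sizes, pooling between modules, and the 1×1 projection used when channel counts differ are assumptions, not the exact configuration of Fig. 12:

```python
# Sketch of a Resnet-9-style model (three residual modules of two convolutions
# each, three fully connected layers, no normalization layers). Hyperparameters
# are assumed values for illustration.
import tensorflow as tf
from tensorflow.keras import layers

def residual_module(x, filters):
    shortcut = x
    if shortcut.shape[-1] != filters:           # projection when channel counts differ (assumption)
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])             # identity shortcut
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(500, 500, 1))    # gray image from the proposed method
x = inputs
for filters in (16, 32, 64):                    # three residual modules
    x = residual_module(x, filters)
    x = layers.MaxPooling2D(2)(x)               # down-sampling between modules (assumption)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(5, activation="softmax")(x)   # five motor classes
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```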

Fig. 12. Architecture of the Resnet-9 model

The network model was built on a TensorFlow workstation with an NVIDIA Tesla T4 graphics processing unit, based on the Aliyun cloud computing platform. From the entire sample set, 80% of the samples were selected randomly and uniformly as the training dataset; the remaining 20% were used as the testing dataset to validate the trained model. The accuracy and loss results for the training and testing of the Resnet-9 model after 60 epochs of training are shown in Figs. 13c and 14c, respectively.

Fig. 13. Training and testing accuracy for the Resnet-9 model trained using the (a) raw matrix, (b) reshaped coloring image, and (c) gray image datasets

Fig. 14. Training and testing loss for the Resnet-9 model trained using the (a) raw matrix, (b) reshaped coloring image, and (c) gray image datasets

To verify the superiority of the proposed image transformation method, other datasets were tested using the Resnet-9 model. The accuracy and loss results for the training and testing of the raw matrix and reshaped coloring images are shown in Figs. 13a, b and 14a, b, respectively.

The comparison results for each image transformation method are presented in Table 4. It can be observed that the model trained using the raw matrix diverges and the model trained using the reshaped coloring dataset remains saturated. The training and testing accuracies of the Resnet-9 model trained with the gray image dataset are 0.9939 and 0.9862, respectively; the training and testing losses are 0.0166 and 0.042, respectively. Thus, the proposed method shows excellent precision and involves a low computational cost.

Table 4 Testing results of RESNET-9 for three datasets

To improve the reliability of the results, the dataset was randomly and uniformly divided into five subsets to validate Resnet-9 by fivefold (leave-one-subset-out) cross-validation, in which each subset is used in turn as the testing dataset and the remaining subsets form the training dataset. The model in each fold was trained for 60 epochs. The average training and testing accuracies are 0.9954 and 0.9869, respectively, and the average training and testing losses are 0.0155 and 0.0396, respectively. Table 5 shows the results of the five folds of cross-validation for Resnet-9, which confirm that the reported statistics are reliable and not due to randomness.

Table 5 Results of cross validation
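A sketch of this cross-validation procedure is shown below; the model constructor build_resnet9() and the in-memory image and label arrays are hypothetical names, assumed to be provided elsewhere:

```python
# Five-fold cross-validation sketch: each subset serves once as the test set.
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(images, labels, build_model, epochs=60, n_splits=5):
    accs = []
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in kfold.split(images):
        model = build_model()                                  # e.g. build_resnet9()
        model.fit(images[train_idx], labels[train_idx], epochs=epochs, verbose=0)
        _, acc = model.evaluate(images[test_idx], labels[test_idx], verbose=0)
        accs.append(acc)
    return float(np.mean(accs)), accs
```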

Pyramidal Resnet-9 and Resnet-15

Dongyoon et al. suggested that deleting the units that perform down-sampling does not degrade performance significantly and that the number of channels should increase with the depth of the model [36]; the pyramidal residual network was accordingly proposed to increase the feature-map dimension gradually. To show that the advantages of the proposed image transformation method are not tied to one specific model, Pyramidal Resnet-9, a wider residual model, and Resnet-15, a residual network deeper than Resnet-9, were designed for motor fault classification. Both models lack normalization layers so as to test the convergence of the gray images transformed using the proposed method. Table 6 lists the parameters of Pyramidal Resnet-9 and Resnet-15.

Table 6 Parameters of pyramidal RESNET-9 and RESNET-15

In the same way as for Resnet-9, the two models were trained and tested on the TensorFlow workstation using the gray image dataset. Figure 15 shows the loss and accuracy during the training and testing processes. During training, the initial learning rate was set to \(1\times {10}^{-3}\) and decreased as the epochs increased; the differently colored areas in the result figures indicate different learning rates.

Fig. 15. Training and testing accuracy and loss for the Pyramidal Resnet-9 model trained with gray images

Figure 15 shows that Pyramidal Resnet-9 obtains a loss of 0.0074 and an accuracy of 0.9978 in training and a loss of 0.0376 and an accuracy of 0.9874 in testing; Fig. 16 shows that Resnet-15 obtains a loss of 0.0067 and an accuracy of 0.9974 in training after 80 epochs, and a loss of 0.0525 and an accuracy of 0.9874 in testing after 91 epochs. The detailed results for the three models are listed in Table 7.

Fig. 16. Training and testing accuracy and loss for the Resnet-15 model trained with gray images

Table 7 Details of results for RESNET models

Table 7 shows that all three residual models trained with the gray image dataset achieve high testing accuracy, which confirms that the advantages of the proposed image transformation method are not tied to a specific model. Although the models lack normalization layers, they converge with a small loss on the gray image dataset built using the proposed autocorrelation-matrix-based method. This strongly verifies that the proposed image transformation eases the convergence of network models in practice.

Feature maps and heatmaps

Parts of the feature maps of the 1st, 3rd, and 9th convolution layers are shown in Fig. 17 to illustrate how Resnet-9 processes the gray images. The 9th-layer feature maps of the normal gray images contain fewer white textures than those of the fault images; this white texture may be a high-dimensional feature for distinguishing each type of fault.

Fig. 17. Feature maps of the 1st, 3rd, and 9th convolution layers

Gradient-weighted class activation mapping (Grad-CAM) [37] is a visualization method that locates the high-weight regions of an image that determine the class predicted by a network; it can therefore be used to observe which regions of the image play an essential role in recognition. Five images with different types of faults were analyzed, and the regions contributing to each image label are shown as heatmaps of the 9th convolutional layer of Resnet-9. The raw images and heatmaps were overlaid to identify the gray-image regions that carry high weight for fault classification. Figures 18, 19, 20, 21 and 22 show the analysis; red areas carry a higher classification weight than black areas. The heatmaps make clear that the white textures play a key role in recognizing the fault features, and that classifying healthy gray images requires fewer feature regions than classifying fault images.

Fig. 18. Heatmaps for the De25 gray image: (a) gray image for De25, (b) heatmap of the 9th convolutional layer, and (c) weight location

Fig. 19. Heatmaps for the De50 gray image: (a) gray image for De50, (b) heatmap of the 9th convolutional layer, and (c) weight location

Fig. 20. Heatmaps for the He gray image: (a) gray image for He, (b) heatmap of the 9th convolutional layer, and (c) weight location

Fig. 21. Heatmaps for the Se10 gray image: (a) gray image for Se10, (b) heatmap of the 9th convolutional layer, and (c) weight location

Fig. 22. Heatmaps for the Se20 gray image: (a) gray image for Se20, (b) heatmap of the 9th convolutional layer, and (c) weight location
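For reference, a compact sketch of the Grad-CAM computation used for these heatmaps is given below; the layer name and model handle are assumptions, and the procedure follows [37]:

```python
# Grad-CAM sketch for locating high-weight regions in a gray image.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index=None):
    """Return a normalized heatmap over `image` for the given convolutional layer."""
    grad_model = tf.keras.Model(model.inputs,
                                [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = tf.argmax(preds[0])
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)             # d(score)/d(feature map)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))          # global-average-pooled gradients
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)      # weighted feature-map sum
    cam = tf.nn.relu(cam) / (tf.reduce_max(cam) + 1e-8)      # keep positive influence, normalize
    return cam.numpy()

# usage (assumed names): heat = grad_cam(resnet9, gray_image, "conv9")
```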

Recalling the gray image transformation method proposed in "Proposed diagnosis method and research for current data transformation to images", the transformed images are, in essence, a map of the autocorrelation matrix of the raw current data.

A period of raw current data can be expressed as the following vector:

$$\varvec\alpha =\left\{{\alpha }_{i}\right\}, \, i=0,1, \ldots, 499.$$
(20)

Its autocorrelation matrix is then:

$${\varvec{A}}={\boldsymbol{\alpha }}^{T}\boldsymbol{\alpha }=\left[\begin{array}{c}{\alpha }_{0}\\ \vdots \\ {\alpha }_{499}\end{array}\right]\left[{\alpha }_{0}, \dots , {\alpha }_{499}\right]=\left[\begin{array}{cccc}{\alpha }_{0}^{2}& \cdots & {\alpha }_{0}{\alpha }_{498}& {\alpha }_{0}{\alpha }_{499}\\ \vdots & \ddots & \vdots & \vdots \\ {\alpha }_{0}{\alpha }_{498}& \cdots & {\alpha }_{498}^{2}& {\alpha }_{498}{\alpha }_{499}\\ {\alpha }_{0}{\alpha }_{499}& \cdots & {\alpha }_{498}{\alpha }_{499}& {\alpha }_{499}^{2}\end{array}\right].$$
(21)

Any part of \({\varvec{A}}\) can be considered as:

$${{\varvec{A}}}_{ij}=\left[\begin{array}{c}{\alpha }_{i}\\ {\alpha }_{i+1}\\ \vdots \\ {\alpha }_{i+n-1}\end{array}\right]\left[{\alpha }_{j}, {\alpha }_{j+1}, \dots , {\alpha }_{j+n-1}\right],$$
(22)

where \({\left[{\alpha }_{i}, {\alpha }_{i+1}, \dots , {\alpha }_{i+n-1}\right]}^{T}\) and \(\left[{\alpha }_{j}, {\alpha }_{j+1}, \dots , {\alpha }_{j+n-1}\right]\) are subvectors of \({\boldsymbol{\alpha }}^{{\varvec{T}}}\) and \(\boldsymbol{\alpha }\), respectively.

\({{\varvec{A}}}_{ij}\) is a cross-correlation submatrix, which captures the relationship between different time segments of the current data. A normal three-phase PMSM is symmetric in its spatial structure, in the spatial distribution of the air-gap magnetic field, and in the phase current vector. Rotor faults break this symmetry, which may alter the correlation between current signals from different time periods, so the deep learning model can extract fault features from the cross-correlation submatrices of signals from different time periods.

Analysis of experimental results

From the perspective of motor speed, the dataset included gray images transformed from stator currents at various speeds, including stable, increasing, and variable (initially increasing and then decreasing) speeds. The high testing accuracy is sufficient to validate that the proposed method is immune to speed variations. From the perspective of electromagnetic torque, the gray image samples were transformed from stator currents at different loads. In addition, the electromagnetic torque increases with the motor speed and then decreases once the motor speed reaches the reference value; these variations in the electromagnetic torque are analogous to varying loads. Similarly, the high testing accuracy validates that the proposed method is immune to load variations. The reason the proposed method remains applicable under various operating conditions is that each sample spans only a small amount of raw data relative to the mechanical rotation of the motor, so the motor state contained in each sample is approximately stationary.

Conclusion

This paper proposed an intelligent and visual method based on a deep CNN, image recognition, and MSCA for the detection of multitype faults in an IPMSM, namely eccentricity and demagnetization faults. IPMSMs were modeled using the FEM and embedded in a vector control system for in-loop simulation analysis. As traditional approaches cannot diagnose multitype faults using an MSCA-based method, this paper proposed a method to monitor multitype faults based on a deep CNN and image recognition, thus resolving this challenge. Given the difficulty of extracting features from stator currents, an image transformation method based on the autocorrelation matrix was proposed to enable fault feature extraction with the CNN by transforming stator currents into gray images. The image transformation method was tested with three models of different designs, Resnet-9, Resnet-15, and Pyramidal Resnet-9, which obtained testing accuracies of 98.62%, 98.74%, and 98.74%, respectively. In summary, the proposed image transformation method offers the following advantages over existing image transformation methods:

  1. Fewer data are required. This leads to a reduction in the computational cost of model training and makes the data processing for fault monitoring more efficient;

  2. The image transformation is more rigorous in its mapping logic than current methods;

  3. The autocorrelation matrix enhances the representation of data features, making fault feature extraction easier than extraction from raw current data;

  4. The image transformation leverages the advantages of CNNs and facilitates the convergence of the CNN model.

The results validate that the proposed method is immune to speed and load variations and is applicable under both stationary and nonstationary conditions. Moreover, the proposed method outperforms other methods based on vibration analysis and deep learning, as it relies only on time-domain analysis of the motor's stator current, which is not directly affected by factors outside the drive system. The proposed method is, therefore, capable of monitoring multitype faults in various industries, such as autonomous electric vehicles. The testing accuracy could be further improved through the use of a deeper CNN architecture, and the proposed diagnostic method could be applied to other types of PMSM faults.