1 Introduction

As an important component of wind turbines, rolling bearings directly affect operating stability through their health [1,2,3]. Wind turbines run continuously under variable speed and heavy load, so generator bearing failures such as pitting, wear, and gluing are inevitable. These failures seriously affect operating stability, and bearing replacement is slow and costly, so a failure leads to significant economic losses [4, 5]. Therefore, it is of great significance to study an effective fault diagnosis method for generator bearings to reduce operating cost and extend bearing service life [6,7,8].

Traditional rolling bearing fault diagnosis methods mainly include envelope spectrum analysis and stochastic resonance demodulation [9,10,11]. These methods are simple in principle, but they require considerable professional knowledge and a complex feature extraction process, and they suffer from low diagnostic accuracy and high research cost [12]. Zhao et al. [13] proposed a rolling bearing fault feature enhancement method based on an adaptive noise reduction algorithm and maximum correlation kurtosis deconvolution (MCKD). Deng et al. [14] proposed a composite fault diagnosis method based on MCKD and sparse representation and verified its superiority with simulation signals and laboratory data. With the large-scale development of wind turbines, system complexity increases, and traditional fault diagnosis methods can no longer meet diagnostic needs because of their limitations.

With the development of deep learning (DL), its powerful feature extraction capability has been extensively used in image processing, fault diagnosis, and other fields [15, 16]. Petrauskiene et al. [17] converted vibration signals into color recurrence plots and classified faults with convolutional neural networks (CNN). Jiao et al. [18] proposed a model combining transfer learning and a residual network to extract deep bearing features. Zhang et al. [19] proposed an information flow fusion semi-supervised intelligent fault diagnosis method based on a CNN structure to address the problem of limited labeled data in practical engineering. Zhang et al. [20] proposed a bearing anomaly detection method for wind turbines based on CNN and LSTM. Hou et al. [21] proposed an attention-based parallel fusion encoding network and used public datasets to verify the accuracy and robustness of the method. Liu et al. [22] proposed a fully dynamic model and sub-domain adaptive intelligent diagnosis framework for bearings under variable working conditions, which achieves bearing fault diagnosis across different working conditions. Adaiton et al. [23] proposed a bearing diagnosis method based on a variational autoencoder to address the poor interpretability of DL-based intelligent diagnosis algorithms. The features were projected onto a low-dimensional space, and two experimental cases were used to verify the interpretability and diagnostic accuracy of the method.

The above methods achieve bearing fault diagnosis in their respective fields, but they assume that the training set contains plentiful and evenly distributed fault data. In practical engineering, wind turbines usually operate in a healthy state, so massive health state data are easy to obtain. Fault data samples, however, are very limited, which severely restricts the practical application of the above models. Therefore, it is urgent to realize generator bearing fault data enhancement under the condition of insufficient fault samples [24, 25]. Li et al. [26] used a recursive neural network with a two-stage attention mechanism for data expansion. Wei et al. [27] balanced the data set by oversampling the sample features. Although these methods solve the problem of data imbalance to a certain extent, the quality of the generated samples is not high, and they deviate considerably from the original data. Data augmentation (DA) makes up for the lack of fault data by learning the distribution of the original samples to generate new samples, which are then used to train the diagnosis model and realize fault diagnosis under the condition of insufficient samples [28, 29].

Because of its powerful data generation capability, the generative adversarial network (GAN) has been widely used in speech recognition, fault diagnosis, and data enhancement. Hu et al. [30] proposed a wind turbine bearing fault diagnosis strategy based on a conditional variational GAN model and the fusion of multi-source signals. The original data features were learned through GAN, and sample labels were introduced to solve the problem of limited fault samples. Liu et al. [31] introduced the Haar wavelet into GAN, constructed a new loss function to improve the quality of generated data, and realized bearing fault diagnosis under the condition of limited fault data. Fan et al. [32] proposed a fault diagnosis method based on the combination of GAN. Tang et al. [33] used GAN to balance samples and CNN to monitor the status of rolling bearings. Although the above methods have achieved good diagnostic results, most of these studies are based on laboratory fault data, in which different fault samples differ significantly and there are few signal interference factors. In practical engineering, faults often degrade irregularly, the data set imbalance ratio is high, and the signals are affected by strong background noise and other equipment excitation sources, so the practical effect of the above methods is not ideal. In addition, the traditional GAN suffers from vanishing and exploding gradients, which reduces the quality of generated samples. The nonlinear characteristics of the one-dimensional vibration signal also limit the effect of the GAN model.

To overcome these shortcomings of GAN, this paper proposes a data enhancement method based on multiple fully convolutional generative adversarial networks (MCGAN). The time-frequency (TF) graph obtained from the bearing vibration signal of a wind turbine generator through the short-time Fourier transform (STFT) is taken as the input of the model, and the inherent time and frequency distribution characteristics of the TF graph are learned. The model then generates expanded samples that carry bearing fault characteristics, follow a distribution similar to that of the real vibration signal, and are diverse, which solves the problem of data imbalance. Compared with existing research, the proposed method effectively addresses the vanishing and exploding gradients of GAN, generates high-quality samples to balance the data set, alleviates the imbalance of generator bearing data in practical wind turbine engineering, and improves fault diagnosis accuracy. Sect. 2 introduces the basic structure of GAN. Sect. 3 introduces the basic principle, loss function, and diagnosis process of MCGAN. In Sect. 4, the proposed method is verified experimentally on bearing data collected from an actual wind turbine generator and is compared with other feature generation methods to verify its superiority. Sect. 5 summarizes the proposed method and draws conclusions.

2 Theoretical background

2.1 Generative adversarial network

GAN is one of the commonly used methods for DA. GAN consists of a generator (G) and a discriminator (D), as shown in Fig. 1. The generator takes random noise z as input and generates a fake sample that can “fool” the discriminator. The discriminator determines whether the input sample is a fake sample or a real sample.

Fig. 1 GAN network structure diagram

The objective function of GAN is:

$$L(G,D) = \mathop {{\text{min}}}\limits_{G} \mathop {{\text{max}}}\limits_{D} \left\{ {E_{{x \sim P(x)}} \left[ {\log D(x)} \right] + E_{{z \sim P_{z} (z)}} \left[ {\log (1 - D(G(z)))} \right]} \right\},$$
(1)

where: z is random noise, Pz is the distribution of random noise, P is the distribution of real samples, G is the generator, and D is the discriminator. By replacing G(z) with \(\tilde {x}\), formula (1) can be rewritten as:

$$L\left( {G,D} \right) = \mathop {{\text{min}}}\limits_{G} \mathop {{\text{max}}}\limits_{D} \left\{ {E_{{x \sim P(x)}} \left[ {\log D\left( x \right)} \right] + E_{{z \sim P_{z} (z)}} \left[ {\log \left( {1 - D\left( {\tilde{x}} \right)} \right)} \right]} \right\}.$$
(2)

Through adversarial training between G and D, the distribution of the generated samples \(\tilde {x}\) approaches the real distribution P(x). When the two coincide, that is, when the sample data generated by G conform to the real distribution, the model reaches Nash equilibrium.
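The bracketed term of the objective function can be estimated from discriminator outputs on a batch of real and generated samples. The following is a minimal sketch, not the implementation used in this paper; the function name `gan_value` and the example discriminator outputs are illustrative assumptions.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-12):
    """Batch estimate of E[log D(x)] + E[log(1 - D(x_tilde))].

    d_real: discriminator outputs on real samples, values in (0, 1)
    d_fake: discriminator outputs on generated samples, values in (0, 1)
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# Hypothetical outputs: a discriminator that cannot tell real from fake
# returns 0.5 everywhere, which gives the value -log(4).
equilibrium = gan_value(np.full(8, 0.5), np.full(8, 0.5))

# A discriminator that separates the two classes well gives a larger value;
# D tries to increase this quantity, while G tries to decrease it.
confident = gan_value(np.full(8, 0.9), np.full(8, 0.1))
```

The value -log 4 obtained when D outputs 0.5 for every input corresponds to the equilibrium state in which D can no longer distinguish generated samples from real ones.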

3 Data enhancement method based on MCGAN

The rolling bearings of wind turbine generators work under complex conditions, with abundant normal samples and limited fault samples, which leads to poor diagnostic and generalization capabilities of existing bearing fault diagnosis methods. To address these issues, a bearing fault diagnosis method based on MCGAN data augmentation is proposed. MCGAN uses a small number of samples to learn the internal distribution features of each time-frequency map generated by STFT and then generates diverse samples whose distribution is similar to that of the actual samples, achieving data augmentation and yielding an expanded dataset.

3.1 MCGAN data enhancement

Traditional GAN requires a large number of data samples as input and generates samples with a similar distribution by learning the distribution of many training samples. However, in practical engineering, it is difficult to obtain enough fault data. The MCGAN proposed in this paper focuses on learning the internal distribution characteristics of a very small number of training samples. To ensure that MCGAN can effectively extract the fault characteristics of vibration signals, it takes the TF features obtained from STFT as input and intercepts the TF graph with receptive fields of different scales. In this way it gradually learns the internal time and frequency distribution characteristics of the vibration signals of wind turbine generator bearings and generates high-quality, diverse samples with the same internal distribution as the original data.

MCGAN is composed of multiple fully convolutional GAN (CGAN) networks, and its structure is shown in Fig. 2, where {G0, G1, …, GN} and {D0, D1, …, DN} are the generators and discriminators of the CGANs, respectively, and the generator of each CGAN has the same receptive field as its discriminator. As the MCGAN network deepens, the receptive field gradually becomes smaller, and finer-grained features are gradually learned. MCGAN takes the time-frequency graph x generated from the bearing vibration signal through STFT as input. The samples {x0, x1, …, xN} are intercepted from the TF graph x according to different receptive fields, that is, xn is the sample obtained from x with r > 1 as the downsampling rate. Each generator generates a sample with a distribution similar to that of xn for the corresponding truncated signal. The corresponding discriminator attempts to distinguish whether the input sample is a real truncated sample or a generated one. After adversarial training of the generators and discriminators, the model can generate samples with similar internal distribution characteristics.
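The multi-scale samples {x0, x1, …, xN} can be illustrated with a short sketch. It assumes that xn is obtained by downsampling x by a factor of r to the power n with linear interpolation; the text only states that r > 1 is the downsampling rate, so the exponent, the interpolation scheme, and the helper names `resize` and `build_pyramid` are assumptions made for illustration.

```python
import numpy as np

def resize(img, new_h, new_w):
    """Resize a 2-D array with separable linear interpolation."""
    h, w = img.shape
    rows = np.linspace(0, h - 1, new_h)
    cols = np.linspace(0, w - 1, new_w)
    # Interpolate along the column axis first: result has shape (h, new_w).
    tmp = np.stack([np.interp(cols, np.arange(w), row) for row in img])
    # Then interpolate along the row axis: result has shape (new_h, new_w).
    return np.stack([np.interp(rows, np.arange(h), col) for col in tmp.T], axis=1)

def build_pyramid(x, num_scales, r=2.0):
    """Return [x_0, ..., x_N], with x_n assumed to be x downsampled by r**n."""
    h, w = x.shape
    pyramid = []
    for n in range(num_scales):
        new_h = max(1, int(round(h / r ** n)))
        new_w = max(1, int(round(w / r ** n)))
        pyramid.append(resize(x, new_h, new_w))
    return pyramid

# Hypothetical 64 x 64 time-frequency map standing in for an STFT result.
x = np.random.default_rng(0).random((64, 64))
pyramid = build_pyramid(x, num_scales=3, r=2.0)
shapes = [p.shape for p in pyramid]
```

With r = 2 and three scales, the pyramid holds maps of 64 × 64, 32 × 32, and 16 × 16 points; the coarsest map is the one seen by GN.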

Fig. 2 MCGAN structure diagram

MCGAN uses multiple CGANs to generate fake samples progressively, from large receptive fields to small ones and from low resolution to high resolution. Sample generation starts at the largest scale and passes through the generators in sequence until it reaches the smallest scale, with noise added to the generator input at each scale. All CGAN generators and discriminators have the same receptive field, so the scale captured by the model gradually decreases during sample generation. At the largest scale, GN maps spatial Gaussian white noise ZN to the image sample \(x_{N}^{*}\):

$$x_{N}^{*} = G_{N} \left( {Z_{N} } \right).$$
(3)

At this stage, the receptive field is set to half the length of the vibration signal of the rolling bearing, which helps the sample generated by GN take into account the overall layout and the local distribution at the same time. The input of each \({G_n} \in \left\{ {{G_0}, \cdots ,{G_{N - 1}}} \right\}\) contains not only the noise signal but also the upsampled version of the sample produced by the upper-level generator, namely:

$$x_{n}^{*} = G_{n} \left( {Z_{n} ,x_{{n + 1}}^{*} \otimes r \uparrow } \right),\,n < N,$$
(4)

where: \(\otimes r \uparrow\) indicates up-sampling by the rate r. Feeding Gn the upsampled sample obtained from the upper-level generator helps the model learn higher-resolution and finer-grained features. In addition, the input of every Gn contains noise, which improves the generalization ability of the model in small-sample settings.

Fig. 3 Structure diagram of generator Gn

All generators Gn have the same structure, as shown in Fig. 3. The specific operation is:

$$x_{n}^{*} = x_{{n + 1}}^{*} \otimes r \uparrow + \varphi _{n} \left( {Z_{n} + x_{{n + 1}}^{*} \otimes r \uparrow } \right),$$
(5)

where: \({\varphi _n}\) represents five nonlinear mapping layers, each consisting of a convolution with a 3 × 1@32 kernel, batch normalization (BN), and a LeakyReLU activation function. Adding BN layers to Gn and using LeakyReLU instead of the commonly used ReLU activation function effectively improves the stability of the CGANs.
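The coarse-to-fine generation of Eqs. (3) to (5) can be sketched as follows. This is only an illustration of the data flow: the mapping \({\varphi _n}\) is replaced by an untrained, hypothetical stand-in (a single scaled LeakyReLU) rather than the five convolution layers described above, and nearest-neighbour up-sampling with r = 2, the noise level, and all function names are assumptions.

```python
import numpy as np

def leaky_relu(a, slope=0.2):
    return np.where(a > 0, a, slope * a)

def upsample(x, r=2):
    """Nearest-neighbour up-sampling by an integer rate r (assumed scheme)."""
    return np.repeat(np.repeat(x, r, axis=0), r, axis=1)

def phi(a, weight):
    """Hypothetical stand-in for phi_n; NOT the five-layer convolution block."""
    return leaky_relu(weight * a)

def generate(shape_coarse, num_scales, r=2, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.1, size=num_scales)  # untrained toy parameters
    # Eq. (3): the sample at scale N is produced from noise alone.
    x = phi(rng.normal(size=shape_coarse), weights[-1])
    # Eqs. (4)-(5): every later scale refines the upsampled previous output.
    for n in range(num_scales - 2, -1, -1):
        up = upsample(x, r)
        z = noise_std * rng.normal(size=up.shape)
        x = up + phi(z + up, weights[n])  # residual update of Eq. (5)
    return x

sample = generate((16, 16), num_scales=3)
```

Because each generator only adds a residual to the upsampled output of the previous scale, the overall layout fixed at scale N is preserved while finer detail is added at each later scale.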

3.2 MCGAN loss function

During training, the model is trained starting from layer N. The parameters of each layer are fixed immediately after that layer is trained, and then the next layer is trained. To ensure the similarity between the generated sample and the real sample, this paper adds a reconstruction loss to the adversarial loss of each CGAN:

$$L_{r} (G_{n} ,D_{n} ) = \left\{ {\begin{array}{ll} {\left\| {G_{N} (Z^{'} ) - x_{N} } \right\|} & {n = N} \\ {\left\| {G_{n} (0,x_{{n + 1}}^{*} \otimes r \uparrow ) - x_{n} } \right\|} & {n < N} \\ \end{array} ,} \right.$$
(6)

where \(\{ Z_{N}^{'} ,Z_{N - 1}^{'} , \ldots ,Z_{0}^{'} \} = \{ Z^{'} ,0, \ldots ,0\}\) is a selected set of random noise. Adding the reconstruction loss ensures that the sample generated from noise is similar in distribution to the real sample, that is, the generated sample contains time-frequency characteristics similar to those of the real sample. Thus, the loss function for each CGAN is:

$$\mathop {{\text{min}}}\limits_{{G_{n} }} \mathop {{\text{max}}}\limits_{{D_{n} }} L_{a} \left( {G_{n} ,D_{n} } \right) + \lambda L_{r} \left( {G_{n} ,D_{n} } \right),$$
(7)

where \(\lambda\) is the balance parameter and La(Gn, Dn) is the adversarial loss. The loss function of MCGAN is then:

$$\mathop {{\text{min}}}\limits_{G} \mathop {{\text{max}}}\limits_{D} \sum\limits_{n} {\left[ {L_{a} \left( {G_{n} ,D_{n} } \right) + \lambda L_{r} \left( {G_{n} ,D_{n} } \right)} \right]} .$$
(8)
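The way the reconstruction loss of Eq. (6) enters the per-scale loss of Eq. (7) and the summed loss of Eq. (8) can be sketched as below. The generator signature `generator(z, prev_up)`, the toy generator, the adversarial loss values, and the value of the balance parameter are all hypothetical; the paper does not report the value of \(\lambda\).

```python
import numpy as np

def reconstruction_loss(generator, x_n, z_fixed=None, prev_up=None):
    """Eq. (6): norm of the difference between the reconstruction and x_n.

    At scale N (prev_up is None) the fixed noise Z' is used; at every other
    scale the noise is set to zero and only the upsampled previous output
    is fed to the generator.
    """
    if prev_up is None:
        rec = generator(z_fixed, None)
    else:
        rec = generator(np.zeros_like(prev_up), prev_up)
    return float(np.linalg.norm(rec - x_n))

def scale_loss(adv_loss, rec_loss, lam):
    """Eq. (7): adversarial loss plus weighted reconstruction loss."""
    return adv_loss + lam * rec_loss

def total_loss(adv_losses, rec_losses, lam):
    """Eq. (8): sum of the per-scale losses."""
    return sum(scale_loss(a, r, lam) for a, r in zip(adv_losses, rec_losses))

def toy_generator(z, prev_up):
    # Hypothetical stand-in: returns the noise at scale N, otherwise adds
    # the noise to the upsampled previous output.
    return z if prev_up is None else prev_up + z

x_n = np.ones((4, 4))
rec_fine = reconstruction_loss(toy_generator, x_n, prev_up=np.zeros((4, 4)))
rec_coarse = reconstruction_loss(toy_generator, x_n, z_fixed=np.ones((4, 4)))
total = total_loss([1.0, 2.0], [rec_fine, rec_coarse], lam=0.5)
```

Since the layers are trained one at a time with earlier layers frozen, only the term of the layer currently being trained changes during each stage.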

3.3 Fault diagnosis method of the rolling bearing of wind turbine based on MCGAN data enhancement

The proposed fault diagnosis method of wind turbine generator bearing based on MCGAN data augmentation is shown in Fig. 4, and the specific steps are as follows:

(1) Preprocess the collected bearing data of the wind turbine generator and extract TF characteristics by STFT;

(2) Construct a training set and a test set from the TF images according to a certain proportion;

(3) Initialize the parameters of the MCGAN model, input the unbalanced fault samples of the training set, and train the CGAN models layer by layer to obtain the data enhancement model MCGAN;

(4) Use the MCGAN model to expand the training set and obtain an expanded data set that is balanced, sufficient, and diverse;

(5) Use the expanded data set to train the fault diagnosis model;

(6) Input the test set into the fault diagnosis model to obtain the fault diagnosis result.
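Step (1) can be illustrated with a minimal STFT sketch. The window type, the FFT length of 256 points, the hop of 64 points, and the synthetic 1024 Hz test tone are assumptions chosen for illustration; the paper does not report its STFT settings.

```python
import numpy as np

def stft_magnitude(signal, n_fft=256, hop=64):
    """Magnitude STFT with a Hann window; returns (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

fs = 16384                                # sampling frequency in Hz
t = np.arange(2048) / fs                  # one sample of 2048 points
signal = np.sin(2 * np.pi * 1024 * t)     # synthetic tone, not measured data
tf_map = stft_magnitude(signal)

# Frequency of the strongest bin, averaged over time.
peak_bin = int(np.argmax(tf_map.mean(axis=1)))
peak_hz = peak_bin * fs / 256
```

The resulting two-dimensional array is the kind of TF graph that steps (2) to (4) split into training and test sets and feed to MCGAN.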

Fig. 4 Fault diagnosis method of the rolling bearing of wind turbine based on MCGAN data enhancement

4 Test verification

To verify the effectiveness of MCGAN in enhancing bearing vibration signal information, bearing data collected from an actual wind turbine generator are used for test verification.

4.1 Introduction to wind turbine bearing data

The test data are from a 1.5 MW unit in a wind farm in Shandong province. The generator is from Xiangtan Motor, and the bearing model is SKF6332. The parameters of the generator are shown in Table 1. The acceleration sensor is a B&K Vibro AS-020, which is installed on the bearing end cover of the generator and collects acceleration signals from the drive end and the free end of the generator. The sampling frequency is 16,384 Hz, and each acquisition lasts 20 s. Data collection is shown in Fig. 5.

Fig. 5 Data acquisition of wind turbine bearing

Table 1 SKF6332 parameters

From the bearing data of turbine 33 in the wind farm, data for four states at 1350 RPM were selected: health status (N), outer ring failure (OR), inner ring failure (IR), and rolling element failure (B). A sliding window with a step of 200 points and a length of 2048 points intercepts the signals as samples. Health state data are easy to obtain, so health state vibration samples are sufficient, and this paper enhances only the data of the IR, OR, and B fault states. To verify the effectiveness of MCGAN for data enhancement on a small number of samples, only 5 samples of each fault state are taken as the training set of MCGAN for expanding the data set. 100 samples of each of the four states were selected as a test set to test the quality of the expanded data set obtained by MCGAN. The specific data division is shown in Table 2.
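The sliding-window interception described above can be sketched as follows. The function name `segment` and the synthetic record are illustrative assumptions; the window length, step, sampling frequency, and duration are those stated in the text.

```python
import numpy as np

def segment(signal, window=2048, step=200):
    """Cut a 1-D signal into overlapping samples of length `window`."""
    n = 1 + (len(signal) - window) // step
    # Row i of idx holds the indices of the i-th window.
    idx = np.arange(window)[None, :] + step * np.arange(n)[:, None]
    return signal[idx]

fs, duration = 16384, 20                         # Hz, seconds
record = np.arange(fs * duration, dtype=float)   # placeholder for one recording
samples = segment(record)
```

Under these settings a single 20 s recording yields 1629 overlapping samples of 2048 points each, so the 5 training samples per fault state used here are a deliberately small subset.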

Table 2 Wind turbine bearing data sample construction

4.2 Analysis of test results

Based on the time-frequency graphs obtained by STFT from the vibration data of the turbine generator bearings, MCGAN is configured as a three-stage CGAN whose receptive fields are 64, 32, and 11, respectively. The generator of each CGAN layer has the same structure as the discriminator, consisting of five 3 × 1@32 convolution layers, each with a BN layer and LeakyReLU as the activation function. Each CGAN level is trained for 200 iterations, and the learning rate is set to 0.005. Figure 6 compares the original STFT time-frequency graphs with the generated ones.

Fig. 6 a STFT time-frequency diagram of the raw data, b generated STFT time-frequency diagram

As can be seen from the comparison in Fig. 6, the time-frequency spectra generated under the different states are similar to the original spectra and retain the bearing fault characteristics. To further evaluate the similarity between the generated samples and the original samples, a multi-index evaluation system is established to evaluate the quality of the generated samples. Fréchet Inception Distance (FID) [34] and Structural Similarity Index Metric (SSIM) [35] are used to measure the distribution difference between the generated samples and the real samples. The calculation formulas for FID and SSIM are as follows:

$${\text{FID}} = \left\| {u_{x} - u_{y} } \right\|^{2} + {\text{Tr}}\left( {\Sigma _{x} + \Sigma _{y} - 2\left( {\Sigma _{x} \Sigma _{y} } \right)^{{\frac{1}{2}}} } \right),$$
(9)
$${\text{SSIM}}\left( {x,y} \right) = \frac{{\left( {2u_{x} u_{y} + c_{1} } \right)\left( {2\sigma _{xy} + c_{2} } \right)}}{{\left( {u_{x}^{2} + u_{y}^{2} + c_{1} } \right)\left( {\sigma _{x}^{2} + \sigma _{y}^{2} + c_{2} } \right)}},$$
(10)
$$c_{1} = \left( {k_{1} L} \right)^{2} ,$$
(11)
$$c_{2} = \left( {k_{2} L} \right)^{2} ,$$
(12)

where: ux represents the mean value of the original image, uy represents the mean value of the generated image, \({\Sigma _x}\) represents the covariance matrix of the original image, \({\Sigma _y}\) represents the covariance matrix of the generated image, \(\sigma _{x}^{2}\) represents the variance of the original image, \(\sigma _{y}^{2}\) represents the variance of the generated image, \(\sigma _{xy}\) represents the covariance between the original image and the generated image, L is the dynamic range of pixel values, k1 = 0.01, k2 = 0.03.
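Both indices can be computed with a few lines of NumPy. The sketch below is an illustration under stated assumptions: FID is evaluated on feature matrices that are assumed to be given (in practice the features come from an Inception network), and SSIM is evaluated globally over the whole image in its standard form with the covariance term, rather than over local windows. The function names and random inputs are hypothetical.

```python
import numpy as np

def fid(feat_x, feat_y):
    """Eq. (9) for feature matrices of shape (n_samples, n_features)."""
    u_x, u_y = feat_x.mean(axis=0), feat_y.mean(axis=0)
    s_x = np.cov(feat_x, rowvar=False)
    s_y = np.cov(feat_y, rowvar=False)
    # Tr((S_x S_y)^(1/2)) is the sum of the square roots of the
    # eigenvalues of S_x S_y, which are real and non-negative.
    eig = np.linalg.eigvals(s_x @ s_y)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    return float(np.sum((u_x - u_y) ** 2) + np.trace(s_x) + np.trace(s_y) - 2.0 * tr_sqrt)

def ssim_global(x, y, dynamic_range=1.0, k1=0.01, k2=0.03):
    """Eqs. (10)-(12) evaluated once over the whole image."""
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    u_x, u_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - u_x) * (y - u_y)).mean()
    num = (2 * u_x * u_y + c1) * (2 * cov_xy + c2)
    den = (u_x ** 2 + u_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)

rng = np.random.default_rng(0)
real = rng.random((200, 4))        # hypothetical feature vectors
img = rng.random((32, 32))         # hypothetical image with values in [0, 1)

fid_same = fid(real, real)             # identical sets: distance close to 0
fid_shifted = fid(real, real + 1.0)    # mean shift of 1 in each of 4 features
ssim_same = ssim_global(img, img)      # identical images: SSIM close to 1
```

Shifting every feature by 1 leaves the covariance unchanged, so the FID of the shifted set equals the squared mean difference, which is 4 for four features.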

MCGAN was used to expand the 5 samples of each fault category in Table 2 to 100, forming a sample-balanced data set. FID and SSIM were then calculated on the expanded data set according to the formulas, giving FID = 0.2943 and SSIM = 0.9582. The closer the generated image is to the original image, the smaller the FID and the closer the SSIM is to 1. The results show that the samples generated by MCGAN have characteristics similar to those of the original samples, so MCGAN can effectively enhance the data and compensate for the data imbalance in actual bearing fault diagnosis of wind turbines.

4.3 Comparison of different pretreatment methods

MCGAN uses TF graphs obtained from STFT as input. To verify how well STFT suits the MCGAN model, it is compared with other traditional data preprocessing methods: grayscale image (GI), continuous wavelet transform (CWT), and Wigner–Ville distribution (WVD) are each used as inputs to MCGAN. The training sets of IR, OR, and B samples were each expanded to 100 samples.

To further evaluate the quality of samples generated by different methods, FID, SSIM, maximum mean discrepancy (MMD), and KL divergence were used to measure the distribution difference between the generated samples and the real samples. The formulas for MMD and KL divergence are as follows:

$$MMD\left( {P,Q} \right)^{2} = \left\| {\frac{1}{m}\sum\limits_{{x_{i} }} {\phi \left( {x_{i} } \right) - } \frac{1}{n}\sum\limits_{{x^{\prime}_{i} }} {\phi \left( {x^{\prime}_{i} } \right)} } \right\|_{2}^{2} ,$$
(13)
$$KL\left( {P\parallel Q} \right) = \sum\limits_{x} {P\left( x \right)} \log \frac{{P\left( x \right)}}{{Q\left( x \right)}},$$
(14)

where P and Q are the probability distributions of the real samples and the generated samples, respectively, \({x_i}\) and \({x^{\prime}_i}\) are the ith real sample and the ith generated sample, respectively, \(\phi\) is the feature mapping applied to the samples, and m and n are the numbers of real samples and generated samples, respectively.
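A minimal numerical sketch of Eqs. (13) and (14) is given below. The feature mapping is taken to be the identity by default, and the KL divergence is evaluated on discrete histograms; both choices, as well as the function names and example values, are assumptions made for illustration rather than the settings used in the experiments.

```python
import numpy as np

def mmd_squared(x, y, phi=lambda a: a):
    """Eq. (13): squared distance between mean feature embeddings.

    x, y: arrays of shape (n_samples, n_features).
    phi:  feature mapping (identity by default, an assumption).
    """
    diff = phi(x).mean(axis=0) - phi(y).mean(axis=0)
    return float(np.dot(diff, diff))

def kl_divergence(p, q, eps=1e-12):
    """Eq. (14) for two discrete distributions given as histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    mask = p > 0  # terms with P(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

real = np.array([[0.0, 0.0], [2.0, 2.0]])   # mean embedding (1, 1)
fake = np.array([[1.0, 1.0], [3.0, 3.0]])   # mean embedding (2, 2)
mmd2 = mmd_squared(real, fake)

kl_same = kl_divergence([0.5, 0.5], [0.5, 0.5])
kl_diff = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

Both quantities are zero when the two sets of samples coincide and grow as the generated distribution departs from the real one, which is why smaller values indicate better generated samples.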

Figure 7 shows how FID and SSIM change during MCGAN training. Overall, as the model iterates, FID gradually decreases and SSIM gradually increases, indicating that the samples generated by the model move closer to the real samples. In this study, MCGAN consists of three layers of CGANs, with each layer trained for 200 iterations, so the decrease in FID and the increase in SSIM proceed in stages of 200 iterations each. The indicators change rapidly in the first two stages. After the upper-layer CGAN is fixed and training switches to the next layer, the two indicators fluctuate significantly but soon stabilize. This is because the parameters of the next-level model are randomly initialized, but, building on the features already extracted by the upper-level CGAN, the model stabilizes within fewer iterations.

Fig. 7 The trend of changes in FID and SSIM during MCGAN training

The FID, SSIM, MMD, and KL divergence between the real samples and the samples generated by MCGAN with different inputs are shown in Table 3.

Table 3 Evaluation indicators for generating samples using different methods

It can be seen from Table 3 that, among the four methods, the data set generated with GI as input has the largest FID, MMD, and KL divergence and the smallest SSIM, so its data generation effect is the worst. This is because simply converting vibration signals into two-dimensional grayscale images cannot characterize the relationship between the time domain and the frequency domain of the signals well. The sample generation effect of CWT and WVD is slightly better than that of GI but still worse than that of STFT, mainly because CWT is affected by the choice of wavelet basis function and WVD produces cross-term interference when processing modulated signals. Among the four methods, STFT preprocessing yields the best generated data, which is conducive to the fault diagnosis of turbine generator bearings. Therefore, the time-frequency graph generated by STFT is used as the input of the model in this paper.

4.4 Comparison of different data enhancement methods

To highlight the superiority of MCGAN for the bearing data enhancement of wind turbines, the proposed method is compared with the commonly used data enhancement methods SMOTE, GAN, and stacked autoencoder (SAE). Because the training sample size is small, random noise is added to the data during GAN and SAE training to prevent serious overfitting. The four methods were used to expand the three fault state samples OR, IR, and B in the training set to 100 each, and the data sets generated by SMOTE, GAN, SAE, and MCGAN were named D1, D2, D3, and D4, respectively. The t-SNE results of the generated data are shown in Fig. 8.

Fig. 8 Visualization of data generated by different methods

As can be seen from Fig. 8, the SMOTE-generated samples D1 are concentrated between the training set samples, and the diversity of the generated data is poor. The samples D2 and D3 generated by GAN and SAE are more diverse than those of SMOTE, but the generated data still lie around the training set samples. This is because, even with random noise added to the training samples, the amount of data is still insufficient, which causes the models to overfit. The quality of the data generated by SMOTE, GAN, and SAE is therefore strongly affected by the quality of the training samples, and their robustness is poor. The data generated by MCGAN are the most diverse, which helps in the fault diagnosis of turbine generator bearings.
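The concentration of SMOTE samples between training samples follows from how SMOTE creates data: each new point is a linear interpolation between a training sample and one of its neighbours. The sketch below is a simplified variant that always uses the single nearest neighbour (standard SMOTE picks randomly among k nearest neighbours); the function name and the two-dimensional toy data are illustrative assumptions.

```python
import numpy as np

def smote_like(samples, n_new, seed=0):
    """Create n_new points by interpolating towards the nearest neighbour."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    new = []
    for _ in range(n_new):
        i = rng.integers(len(samples))
        dist = np.linalg.norm(samples - samples[i], axis=1)
        dist[i] = np.inf                  # exclude the sample itself
        j = int(np.argmin(dist))
        gap = rng.random()                # interpolation factor in [0, 1)
        new.append(samples[i] + gap * (samples[j] - samples[i]))
    return np.array(new)

# Five hypothetical training points, expanded by 95 synthetic ones.
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
synthetic = smote_like(train, 95)

# Every synthetic point lies on a segment between two training points,
# so none can leave the range spanned by the training set.
inside = bool(np.all((synthetic >= train.min(axis=0)) & (synthetic <= train.max(axis=0))))
```

With only five training samples per class, interpolation of this kind cannot produce points outside the region spanned by those five samples, which limits the diversity of the expanded data set.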

4.5 Enhancement effect of data with different sample balance degrees

To further verify the superiority of MCGAN in enhancing the bearing data of wind turbine generators under unbalanced data, training sets U1–U4 with different degrees of balance are constructed, as shown in Table 4. SMOTE, GAN, SAE, and MCGAN were each used to enhance the four data sets U1–U4, and the FID, SSIM, MMD, and KL divergence between the generated samples and the real samples were calculated to evaluate the enhancement effect. The average evaluation indices of the data sets generated for the IR, OR, and B categories are shown in Fig. 9.

Table 4 Dataset of different proportional equilibrium degrees of wind turbine bearings

Fig. 9 Evaluation indicators a FID, b SSIM, c MMD, d KL of U1–U4 data set enhanced by different data enhancement methods

As can be seen from Fig. 9, as the data imbalance ratio decreases, the FID, MMD, and KL of the four methods gradually decrease while SSIM gradually increases, indicating that the data enhancement effect of all four methods improves, and the enhancement effect of the MCGAN proposed in this paper remains the best. For data sets with the same degree of imbalance, SMOTE has the worst effect and MCGAN the best. This shows that the method proposed in this paper can effectively solve the problem of data imbalance in practical engineering.

4.6 Fault diagnosis

To further test the validity of the MCGAN-generated data, the data sets D1, D2, D3, and D4 generated by SMOTE, GAN, SAE, and MCGAN are used to train the common classifiers SVM, MLP, AlexNet, and ResNet, and diagnosis is performed on the test set. The fault diagnosis results of the four classifiers on the four data sets are shown in Fig. 10.

Fig. 10 Fault diagnosis accuracy of four methods on four data sets

As can be seen from Fig. 10, the four classifiers trained on data generated by SMOTE, GAN, and SAE perform unsatisfactorily on the test set. This is because the generated samples do not capture the distribution of the real samples, which degrades the trained classifiers. The accuracy of all four classifiers trained on the data generated by MCGAN is above 80%. Among them, SVM, a machine learning method, reaches an accuracy of 81.4%, and ResNet, a deep learning method, performs best with an accuracy of 93.7%. This further demonstrates the superiority of MCGAN data enhancement.

5 Conclusions

Fault diagnosis methods for turbine generator bearings suffer from poor diagnostic and generalization ability because of complex working conditions and the scarcity of fault samples in actual operation. To address these problems, this paper proposes a data enhancement method for turbine generator bearing data sets based on MCGAN. The following conclusions are drawn from comparative experiments.

(1) MCGAN obtains generated samples by learning the time and frequency distribution characteristics of the time-frequency graphs produced by STFT. The generated samples contain fault characteristics, and t-SNE visualization shows that they are more diverse, which is conducive to bearing fault diagnosis.

(2) The distribution difference between the generated samples and the real samples is measured by FID, SSIM, MMD, and KL divergence. Compared with the common data enhancement methods SMOTE, GAN, and SAE, the distribution of the MCGAN-generated samples is the closest to that of the real samples, and the data enhancement effect is the best.

(3) Data sets expanded by the different methods are used to train fault diagnosis classification models, and fault diagnosis is performed on a test set composed of real samples. The accuracy of every model trained on the MCGAN-generated data is higher than 80%, which further indicates the superiority of MCGAN data enhancement and shows that it can solve the problem of fault diagnosis with few samples in engineering.