1 Introduction

In recent decades, renewable energy sources have received increasingly wide attention. As one of the most promising clean renewable energy sources, wind power is being developed on a large scale around the world [1,2,3,4]. However, wind turbines are prone to various failures because they operate for long periods under tough conditions, complex alternating loads and variable speeds [5]. The bearing is a critical component of a wind turbine, and bearing failures account for a significant proportion of all wind turbine failures; they can lead to outage of the unit and high maintenance costs [6, 7]. Hence, an accurate fault diagnosis method for wind turbine bearings would be extremely valuable for improving safety and economy.

Vibration analysis is an effective condition monitoring method, especially for rotating machinery. A vast number of vibration signal processing methods have been used for fault detection of wind turbine gearboxes and bearings, such as spectrum analysis [8], the wavelet transform [9], the Wigner-Ville distribution [10] and empirical mode decomposition (EMD) [11]. Compared with the former three methods, EMD performs better in processing the vibration signal but has the drawback of being time-consuming [12]. Variational mode decomposition (VMD) [13] and the empirical wavelet transform (EWT) [14] achieve better signal processing performance than EMD, avoid mode aliasing and have strong noise robustness. In the past 20 years, the application of the stochastic subspace identification (SSI) method to vibration signal analysis has also developed rapidly [15, 16], especially for fault diagnosis of buildings and rotating equipment [17, 18]. However, the SSI method has so far seldom been used to diagnose bearing faults in wind turbines. The SSI method constructs a model directly from time-domain data and can identify the modal parameters, which makes it suitable for mining the most essential fault information [19]. In this paper, SSI is employed to extract fault features from the collected vibration signal of the wind turbine.

In recent years, intelligent diagnosis methods have been widely applied to wind turbine bearing fault diagnosis. As an artificial intelligence method, the support vector machine (SVM) has distinct advantages in small-sample, nonlinear and high-dimensional pattern recognition problems. Reference [20] used SVM to classify different states of a rolling bearing and conducted experiments to validate the effectiveness of the proposed method. Reference [21] proposed a method based on the wavelet packet transform and a locally linear embedding algorithm to extract fault features, and then intelligently classified the different fault degrees of a rolling bearing using SVM. Reference [22] combined EMD and SVM to identify different fault states of a rolling bearing. The traditional SVM uses a single kernel function and can solve classification problems on simple data. However, it cannot effectively solve complex classification problems, especially for heterogeneous and imbalanced data. In order to improve the performance of SVM, a multi-kernel support vector machine (MSVM) is adopted for pattern recognition in this paper. MSVM integrates both the generalization ability and the self-learning ability of the traditional single-kernel SVM [23,24,25], and thus has better adaptability and robustness.

We use wind turbine bearing vibration data to construct the subspace model and extract the bearing features, and then use the MSVM algorithm to classify the feature parameters for bearing fault diagnosis. The paper is organized as follows. The principles of SSI and MSVM are introduced in Section 2 and Section 3, respectively. The procedure of the fault diagnosis method based on SSI-MSVM is described in Section 4. Diagnostic performance is tested by applying the SSI-MSVM method to experimental bearing signals from a wind turbine in Section 5. Finally, conclusions are drawn in Section 6.

2 Principle of SSI

2.1 Stochastic state-space model

The SSI method is a black-box identification method that uses measured data to establish a linear state-space model, and is well suited to extracting features from vibration signals. The stochastic state-space model can be described as follows:

$$\left\{ {\begin{array}{*{20}l} {{\varvec{X}}_{k + 1} = {\varvec{AX}}_{k} + {\varvec{w}}_{k} } \hfill \\ {{\varvec{Y}}_{k} = {\varvec{CX}}_{k} + {\varvec{v}}_{k} } \hfill \\ \end{array} } \right.$$
(1)

where \({\varvec{X}}_{k} \in {\mathbf{R}}^{n}\) and \({\varvec{Y}}_{k} \in {\mathbf{R}}^{l}\) are the state variable and the output of the system at discrete time k, respectively; \({\varvec{A}} \in {\mathbf{R}}^{n \times n}\) is the system matrix, which describes the dynamic behaviour of the system; \({\varvec{C}} \in {\mathbf{R}}^{l \times n}\) is the output matrix of the system; \({\varvec{w}}_{k} \in {\mathbf{R}}^{n}\) and \({\varvec{v}}_{k} \in {\mathbf{R}}^{l}\) are the system noise and measurement noise, respectively.
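To make (1) concrete, the short Python sketch below simulates a low-order stochastic state-space system; the matrices A and C, the noise levels and the signal length are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2nd-order system (assumed values): a lightly damped discrete-time
# mode observed through a single output channel.
A = np.array([[0.95, 0.20],
              [-0.20, 0.95]])           # system matrix A, eigenvalues inside the unit circle
C = np.array([[1.0, 0.0]])              # output matrix C (l = 1 output, n = 2 states)

n_steps = 2000
x = np.zeros(2)                         # state X_k
y = np.zeros(n_steps)                   # measured output Y_k
for k in range(n_steps):
    w = 0.05 * rng.standard_normal(2)   # process noise w_k
    v = 0.01 * rng.standard_normal()    # measurement noise v_k
    y[k] = (C @ x)[0] + v               # Y_k = C X_k + v_k
    x = A @ x + w                       # X_{k+1} = A X_k + w_k
```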

2.2 SSI method

The SSI method can be completed in three steps: orthogonal projection, singular value decomposition and system parameter estimation.

2.2.1 Orthogonal projection

First, a block Hankel matrix Y composed of the measurement signal is defined as follows:

$$\varvec{Y} = \left[ {\begin{array}{*{20}c} {y_{1} } & {y_{2} } & \cdots & {y_{N} } \\ \vdots & \vdots & {} & \vdots \\ {y_{i} } & {y_{i + 1} } & \cdots & {y_{i + N - 1} } \\ {y_{i + 1} } & {y_{i + 2} } & \cdots & {y_{i + N} } \\ \vdots & \vdots & {} & \vdots \\ {y_{i + j + 1} } & {y_{i + j + 2} } & \cdots & {y_{i + j + N} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\varvec{Y}}_{p} } \\ {{\varvec{Y}}_{f} } \\ \end{array} } \right]$$
(2)

where yk (k=1, 2, …, i+j+N) is the collected signal; Yp is the past part of matrix Y and has i rows; Yf is the future part of matrix Y and has j+1 rows; both i and j+1 are generally not less than the maximum order of the system model, namely \(\hbox{min} \{ i,j + 1\} \ge n\); and the number of columns N of the block Hankel matrix is generally far greater than i and j+1, namely \(N \gg \hbox{max} \{ i,j + 1\}\).
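Continuing the previous sketch, the snippet below stacks the simulated output into a block Hankel matrix and splits it into the past and future parts as in (2); the block-row counts i = j+1 = 9 are illustrative choices, not the paper's settings.

```python
def block_hankel(y, i, jp1, N):
    """Stack the measured output y (1-D array) into an (i + jp1) x N block Hankel
    matrix and split it into the past part Yp (first i rows) and future part Yf."""
    rows = i + jp1
    Y = np.array([y[r:r + N] for r in range(rows)])  # row r holds y_{r+1}, ..., y_{r+N}
    return Y[:i, :], Y[i:, :]                        # Yp, Yf

i, jp1 = 9, 9                            # illustrative block-row counts, min{i, j+1} >= n
N = len(y) - (i + jp1) + 1               # column count, N >> max{i, j+1}
Yp, Yf = block_hankel(y, i, jp1, N)
```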

Matrix Y is newly divided into two blocks as follows:

$$\varvec{Y} = \left[ {\begin{array}{*{20}c} {{\varvec{Y}}_{p}^{ + } } \\ {{\varvec{Y}}_{f}^{ - } } \\ \end{array} } \right]$$
(3)

where \({\varvec{Y}}_{p}^{ + }\) is the new past part, generated by shifting the first row of Yf into Yp as its last row; \({\varvec{Y}}_{f}^{ - }\) is the new future part, generated by deleting that first row from Yf.

Yf is orthogonally projected to the space of Yp by the following equation:

$${\varvec{P}}_{m}\, \triangleq\, \frac{{{\varvec{Y}}_{f} }}{{{\varvec{Y}}_{p} }}\,=\,{\varvec{Y}}_{f} {\varvec{Y}}_{p}^{\text{T}} \left( {{\varvec{Y}}_{p} {\varvec{Y}}_{p}^{\text{T}} } \right)^{\dag } {\varvec{Y}}_{p}$$
(4)

where \(( \bullet )^{\dag }\) denotes the Moore-Penrose pseudo-inverse.

Similarly, \({\varvec{Y}}_{f}^{ - }\) is orthogonally projected to the space of \({\varvec{Y}}_{p}^{ + }\) as follows:

$${\varvec{P}}_{m - 1}\, \triangleq\, \frac{{{\varvec{Y}}_{f}^{ - } }}{{{\varvec{Y}}_{p}^{ + } }}\,=\,{\varvec{Y}}_{f}^{ - } \left( {{\varvec{Y}}_{p}^{ + } } \right)^{\text{T}} \left( {{\varvec{Y}}_{p}^{ + } \left( {{\varvec{Y}}_{p}^{ + } } \right)^{\text{T}} } \right)^{\dag } {\varvec{Y}}_{p}^{ + }$$
(5)
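The shifted partition in (3) and the projections (4)-(5) can be written directly with the pseudo-inverse, as in the sketch below (continuing the earlier example). In practice a QR/LQ factorization would normally be used for numerical robustness, so this is only an illustration of the formulas.

```python
# Shifted partition of Y, see (3): the first block row of Yf moves to the past part.
Yp_plus = np.vstack([Yp, Yf[:1, :]])     # Y_p^+ : past part with one extra block row
Yf_minus = Yf[1:, :]                     # Y_f^- : future part without its first block row

def project(F, P):
    """Orthogonal projection of the row space of F onto the row space of P,
    i.e. F P^T (P P^T)^+ P, as in (4) and (5)."""
    return F @ P.T @ np.linalg.pinv(P @ P.T) @ P

P_m = project(Yf, Yp)                    # (4)
P_m1 = project(Yf_minus, Yp_plus)        # (5)
```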

2.2.2 Singular value decomposition

Singular value decomposition is adopted to analyze the projection matrix Pm as follows:

$${\varvec{P}}_{m} = \left[ {\begin{array}{*{20}c} {{\varvec{U}}_{1} } & {{\varvec{U}}_{0} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {{\varvec{S}}_{1} } & {\mathbf{0}} \\ {\mathbf{0}} & {{\varvec{S}}_{0} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {{\varvec{V}}_{1}^{\text{T}} } \\ {{\varvec{V}}_{0}^{\text{T}} } \\ \end{array} } \right]$$
(6)

where \(\varvec{U}_{1}\) and \(\varvec{V}_{1}\) contain the leading singular vectors; \(\varvec{S}_{1} = {\text{diag}}\{ \sigma_{1} ,\sigma_{2} , \cdots ,\sigma_{i} , \cdots ,\sigma_{n} \}\) is a diagonal matrix and σi is the ith non-zero singular value of Pm; S0 contains the remaining singular values, which are (close to) zero, and U0 and V0 are the corresponding blocks of singular vectors.

The projection matrix Pm can also be expressed as the product of the observability matrix Γm and the Kalman filter state sequence \(\hat{\varvec{X}}_{m}\):

$${\varvec{P}}_{m} = {\varvec{{\varGamma}}}_{m} {\hat{\varvec{X}}}_{m}$$
(7)

Similarly, the projection matrix \({\varvec{P}}_{m - 1}\) can also be expressed as follows:

$${\varvec{P}}_{m - 1} = {\varvec{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{\varGamma } }}_{m} {\hat{\varvec{X}}}_{m + 1}$$
(8)

where \({\varvec{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{\varGamma } }}_{m}\) is the new observability matrix generated by deleting the last row of Γm.

According to (6) and (7), the observability matrix Γm and the state sequence \(\hat{\varvec{X}}_{m}\) can be obtained as:

$${\varvec{\varGamma}}_{m} = {\varvec{U}}_{1} {\varvec{S}}_{1}^{{\frac{1}{2}}}$$
(9)
$${\hat{\varvec{X}}}_{m} = {\varvec{S}}_{1}^{{\frac{1}{2}}} {\varvec{V}}_{1}^{\text{T}}$$
(10)

The state sequence \(\hat{\varvec{X}}_{m + 1}\) can then be obtained from (8):

$${\hat{\varvec{X}}}_{m + 1} = \left( {{\varvec{\underset{\raise0.3em\hbox{$\smash{\scriptscriptstyle-}$}}{\varGamma } }}_{m} } \right)^{\dag } \varvec{P}_{m - 1}$$
(11)
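The following sketch carries the running example through (6)-(11): the SVD of the projection matrix is truncated to the model order (here assumed known, n = 2, matching the simulated system), from which Γm, \(\hat{\varvec{X}}_{m}\) and \(\hat{\varvec{X}}_{m+1}\) are recovered.

```python
# Singular value decomposition of the projection matrix P_m, see (6).
U, s, Vt = np.linalg.svd(P_m, full_matrices=False)

n = 2                                      # model order = number of dominant singular values
U1, S1, V1t = U[:, :n], np.diag(s[:n]), Vt[:n, :]

Gamma_m = U1 @ np.sqrt(S1)                 # observability matrix, (9)
X_m = np.sqrt(S1) @ V1t                    # Kalman filter state sequence, (10)

# With a single output channel, each block row of Gamma_m is one row.
Gamma_under = Gamma_m[:-1, :]              # Gamma_m with its last block row deleted
X_m1 = np.linalg.pinv(Gamma_under) @ P_m1  # shifted state sequence, (11)
```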

2.2.3 System parameter estimation

By substituting the state sequences \(\hat{\varvec{X}}_{m}\) and \(\hat{\varvec{X}}_{m + 1}\) into the stochastic state-space model, the following formula is obtained:

$$\left[ {\begin{array}{*{20}c} {{\hat{\varvec{X}}}_{m + 1} } \\ {{\varvec{Y}}_{m\left| m \right.} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\varvec{A}} \\ {\varvec{C}} \\ \end{array} } \right]{\hat{\varvec{X}}}_{m} + \left[ {\begin{array}{*{20}c} {{\varvec{\rho}}_{w} } \\ {{\varvec{\rho}}_{v} } \\ \end{array} } \right]$$
(12)

where ρw and ρv are residual vectors that are uncorrelated with \(\hat{\varvec{X}}_{m}\).

Then the system matrix A and output matrix C can be estimated by applying the least squares method as follows:

$$\left[ {\begin{array}{*{20}c} {\varvec{A}} \\ {\varvec{C}} \\ \end{array} } \right] \approx \left[ {\begin{array}{*{20}c} {{\hat{\varvec{A}}}} \\ {{\hat{\varvec{C}}}} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\hat{\varvec{X}}}_{m + 1} } \\ {{\varvec{Y}}_{m\left| m \right.} } \\ \end{array} } \right]\left( {{\hat{\varvec{X}}}_{m} } \right)^{\dag }$$
(13)

where \({\hat{\varvec{A}}}\) and \({\hat{\varvec{C}}}\) denote the estimates of A and C, respectively.
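A minimal continuation of the sketch estimates A and C by least squares as in (12)-(13); the output block row Y_{m|m} is taken here as the first block row of Yf (a single row in this single-output example), which is an assumption of the sketch rather than a statement from the paper.

```python
# Output block row Y_{m|m}: the first block row of Yf (one row for a single output channel).
Y_mm = Yf[:1, :]

# Stack the left-hand side of (12) and solve (13) via the pseudo-inverse of X_m.
lhs = np.vstack([X_m1, Y_mm])             # [X_{m+1}; Y_{m|m}]
AC = lhs @ np.linalg.pinv(X_m)            # least-squares estimate of [A; C]
A_hat, C_hat = AC[:n, :], AC[n:, :]
```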

In (13), the matrix A contains the feature information of the system model built from the bearing vibration data; that is, the eigenvalues of matrix A correspond to different fault modes.

The singular value decomposition of the matrix A is given by:

$$\varvec{A} = \varvec{U\varSigma V}^{\text{T}}$$
(14)

where U is the left singular matrix of the system matrix A; VT is the transpose of the right singular matrix of the system matrix A; \(\varvec{\varSigma}= {\text{diag}}\{ \lambda_{1} ,\lambda_{2} , \cdots ,\lambda_{n} \}\) is a diagonal matrix and λi is the ith singular value of the system matrix A.
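In code, the feature vector used later for classification can be built from the estimated system matrix, e.g. from the eigenvalues of \({\hat{\varvec{A}}}\) (the paper also considers the decomposition in (14)); the exact feature layout below is an assumption for illustration, continuing the earlier sketch.

```python
# Eigenvalues of the estimated system matrix as fault features.
eigvals = np.linalg.eigvals(A_hat)

# A stable identified model has all eigenvalues inside the unit circle.
print(np.all(np.abs(eigvals) < 1.0))

# One possible feature vector (illustrative): eigenvalues sorted by magnitude,
# described by their real parts, imaginary parts and magnitudes.
eigvals = eigvals[np.argsort(np.abs(eigvals))]
features = np.concatenate([eigvals.real, eigvals.imag, np.abs(eigvals)])
```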

3 Principles of standard SVM and MSVM

3.1 Standard SVM

The basic principle of the SVM is to map the input samples from the original space to a high-dimensional feature space through a kernel function. Given a sample set \(\{ (\varvec{x}_{1} ,y_{1} ), \, (\varvec{x}_{2} ,y_{2} ), \, \cdots , \, (\varvec{x}_{n} ,y_{n} )\}\), where \(\varvec{x}_{i} \in {\mathbf{R}}^{n}\) are the input samples and yi∈{−1, +1} are the corresponding labels, the SVM maps the input samples into the feature space using the kernel function, in which the optimal classification hyperplane \(\sum\limits_{i = 1}^{n} {w_{i} k(\varvec{x},\varvec{x}_{i} )}\) is constructed, where wi is the ith element of the coefficient vector w. The SVM seeks to maximize the classification margin 2/||w|| of the optimal hyperplane. By introducing slack variables, finding the optimal hyperplane is transformed into the following constrained optimization problem:

$$\left\{ \begin{aligned}& \hbox{min} \;J(\varvec{w},\varvec{e}) = \frac{1}{2}{\varvec{w}}^{\text{T}} {\varvec{w}} + C\sum\limits_{i = 1}^{n} {e_{i}^{2} } \hfill \\ &{\text{s}} .{\text{t}} .\;\;y_{i} ({\varvec{w}}^{\text{T}}\varvec{\phi}({\varvec{x}}_{i} ) + b) \ge 1 - e_{i} \hfill \\& \quad \,\,\,\,e_{i} \ge 0 \hfill \\ \end{aligned} \right.$$
(15)

where \(i = 1,2, \cdots ,n\); ei is the ith element of the slack variable vector e; C is the penalty coefficient; b is the bias term of the hyperplane; and \(\varvec{\phi}( \cdot )\) is the mapping function.

Lagrange multipliers \(\alpha_{i}\) are used to transform the constrained optimization problem into its dual problem, and the final classification decision function is:

$$f(\varvec{x}) = \text{sgn} \left( {\sum\limits_{i = 1}^{n} {y_{i} \alpha_{i} k(\varvec{x},\varvec{x}_{i} )} + b} \right)$$
(16)
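For reference, a single-kernel SVM corresponding to (15)-(16) can be trained in a few lines with scikit-learn; the toy data and parameter values below are purely illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Toy two-class data (purely illustrative).
X_a = rng.normal(loc=0.0, scale=1.0, size=(50, 3))
X_b = rng.normal(loc=2.0, scale=1.0, size=(50, 3))
X = np.vstack([X_a, X_b])
labels = np.array([-1] * 50 + [1] * 50)

# Standard single-kernel SVM with a Gaussian (RBF) kernel and penalty coefficient C.
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X, labels)
print(clf.decision_function(X[:3]))  # the sign of this value gives the class, as in (16)
```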

3.2 MSVM

In fault identification using the SVM, the selection of the kernel function is a key step. Different kernel functions correspond to different discriminant functions, which directly affect the identification accuracy of the SVM. The kernel functions of the SVM mainly include local kernel functions and global kernel functions. The Gaussian kernel function is a typical local kernel function and can be described as follows:

$$K_{RBF} (\varvec{x}_{i} ,\varvec{x}_{j} ) = \exp \left( {\frac{{ - \left\| {\varvec{x}_{i} - \varvec{x}_{j} } \right\|^{2} }}{{\sigma^{2} }}} \right)$$
(17)

where σ is the kernel parameter.

As a typical global kernel function, the polynomial kernel function can be described as follows:

$$K_{poly} (\varvec{x}_{i} ,\varvec{x}_{j} ) = \left( {\varvec{x}_{i}^{\text{T}} \varvec{x}_{j} + 1} \right)^{d}$$
(18)

where d is the kernel parameter.

A single kernel function is used in the traditional SVM method and can solve classification problems on simple data. However, the traditional SVM has limitations for complex classification problems such as bearing fault diagnosis. In order to improve the performance of the SVM, the local and global kernels are combined to construct the MSVM, whose kernel can be described as follows:

$$K_{mix} (\varvec{x}_{i} ,\varvec{x}_{j} ) = \lambda K_{RBF} (\varvec{x}_{i} ,\varvec{x}_{j} ) + (1 - \lambda )K_{poly} (\varvec{x}_{i} ,\varvec{x}_{j} )$$
(19)

where λ (0<λ<1) is the tuning parameter.

According to (19), the multi-kernel function reduces to the polynomial kernel function when λ=0 and to the Gaussian kernel function when λ=1. The multi-kernel function can adapt to different input samples by adjusting the tuning parameter λ. The MSVM incorporates prior knowledge of the specific problem into the selection of the kernel function, and thus combines learning ability with generalization ability.
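A minimal sketch of the multi-kernel idea in (19): scikit-learn's SVC accepts a callable kernel, so the Gaussian and polynomial kernels can be combined with a tuning parameter λ. The parameter values, and the placeholder names features_train and labels_train in the commented line, are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def make_mixed_kernel(lam=0.7, gamma=1.0, degree=2):
    """Return a callable computing lam * K_RBF + (1 - lam) * K_poly, cf. (19)."""
    def mixed_kernel(X, Y):
        k_rbf = rbf_kernel(X, Y, gamma=gamma)                                  # exp(-gamma ||x - y||^2)
        k_poly = polynomial_kernel(X, Y, degree=degree, gamma=1.0, coef0=1.0)  # (x^T y + 1)^d
        return lam * k_rbf + (1.0 - lam) * k_poly
    return mixed_kernel

# scikit-learn's SVC accepts a callable kernel that returns the Gram matrix.
msvm = SVC(kernel=make_mixed_kernel(lam=0.7), C=10.0)
# msvm.fit(features_train, labels_train)   # placeholder names for SSI feature vectors and labels
```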

4 Fault diagnosis model based on SSI method and MSVM

In this section, a novel fault diagnosis method based on SSI-MSVM is proposed for the wind turbine bearing. The flow chart of the SSI-MSVM method is shown in Fig. 1.

Fig. 1

Flow chart of fault diagnosis method for bearing based on SSI-MSVM

The detailed procedure is described as follows:

1) Vibration signal acquisition. With the mounting positions of the vibration sensors and the sampling rate set, the vibration signal of the wind turbine bearing is acquired through a signal acquisition system.

2) SSI analysis. First, the collected vibration signal is used to construct a stochastic state-space model of the wind turbine bearing as shown in (1). Then, the SSI method is applied to estimate the system matrix A, whose eigenvalues are extracted as fault feature vectors.

3) Training the MSVM model. Samples in different states are taken as training samples to train the MSVM and establish the fault diagnosis model. The trained model can be used to distinguish different patterns.

4) Fault diagnosis. The test samples are input to the trained fault diagnosis model for classification, and the working state and fault type of the wind turbine bearing are determined from the output of the MSVM model. A compact end-to-end code sketch of this procedure is given after the list.
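The sketch below strings the earlier pieces together under the same assumptions (single-channel vibration segments, illustrative i, j+1 and model order, make_mixed_kernel from the Section 3.2 sketch); it is an illustration of the workflow, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

def ssi_eigen_features(y, i=9, jp1=9, order=5):
    """Compact SSI feature extraction (Section 2): block Hankel matrix, orthogonal
    projections, SVD, least-squares estimate of A, and its eigenvalues as features."""
    rows = i + jp1
    N = len(y) - rows + 1
    Y = np.array([y[r:r + N] for r in range(rows)])
    Yp, Yf = Y[:i], Y[i:]
    Yp_plus, Yf_minus = np.vstack([Yp, Yf[:1]]), Yf[1:]
    proj = lambda F, P: F @ P.T @ np.linalg.pinv(P @ P.T) @ P
    P_m, P_m1 = proj(Yf, Yp), proj(Yf_minus, Yp_plus)
    U, s, Vt = np.linalg.svd(P_m, full_matrices=False)
    S1h = np.diag(np.sqrt(s[:order]))
    Gamma, X_m = U[:, :order] @ S1h, S1h @ Vt[:order]
    X_m1 = np.linalg.pinv(Gamma[:-1]) @ P_m1
    A_hat = (np.vstack([X_m1, Yf[:1]]) @ np.linalg.pinv(X_m))[:order]
    eig = np.linalg.eigvals(A_hat)
    eig = eig[np.argsort(np.abs(eig))]            # fixed ordering so features align across samples
    return np.concatenate([eig.real, eig.imag])

def diagnose(train_segments, train_labels, test_segments, lam=0.7):
    """Steps 2)-4): extract SSI features, train the MSVM, classify the test samples."""
    X_train = np.array([ssi_eigen_features(seg) for seg in train_segments])
    X_test = np.array([ssi_eigen_features(seg) for seg in test_segments])
    msvm = SVC(kernel=make_mixed_kernel(lam=lam), C=10.0)   # kernel from the Section 3.2 sketch
    msvm.fit(X_train, train_labels)
    return msvm.predict(X_test)
```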

5 Experiment analysis

To validate the practicability and effectiveness of SSI-MSVM in fault diagnosis, bearing fault experiments are conducted on a wind turbine test rig, shown in Fig. 2. The test rig is composed of a wheel hub, main shaft, gearbox and generator. The main shaft is supported by two rolling bearings, which carry mainly the radial load and also a certain axial load. An acceleration sensor, X&K AD500T, is mounted on the bearing pedestal to acquire vibration signals, and the sampling frequency is 10 kHz. The parameters of the X&K AD500T are as follows: measuring range of 25g (g = 9.8 m/s2), sensitivity of 500 mV per g, frequency range of 0.3-12000 Hz and resolution of 0.004g. To simulate typical faults of a wind turbine bearing, a spark erosion technique is used to seed a pit in the inner race, roller and outer race.

Fig. 2

Structure diagram of wind turbine test rig

Figures 3 and 4 show two sets of vibration signals collected from the wind turbine bearing in different conditions; they respectively represent strong-fault and weak-fault modes of the bearing. Compared with the time waveform of the normal vibration signal, the strong-fault signals in Fig. 3 contain irregular, large-amplitude impulses, whereas the impulses of the weak faults in Fig. 4 are not obvious.

Fig. 3

Time waveform of bearing vibration signal in normal and strong faults

Fig. 4

Time waveform of bearing vibration signal in weak faults

For the SSI method, each data sample should be long enough to ensure valid fault feature extraction, so each sample is set to contain 9000 data points when processing the vibration signal of the wind turbine bearing. Both Hankel matrices Yp and Yf are 9×8991 matrices. The projection matrix is analysed by singular value decomposition and the results indicate that the order of the stochastic subspace model is 5. The SSI method is then applied to process the collected signal and the eigenvalues of the system matrix are obtained. Figure 5 shows the features of the bearing vibration signal in different conditions extracted by the SSI model. All the eigenvalues lie inside the unit circle, which indicates that the identified bearing stochastic subspace model is stable and effective. At the same time, the eigenvalue locations corresponding to the normal mode and the different fault modes do not coincide, which shows that these different eigenvalues can be discriminated by a clustering algorithm.
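As an illustration of how a model order such as 5 can be read from the singular value spectrum, one can count the normalized singular values above a small threshold; the threshold and the example spectrum below are made-up assumptions, not the experimental values.

```python
import numpy as np

def select_order(singular_values, threshold=0.01):
    """Choose the model order as the number of singular values that stay above a
    small fraction of the largest one (the threshold is an illustrative assumption)."""
    s = np.asarray(singular_values, dtype=float)
    return int(np.sum(s / s[0] > threshold))

# Made-up singular value spectrum with a clear drop after the 5th value.
s = np.array([12.0, 9.5, 7.1, 4.8, 3.2, 0.04, 0.03, 0.01, 0.01])
print(select_order(s))  # -> 5
```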

Fig. 5

Features extracted by SSI in different conditions

For both strong and weak bearing faults in this study, 320 samples obtained from the experimental roller bearing are used to verify the diagnostic performance of the SSI-MSVM method in comparison with K-means clustering, fuzzy C-means clustering (FCM) and traditional SVM. All data are divided into a training set and a test set: the training set, containing 120 samples, is used to calculate the fitness function and train the diagnosis model, and the test set is used to examine the classification accuracy of each model.
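A hedged sketch of such an evaluation protocol: repeated stratified train/test splits with the accuracy averaged over runs. The feature matrix, labels and parameter values are placeholders standing in for the 320 experimental samples, and only the SVM-based models are covered here.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def average_accuracy(X, y, kernel, n_runs=10, train_size=120, seed=0):
    """Average test accuracy over repeated stratified train/test splits."""
    splitter = StratifiedShuffleSplit(n_splits=n_runs, train_size=train_size,
                                      random_state=seed)
    scores = []
    for train_idx, test_idx in splitter.split(X, y):
        clf = SVC(kernel=kernel, C=10.0)
        clf.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    return float(np.mean(scores))

# Example calls (X_feat and fault_labels are placeholder names for the SSI features
# of the samples and their fault labels):
# acc_msvm = average_accuracy(X_feat, fault_labels, kernel=make_mixed_kernel(lam=0.7))
# acc_svm  = average_accuracy(X_feat, fault_labels, kernel="rbf")
```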

To verify the stability of the results, six groups of comparative experiments with different unlabeled test sample sets are carried out, and each experiment is repeated 10 times. The average test accuracies of the different pattern identification methods are presented in Table 1. Figure 6 visualizes the comparison between the proposed method and the other methods.

Table 1 Fault diagnosis result comparison among K-means clustering, FCM, SVM and MSVM
Fig. 6

Classification accuracy comparison with different pattern recognition methods

From Table 1 and Fig. 6, the fault diagnosis accuracy of the SSI-MSVM method is in the range of 90.27%-92.73%, while those of K-means clustering, FCM and SVM are in the ranges of 82.53%-88.06%, 86.35%-89.32% and 89.25%-91.03%, respectively. Overall, the SSI-MSVM method achieves higher diagnostic accuracy than K-means clustering, FCM and SVM, which indicates that the proposed diagnosis model is clearly superior to the traditional diagnosis methods.

6 Conclusion

This paper presents a novel fault diagnosis method for wind turbine bearings based on SSI and MSVM. The SSI method constructs a model directly from time-domain data and can identify the modal parameters, which makes it suitable for extracting fault features. The MSVM is an improved pattern recognition method that combines the Gaussian-kernel SVM and the polynomial-kernel SVM, so it can identify bearing fault types more accurately.

The results indicate that the SSI-MSVM method is an effective fault diagnosis method for wind turbine bearings; it successfully identifies the bearing fault types and achieves higher diagnostic accuracy than K-means clustering, FCM and traditional SVM.