1 Introduction

With the widespread use of wind turbines (WTs) as renewable energy systems, it has become important to include control and supervision in the system design. Fault detection and isolation (FDI) of WTs reduces maintenance costs, which is particularly important for offshore WTs. Online supervision should suggest the best maintenance time as a function of fault occurrence and wind speed in order to reduce operation and maintenance costs. Early detection of faults also helps to avoid degradation of the material and other side effects. Furthermore, fault detection is essential for control reconfiguration in order to ensure optimal power in case of a partial fault. Even though the functionality of a wind turbine is similar to that of other rotating machinery, it involves a number of difficulties: high variability of the wind speed, an aggressive environment and measurement difficulties due to noise and vibrations, in addition to the fact that wind turbines are expected to run continuously for several years. For these reasons, the development of methods for FDI in WTs is increasingly important. Similarly, a number of fault tolerant control (FTC) approaches are also being applied to WTs, but this is outside the scope of this paper.

FDI approaches can in general be classified as model-based or data-based. On the one hand, model-based methods require a comprehensive model of the system. On the other hand, the success of data-based approaches depends on the significance (amount and quality) of the historical data and on the mathematical method used to detect patterns in the data. However, training data are usually limited to specific conditions, typically normal (non-faulty) operation with limited variations of the operating conditions. The limitations of both model-based and data-based approaches can be overcome by combining them in order to ensure optimal supervision. This is the main idea of the present paper.

Reviews of WT monitoring and fault diagnosis were proposed by [1–3]. Both data- and model-based approaches were reported. Among model-based approaches, observers were applied to monitor several parts of wind turbines. Reference [4] proposes an unknown input observer to detect sensor faults around the WT drive train. More attention has been given to the electrical conversion system of wind turbines. Reference [5] proposes an observer-based solution for current and voltage sensor fault detection. Reference [6] presents an FDI solution for faults in a doubly fed wind turbine converter.

A number of approaches were used for FDI in WT, such as neural networks (NN) as well as statistical-based approaches. Neural networks were used for the estimation of the generator power by [7]. Reference [8] shows that neural networks had a higher confidence level than a polynomial regression-based model for FDI of gearbox bearing damage and stator temperature anomalies in the WT. Reference [9] studies faults related to the accumulation of coal in the coal mill using statistical and dynamic-based approaches. They showed the importance of data selection in the statistical approach for supervision. Reference [10] compares different data-mining algorithms to extract models for FDI of WT (without isolation, except for diverter fault): NN, NN ensemble, boosting tree algorithm, and support vector machine (SVM). In normal situations, SVMs are more accurate in prediction and isolation. In faulty situations, however, better prediction is obtained by the NN ensemble and better evaluation of the fault severity by the boosting tree algorithm. Frequency-domain methods were also found useful for FDI of some vibrating components in WT. Reference [11] uses a frequency domain model for FDI of tooth cracks in the planetary gear using spectral methods.

The WT considered in this work is a horizontal axis variable speed turbine composed of three blades, for which a benchmark was proposed by the companies kk-electronic and MathWorks and by Aalborg University[12]. Different faults are likely to occur in this benchmark: sensor faults (pitch positions, generator and rotor speeds), actuator faults (pitch positions, converter torque) and system faults (drive train). These faults can be of type stuck measurement, scaled measurement or offset (e.g., due to calibration error, interruption in data transmission or degradation of some components). Based on this benchmark, different solutions for FDI of the WT were proposed at an International Federation of Automatic Control (IFAC) competition in 2011[13–18]. The proposed solutions were satisfactory only for part of the possible faults. No solution was found satisfactory for faults related to the converter torque actuator and for system faults. Reference [13] only considered sensor faults using SVM for supervision. This method showed the best results in terms of detection time and number of false alarms for faults of type stuck measurement. The second best solution, proposed by [14], is based on an estimation approach. The third solution is based on the up-down counter solution given by [15]. Other approaches were also used, such as the concept of sensitivity matrix[16], piecewise affine models for pitch sensors[17], parity equations followed by H∞ optimization[18], a data-based method[19], and a method based on the Kalman filter[20].

In this work, both observers (Kalman-like) and a data-based approach (SVM) are used for FDI in WT using the mentioned benchmark. The paper is organized as follows. In Section 2, the basics of SVM classification and of the Kalman-like observer are given. In Section 3, the wind turbine is described and the locations and types of faults are defined. In Section 4, the SVM and observer implementations are presented, showing the different tuning levels. In Section 5, simulation scenarios are presented to evaluate the efficiency and limitations of the proposed methodology using a real wind speed sequence.

2 Theoretical background

The objective of this work is to combine data-based and model-based approaches for FDI in WT. Data-based approaches for FDI include artificial neural networks[8] and statistical methods such as principal component analysis[21], partial least squares (PLS) and, more recently, support vector machines (SVM). This last approach is considered here because of its robustness and speed, which make it suitable for online applications.

In model-based approaches, the use of observers represents the first choice for FDI[22–24]. Various observer structures have been proposed in the literature for linear and nonlinear systems. Among the proposed structures, the unknown input observer is widely used[25]; the eigenstructure assignment approach[26] and the sliding mode observer[27] are also used. For FDI in nonlinear applications, the extended Kalman filter was first applied without theoretical validation[28]. The theoretical FDI problem for nonlinear systems was initially introduced by [29]. Since that time, many techniques have been proposed for nonlinear systems. Garcia and Frank[30] gave a survey on the principal observer-based approaches to fault diagnosis of nonlinear systems. The authors in [31–33] proposed solving the FDI problem by combining geometric decoupling techniques with nonlinear observer synthesis. The observer form proposed in the present paper is obtained by input-output injection linearization.

2.1 Support vector machines

SVMs are based on the structural risk minimization principle using the statistical learning theory introduced in 1964 by Vapnik and Chervonenkis[34]. More recently, SVMs were introduced as machine learning algorithms for classifying data from two different classes[35,36]. Basically, a binary support vector classifier constructs a separating hyperplane. The hyperplane should have the maximum margin, which is the width up to which the boundary can be extended on both sides before it hits any data point. These contact points are called the support vectors. In order to allow classifying nonlinearly separable sets, a nonlinear kernel function can be used. The main differences between SVM and many other statistical methods are therefore the following. First, structural risk minimization (training of traditional classifiers usually minimizes only the empirical risk) improves the ability of generalization even with a reduced number of samples and, with good parameter tuning, avoids over-fitting. Second, SVMs use nonlinear kernels, which allow the separation of nonlinearly separable data. SVMs have been extensively used to solve classification problems in many domains, including face, object and text detection and categorization, and information and image retrieval. Their use for fault detection started in 1999 and was found to improve the detection accuracy. Reference [37] presents a review of the use of SVMs for fault detection, reporting 37 papers in academic journals on this subject. Since then, the number of journal papers using SVMs for fault detection has increased substantially. The concerned domains are mostly restricted to mechanical machinery, with a slight extension to electro-mechanical machinery, semi-conductors and chemical processes[38–40].

Consider the problem of separating the set D composed of N training vectors belonging to two classes (Ω1, Ω2) (for more details see [34–36]):

$$\matrix{{D = \{ ({x_1},{z_1}),({x_2},{z_2}), \cdots, ({x_N},{z_N})\}, {\Omega _1} = \{ {x_i}|{z_i} = + 1\} } \cr {{\Omega _2} = \{ {x_i}|{z_i} = - 1\}, \quad i = 1, \cdots, N} \cr }$$

where \(x_i \in {\bf R}^p\) denote the input vectors, each vector being characterized by a set of p descriptive variables \(x_i = (x_{i1}, x_{i2}, \cdots, x_{ip})\), and \(z_i \in \{-1, +1\}\) defines the class label of a given vector \(x_i\). The purpose of SVM is to find an optimal separating hyperplane f(x) that maximizes the margin \(({1 \over {||w||}})\) between the hyperplane and the data points from each side such that all points of the same class are on the same side of the hyperplane (Fig. 1). The support vectors correspond to points located exactly at a distance equal to the margin. The weight w is a p-dimensional vector orthogonal to the hyperplane. Since it is not always possible to perfectly separate the data (for instance, due to measurement noise/errors), a slack variable \(\zeta_i\) is introduced to relax the margin constraints and allow misclassification. \(\zeta_i\) measures the degree of misclassification of vector \(x_i\) (lying on the wrong side of the hyperplane or inside the margin). In this case, the optimisation problem for soft margin classification can be written as follows (linearly separable data with error tolerance):

$$\mathop {\min }\limits_{w,b,\zeta } \left( {{1 \over 2}||w|{|^2} + C\sum\limits_{i = 1}^N {{\zeta _i}} } \right)$$
(1)

subject to \(\left\{ {\matrix{{{z_i}f({x_i}) \geqslant 1 - {\zeta _i}} \cr {{\zeta _i} \geqslant 0} \cr } } \right.\), where \(f(x_i)\) is the predicted output and C ⩾ 0 is a regularization parameter that governs the tolerance to misclassification. Increasing the value of C increases the cost of misclassifying points but reduces the importance of minimizing the model complexity (minimizing \(\|w\|^2\)). It can be tuned by optimisation and cross validation. Since the criterion \(\|w\|^2\) is convex and all constraints are linear, this problem can be solved by constructing a Lagrange function. Solving the dual optimization problem gives the Lagrangian multipliers (\(\alpha_i\)), the support vectors and b (the bias). According to the Karush-Kuhn-Tucker complementarity condition, the solution must satisfy \(\alpha_i [z_i f(x_i) - 1 + \zeta_i] = 0\), which means that either \(\alpha_i = 0\) or \(z_i f(x_i) - 1 + \zeta_i = 0\). The latter condition corresponds to the support vectors (inputs lying on the margin), where \(\alpha_i \ne 0\).
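As an illustration of criterion (1), the following minimal sketch evaluates the soft-margin objective for a linear decision function f(x) = ⟨w, x⟩ + b, taking for each slack variable the smallest value allowed by the two constraints. The function name, the toy samples and the value C = 10 are assumptions made for this example only; they are not taken from the benchmark.

```python
def soft_margin_objective(w, b, samples, labels, C):
    """Evaluate criterion (1) for a linear classifier f(x) = <w, x> + b.

    Returns the objective value and the list of slack variables.
    """
    def f(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    # Smallest slack satisfying z_i f(x_i) >= 1 - zeta_i and zeta_i >= 0
    slacks = [max(0.0, 1.0 - z * f(x)) for x, z in zip(samples, labels)]
    norm_sq = sum(wi * wi for wi in w)
    return 0.5 * norm_sq + C * sum(slacks), slacks


# Hypothetical toy data: the second sample lies inside the margin
samples = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0)]
labels = [+1, +1, -1]
objective, slacks = soft_margin_objective((1.0, 0.0), 0.0, samples, labels, C=10.0)
```

Only the sample lying inside the margin receives a non-zero slack, and its contribution to the objective is weighted by C.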

Fig. 1 SVM classification of two linearly separable classes (here \(x_i \in {\bf R}^2\))

For nonlinearly separable data, which is the case of many real problems, the data can be mapped by some nonlinear function φ(x) into a high-dimensional feature space where linear classification becomes possible. Rather than fitting nonlinear curves to the data, SVMs handle this by using a kernel function \(K(x_i, x) = \langle \phi(x_i), \phi(x) \rangle\) to map the data into a different space, where a hyperplane can be used to do the separation. The obtained decision function is

$$f\left( x \right){\rm{ }} = \left\langle {w,\phi (x)} \right\rangle + b = {\rm{ }}\sum\limits_{i = 1}^N {{\alpha _i}{z_i}K({x_{\sup \,i}},x) + b} $$
(2)

with the properties

$$w = \sum\limits_{i = 1}^N {{\alpha _i}{z_i}\phi ({x_{\sup \,i}})}$$

where b is the bias term (a scalar). It is clear that this decision function is only influenced by the non-zero \(\alpha_i\) (support vectors). Therefore, N can be replaced in (2) by \(n_{sv}\), the number of support vectors \(x_{\sup}\). This gives two features to the SVM algorithm: the ability to adjust the error with a reduced training set, and fast computation in decision-making (allowing online implementation). The kernel function can be any function that satisfies Mercer’s theorem, namely any continuous positive definite function can be considered as a kernel function that represents an inner product in some space. The Gaussian kernel (which is a radial basis function) is the most widely used:

$$K\left( {{x_i},x} \right) = {{\rm{e}}^{{{ - {{\left\| {{x_i} - x} \right\|}^2}} \over {2{\sigma ^2}}}}}$$

where σ is the kernel width (referred to as the kernel variance in the sequel). A small σ is known to fit the training data perfectly but to be unable to evaluate the fault for new data (overfitting, reduced ability of generalization). It can also be tuned by optimisation and cross validation.
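The decision function (2) with the Gaussian kernel can be sketched in a few lines. This is only an illustration: the support vectors, multipliers, bias and σ below are hypothetical values chosen by hand, not the result of training on the benchmark data.

```python
import math


def gaussian_kernel(xi, x, sigma):
    """Gaussian (radial basis function) kernel between two vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_function(x, support_vectors, alphas, labels, b, sigma):
    """Decision function (2): only the support vectors contribute."""
    return sum(alpha * z * gaussian_kernel(sv, x, sigma)
               for sv, alpha, z in zip(support_vectors, alphas, labels)) + b


def classify(x, support_vectors, alphas, labels, b, sigma):
    """Return the class label +1 or -1 from the sign of f(x)."""
    f = decision_function(x, support_vectors, alphas, labels, b, sigma)
    return 1 if f >= 0.0 else -1


# Hypothetical model with two support vectors, one per class
support_vectors = [(0.0, 0.0), (3.0, 0.0)]
alphas = [1.0, 1.0]
labels = [+1, -1]
near_first = classify((0.5, 0.0), support_vectors, alphas, labels, 0.0, 1.0)
near_second = classify((2.5, 0.0), support_vectors, alphas, labels, 0.0, 1.0)
```

Since the sum runs over the support vectors only, the cost of one decision grows with their number rather than with the size of the training set, which is what makes online use practical.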

2.2 Kalman-like observer

The Kalman-like observer was developed by [38] for a class of nonlinear systems where the state matrix may depend on the inputs, the outputs or on time, and where all the inputs are regularly persistent. The observer equations are based on the minimization of a quadratic convex criterion. For such a class of nonlinear systems, the Kalman-like observer is easier to implement than the Kalman filter since the gain matrix relies on a single tuning parameter, which justifies the choice of this observer for FDI in WT. Also, no change of variables is required since the system is already in a canonical form of observability.

Consider the following nonlinear system:

$$\left\{ {\matrix{{\dot x = A(u,y)x + G(u)} \cr {y = Cx} \cr } } \right.$$
(3)

with a single output, \(y \in {\bf R}\). The state matrix A might depend on the inputs and the outputs. Let us call \(\phi_u(s, t_0)\) the unique solution of

$${{{\rm{d}}{\phi _u}(s,{t_0})} \over {{\rm{d}}s}} = A(u(s)){\phi _u}(s,{t_0})$$

with \(\phi_u(t_0, t_0) = I\), the identity matrix, and \({\phi _u}({t_0},s) = \phi _u^{ - 1}(s,{t_0})\). We denote \(\phi_u(s, t) = \phi_u(s, t_0)\phi_u(t_0, t)\), and \(G(u, t_0, t_0 + t_1)\) is the Gramian of observability related to the input u on the interval \([t_0, t_0 + t_1]\):

$$G(u,{t_0},{t_0} + {t_1}) = \int_{{t_0}}^{{t_0} + {t_1}} {\phi _u^{\rm{T}}} (t,{t_0}){C^{\rm{T}}}C{\phi _u}(t,{t_0}){\rm{d}}t$$

where T stands for the transposed matrix or vector.

Definition 1. An input \(u \in {\bf R}^m\) is regularly persistent for system (3) if \(\exists t_1 > 0\), \(\exists \alpha_1 > 0\), \(\exists \alpha_2 > 0\) and \(\exists t_0 \geqslant 0\) such that \(\forall t \geqslant t_0\):

$$\matrix{{{\lambda _{\min }}(G(u,t,t + {t_1})) \geqslant {\alpha _1}} \cr {{\lambda _{\max }}(G(u,t,t + {t_1})) \leqslant {\alpha _2}} \cr }$$

where \(\lambda_{\min}\) and \(\lambda_{\max}\) stand for the smallest and largest eigenvalues of G, respectively.
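As a simple illustration of this definition, consider a hypothetical case (chosen for the example only, not the WT model) in which the state matrix is constant, with rows (0, 1) and (0, 0), and C = (1 0). The transition matrix and the Gramian can then be computed in closed form:

```latex
\phi_u(t, t_0) = \begin{pmatrix} 1 & t - t_0 \\ 0 & 1 \end{pmatrix},
\qquad
C\,\phi_u(t, t_0) = \begin{pmatrix} 1 & t - t_0 \end{pmatrix}

G(u, t_0, t_0 + t_1)
  = \int_{t_0}^{t_0 + t_1}
    \begin{pmatrix} 1 & t - t_0 \\ t - t_0 & (t - t_0)^2 \end{pmatrix} \mathrm{d}t
  = \begin{pmatrix} t_1 & t_1^2 / 2 \\ t_1^2 / 2 & t_1^3 / 3 \end{pmatrix},
\qquad
\det G = \frac{t_1^4}{12} > 0
```

The trace and the determinant are both strictly positive and do not depend on the starting time, so the two eigenvalues are bounded from below and from above by positive constants for any choice of t 1 > 0, as required by the definition.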

Theorem 1[41]. If u is regularly persistent, then

$$\left\{ {\matrix{{\dot{\hat x} = A(u)\hat x + G(u) - {R^{ - 1}}{C^{\rm{T}}}(C\hat x - y)} \cr {\dot R = - \theta R - {A^{\rm{T}}}(u)R - RA(u) + {C^{\rm{T}}}C} \cr } } \right.$$
(4)

is an observer for system (3) with θ > 0, \(\hat x(n) \in {{\bf{R}}^n}\). Moreover, the norm of the estimation error goes exponentially to zero. The tuning parameter of the Kalman-like observer is θ, which must be strictly positive. The convergence of the observer is guaranteed if matrix R is a symmetric positive definite matrix.
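To make the structure of observer (4) concrete, the sketch below integrates it with an explicit Euler scheme for a two-state, single-output system. The test system (x1' = u x2, x2' = 0, y = x1 with a constant input u = 1), the step size, the value θ = 2 and the initial conditions are all assumptions made for this example; they do not correspond to the WT model.

```python
def kalman_like_observer(A, G, C, u_seq, y_seq, x_hat0, R0, theta, dt):
    """Explicit-Euler integration of observer (4) for a two-state system.

    A(u) returns a 2x2 nested list, G(u) a list of length 2, C is the
    output row vector (length 2); u_seq and y_seq are sampled signals.
    """
    x_hat = list(x_hat0)
    R = [row[:] for row in R0]
    estimates = []
    for u, y in zip(u_seq, y_seq):
        a, g = A(u), G(u)
        innovation = C[0] * x_hat[0] + C[1] * x_hat[1] - y  # C x_hat - y
        # Gain R^{-1} C^T from the closed-form inverse of the 2x2 matrix R
        det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
        gain = [(R[1][1] * C[0] - R[0][1] * C[1]) / det,
                (R[0][0] * C[1] - R[1][0] * C[0]) / det]
        dx = [a[i][0] * x_hat[0] + a[i][1] * x_hat[1] + g[i]
              - gain[i] * innovation for i in range(2)]
        # dR/dt = -theta R - A^T R - R A + C^T C
        dR = [[-theta * R[i][j]
               - sum(a[k][i] * R[k][j] for k in range(2))
               - sum(R[i][k] * a[k][j] for k in range(2))
               + C[i] * C[j] for j in range(2)] for i in range(2)]
        x_hat = [x_hat[i] + dt * dx[i] for i in range(2)]
        R = [[R[i][j] + dt * dR[i][j] for j in range(2)] for i in range(2)]
        estimates.append(tuple(x_hat))
    return estimates


# Hypothetical test system: x1' = u * x2, x2' = 0, y = x1, with u = 1
A = lambda u: [[0.0, u], [0.0, 0.0]]
G = lambda u: [0.0, 0.0]
C = [1.0, 0.0]
dt, steps = 0.001, 10000
x = [1.0, 0.5]  # true initial state, unknown to the observer
u_seq, y_seq = [], []
for _ in range(steps):
    u_seq.append(1.0)
    y_seq.append(x[0])
    x = [x[0] + dt * 1.0 * x[1], x[1]]

estimates = kalman_like_observer(A, G, C, u_seq, y_seq, [0.0, 0.0],
                                 [[1.0, 0.0], [0.0, 1.0]], theta=2.0, dt=dt)
x1_hat, x2_hat = estimates[-1]
```

For this particular constant-input case, R converges to a constant matrix and the resulting gain places both poles of the estimation error dynamics at −θ, which shows how the single parameter θ sets the convergence speed.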

3 Wind turbine description

A horizontal axis variable speed turbine composed of three blades is considered in this work[12]. The system has a full converter coupled to a generator that allows converting the mechanical energy to electrical energy. A drive train is used to increase the rotational speed from the rotor (the three blades) to the generator.

Sensor faults: The system is equipped with duplicated sensors (Fig. 2) measuring:

  1) the three pitch positions (\(\beta_{k,mi}\), k = 1, 2, 3, i = 1, 2),

  2) and the generator and rotor speeds (\(\omega_{g,mi}\), \(\omega_{r,mi}\), i = 1, 2),

where i indicates the sensor number and k the pitch number (each pitch has 2 sensors measuring its position). This gives a total of ten sensors, all subject to two kinds of faults: stuck or scaled measurements, which are to be detected within 10 sampling periods (desired number of samples for detection \(n_s^{{\rm{des}}} < 10\)), where the sampling time is \(T_s\) = 0.01 s (Table 1). The process has other sensors, measuring for instance the wind speed and the generator power, that are not supervised in this work.

Fig. 2 Measurements of the WT

Table 1 Fault locations and fault detection results based on few scenarios (\(n_s^{{\rm{des}}}\) and n s are the desired and real numbers of sampling periods for detection, the sampling period being T s = 0.01 s)

Actuator faults: As a function of the wind speed, a control system controls the aerodynamics of the turbine to obtain the optimal power. The benchmark allows simulating the wind turbine control under normal operation: Zone II (power optimization) and Zone III (constant power production). Zones I and IV correspond to the start and stop operations. The actuators manipulate the three pitch systems and the converter torque. They allow respectively pitching the blades and setting the generator torque to control the generator and rotor rotation speeds. These actuators are also subject to faults. The converter system that sets the generator torque might have an offset that should be detected rapidly (\(n_s^{{\rm{des}}} < 5\)). The three pitching systems might also have a change in the dynamics that can be due to an abrupt change in the hydraulic system or, at a slower rate, to a high air content in the oil.

System faults: In the benchmark used, system faults might also occur, for instance in the drive train due to friction changes with time that might break down the train, but this is not considered in this work. In real life, other kinds of faults might also occur, such as data transmission or yaw position faults, which are not considered either.

Hints on the model: Fault detection will be studied based on closed-loop simulations in Zones II and III with a real measured wind sequence of 4400 s. The detailed model of the turbine is given in [12]. Some hints are given below. Note that the system contains nonlinear parts, the measurements are noisy and the control system is switched between both zones, all of which add difficulties for FDI.

Let us recall the pitch system and converter models that will explicitly be referred to in the fault scenarios. The pitch system is hydraulic and the relation between the measured and desired pitch angle (the reference obtained by the controller) can be modelled by a second-order transfer function:

$${{\beta _k^m(s)} \over {{\beta ^d}(s)}} = {{\omega_n^2} \over {{s^2} + 2\zeta {\omega_n}s + \omega_n^2}}$$
(5)

where \(\beta _k^m(s)\) and \(\beta^d(s)\) are the measured and desired positions of pitches k = 1, 2, 3, and \([\omega_n, \zeta]\) = [11.11 rad/s, 0.6] are respectively the natural frequency and the damping factor. The pitch rate (\(\dot\beta\)) may take values between −8 and 8 deg/s and the pitch angle (β) between −2 and 90 deg. As shown below, β remains lower than 20° with the wind sequence used in this benchmark.
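The behaviour of the pitch model (5) can be reproduced with a few lines of time-domain simulation. The sketch below is a minimal illustration that uses a semi-implicit Euler scheme at the benchmark sampling time; the function name, the 5 deg step reference and the choice of integration scheme are assumptions, and the rate and angle limits are not enforced.

```python
def simulate_pitch(beta_d, omega_n=11.11, zeta=0.6, dt=0.01,
                   beta0=0.0, rate0=0.0):
    """Simulate the second-order pitch model (5) for a sampled reference.

    Time-domain form: beta'' = omega_n^2 (beta_d - beta) - 2 zeta omega_n beta'.
    """
    beta, rate = beta0, rate0
    response = []
    for ref in beta_d:
        accel = omega_n ** 2 * (ref - beta) - 2.0 * zeta * omega_n * rate
        rate += dt * accel   # update the pitch rate first
        beta += dt * rate    # then the angle, using the updated rate
        response.append(beta)
    return response


# Hypothetical scenario: 5 deg step in the desired pitch angle, held for 5 s
response = simulate_pitch([5.0] * 500)
final_value = response[-1]
peak_value = max(response)
```

Changing `omega_n` and `zeta` leaves the final value unchanged but alters the transient, which is why a fault on these parameters can only be seen while the reference is moving.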

Similarly, the converter dynamics, represented by the measured to the desired generator torque, can be modelled by a first-order transfer function:

$${{\tau _g^m(s)} \over {\tau _g^d(s)}} = {1 \over {\tau s + 1}}$$
(6)

where \(\tau_g^m\) and \(\tau_{g}^{d}\) are the real and desired generator torques, and τ = 0.02 s is a combined converter and generator model parameter. The real generator torque, not being measured, is calculated from the measured generator speed \(\omega_{g,mi}\) and the power produced by the generator \(P_g\), which are related by the following equation:

$${P_g}(t) = {\eta _g}{\omega _g}(t){\tau _g}(t)$$
(7)

where η g = 0.98 is the generator efficiency.
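Relation (7) can be inverted to recover the unmeasured torque from the power and speed measurements. The helper below is a minimal sketch of that calculation; the function name and the numerical values in the example are hypothetical.

```python
def generator_torque(power, omega_g, eta_g=0.98):
    """Generator torque obtained by inverting (7): tau_g = P_g / (eta_g * omega_g)."""
    if omega_g == 0.0:
        raise ValueError("torque cannot be recovered at zero generator speed")
    return power / (eta_g * omega_g)


# Hypothetical operating point: 4.8 MW produced at 162 rad/s
tau_g = generator_torque(4.8e6, 162.0)
```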

4 FDI implementation

First of all, SVM is applied alone to FDI of all faults. For actuator faults, both SVM and a Kalman-like observer are compared.

4.1 Support vector machines

Fault detection and isolation by SVM is developed in two parts: training of models for FDI and validation of the obtained models. The main steps in the development are detailed below.

  1) Data generation: First of all, a set of measured data x (inputs, references and outputs), without fault or with different fault amplitudes, is generated to train models for the detection of each fault separately (to ensure isolation). This set was generated using a real wind sequence as input to the benchmark. For each fault, about six scenarios were considered, with different fault amplitudes. Each sample is assigned a label z = +1 (with fault) or z = −1 (without fault) for the considered fault. Note that when a particular fault is considered, normal data might contain faults in the other sensors or actuators.

  2) Data pre-treatment: Data filtering was found to be essential before model development in order to reduce the sensitivity to process disturbances and measurement noise. A first order filter with a time constant τ was used (filtered data are denoted with a hat). Data were not normalized.

  3) Features selection: The key step in training SVM models is features selection. The input vector \(x_i\) used for classification should contain the most pertinent information related to the considered fault. However, not all the data can be used, since the important information might be masked by the large variations of irrelevant variables. This vector may include inputs, outputs, set-points, combinations of these or time derivatives of the measurements. It is important to mention that the selection of \(x_i\) should ensure both fault detection and isolation. In this work, \(x_i\) is selected based on observing the process outputs for each fault. Note that for each fault a different vector was proposed. Statistical analysis such as principal component analysis or partial least squares can be useful for pre-treatment, but was not found useful in the present study since some information was lost during the treatment.

  4) Parameter tuning: The kernel used for learning all the faults is the Gaussian kernel. Initial values of the kernel variance σ and of the regularization parameter C are obtained based on the correlations proposed by [42]. These values were then refined based on a few simulations. Cross validation may also be used, but this would require reducing the data size before optimising the parameters for each fault. Indeed, the wind sequence duration is 4 400 s with \(T_s\) = 0.01 s, which gives about \(4.4 \times 10^5\) samples. For parameter tuning, it is well known that high σ values lead to improved generalisation, but a very high σ might not fit the data at all. A small σ, on the contrary, might perfectly fit the learning data (overfitting), but might be unable to evaluate the fault for new data (reduced ability of generalization). For the regularization parameter C, higher values penalize misclassification more heavily at the expense of a more complex decision function, whereas lower values tolerate more misclassification and favour a simpler function.

  5) Model development: The SVM learning algorithm then uses the inputs x and their corresponding outputs z to identify \(\alpha_i\) and the support vectors (\(x_{\sup,i}\)) to be used in (2) for decision making. Note that the same “model” (\(x_i\) and \(\alpha_i\)) is used for faults of the same type (e.g., pitch position fault 1a, \(\beta_{k,mi}\), ∀k = 1, 2, 3, ∀i = 1, 2, so one model for 6 sensors). Eight different models were therefore developed.

  6) Validation: The obtained SVM models are evaluated in new fault scenarios. Parameter adjustment is done based on the number of false alarms and missed detections.
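The pre-treatment of step 2 can be sketched as a discrete first order low-pass filter. The paper does not state how the filter is discretized, so the backward-Euler form used below is an assumption, as are the function name and the test signal.

```python
def first_order_filter(samples, tau, ts=0.01):
    """Discrete first order low-pass filter with time constant tau (seconds).

    Backward-Euler discretization of tau * y' + y = x, sampled every ts seconds.
    The filter state is initialized with the first sample.
    """
    if not samples:
        return []
    alpha = ts / (tau + ts)
    state = samples[0]
    filtered = []
    for sample in samples:
        state += alpha * (sample - state)
        filtered.append(state)
    return filtered


# Hypothetical unit step filtered with tau = 6 * Ts, as used for the pitch sensors
ts = 0.01
step_response = first_order_filter([0.0] + [1.0] * 200, tau=6 * ts, ts=ts)
```

A larger τ removes more noise but delays the filtered signal, which directly affects the number of sampling periods needed for detection.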

The features used for learning the SVM models are detailed below for each fault.

4.1.1 Stuck measurements

Data exchange with the system might be interrupted for a few sampling periods, especially in wireless systems, which is frequent in WT. The measurement is then stuck at the value of the last exchange with the system. In order to detect such a fault, the derivative of the measurement can be very useful. A carefully chosen filter is however necessary in order to overcome the difficulties due to measurement noise. The sensors concerned by this type of fault are the pitch position sensors and the rotor and generator speed sensors.

Pitch position sensor. For stuck pitch position sensor fault (fault 1a), the following vector is used for detection and isolation:

$$x = \left[ {\matrix{{|\hat \beta _k^{m1}({t_j}) - \hat \beta _k^{m2}({t_j})|} \cr {|\beta _k^{m1}({t_j}) - \beta _k^{m1}({t_{j - 1}})|} \cr {|\beta _k^{m2}({t_j}) - \beta _k^{m2}({t_{j - 1}})|} \cr } } \right]$$
(8)

where \(t_j\) and \(t_{j-1}\) are the time instants j and j − 1, respectively, \(\beta _k^{mi}\) is the measured pitch position, and \({\hat \beta }\) is the measured pitch position filtered using \(\tau = 6 \times T_s\) s. The first line allows fault detection. The second and third lines allow fault isolation (sensor number 1 or 2). They give a kind of derivative of the sensor measurement (without the division by \(T_s\), and using the absolute value).

For practical applications, when

$$|\beta _k^{mi}({t_j}) - \beta _k^{mi}({t_{j - 1}})| = 0$$

this term is replaced by a large constant value (5000) in order to enhance the distinguishability between the fixed value fault and the normal case (no fault), as these values otherwise oscillate between \(1 \times 10^{-2}\) and 2. For all sensors measuring the pitch positions (\(\beta _k^{mi},k = 1,2,3,i = 1,2\)), the same model is used with the variance σ tuned at 10.
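The construction of this feature vector, including the replacement of a zero increment by the constant 5000, can be sketched as follows. The function and argument names are hypothetical, and the numerical values in the example are arbitrary.

```python
STUCK_MARKER = 5000.0  # large constant substituted for a zero increment


def stuck_features(filt_m1, filt_m2, raw_m1, raw_m1_prev, raw_m2, raw_m2_prev):
    """Feature vector for stuck pitch sensor detection and isolation.

    filt_m1, filt_m2: filtered measurements of sensors 1 and 2 at t_j
    raw_*: unfiltered measurements at t_j and t_{j-1}
    """
    def increment(now, previous):
        diff = abs(now - previous)
        # An exactly constant measurement is the signature of a stuck sensor
        return STUCK_MARKER if diff == 0.0 else diff

    return [abs(filt_m1 - filt_m2),
            increment(raw_m1, raw_m1_prev),
            increment(raw_m2, raw_m2_prev)]


# Hypothetical sample: sensor 1 repeats its previous value, sensor 2 moves
x_example = stuck_features(10.0, 10.5, 10.2, 10.2, 10.6, 10.4)
```

With noisy measurements an exactly zero increment is very unlikely in normal operation, so the marker value separates the two classes by several orders of magnitude.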

Generator and rotor speed sensors. For stuck rotor (fault 2a) and generator (fault 3a) speed sensor faults (\(\omega_{g,mi}\), \(\omega_{r,mi}\), i = 1, 2), the following vector is used for detection and isolation:

$$x = \left[ {\matrix{{|\hat \omega _p^{m1}({t_j}) - \hat \omega _p^{m2}({t_j})|} \cr {|\omega _p^{m1}({t_j}) - \omega _p^{m1}({t_{j - 1}})|} \cr {|\omega _p^{m2}({t_j}) - \omega _p^{m2}({t_{j - 1}})|} \cr } } \right],\quad p = g,r$$
(9)

where \({{\hat \omega }_g}\) is obtained using a filter with τ = 2 × T s s and \({{\hat \omega }_r}\) using τ = 60 × T s s. The Gaussian variance is tuned at σ = 15.

4.1.2 Scaled measurements

Scaled faults might occur in the sensors, for instance, due to calibration errors or to drifts in some components of the sensor with time. Therefore, these faults might appear progressively. In order to detect such faults, the difference (in absolute value) between the “desired” and measured values is considered after filtering. It is important to note that in the benchmark these faults are simulated as a multiplicative gain (\(\beta _k^{mi} = k \times {\beta _k}\)). Therefore, once the drift reaches a sufficiently detectable limit, it can be detected and isolated. When the measurement equals zero, the faulty measurement coincides with the real one, even though the sensor might be defective. Therefore, in the considered scenarios, the faults are introduced at instants where \(\beta _k^{mi} \ne 0\); otherwise they are not detected.

Pitch position sensors. Drift in the pitch position sensor fault (fault 1b) is detected and isolated in two steps: First of all, the fault is detected using the following measurement vector

$$x = \left[ {\matrix{{|\hat \beta _k^{m1}({t_j}) - \hat \beta _k^{m2}({t_j})|} \cr {|\beta _k^{m1}({t_j}) - \beta _k^{m1}({t_{j - 1}})|} \cr {|\beta _k^{m2}({t_j}) - \beta _k^{m2}({t_{j - 1}})|} \cr } } \right].$$
(10)

The second and third lines, already used in (8), are important in order to exclude faults of type stuck sensor (fault 1a). In a second step, if a fault of type b is detected, the following vector is used for isolation between sensors 1 and 2:

$$x = \left[ {\matrix{{|{{\hat \beta }^d}({t_j}) - \beta _k^{m1}({t_j})|} \cr {|{{\hat \beta }^d}({t_j}) - \beta _k^{m2}({t_j})|} \cr } } \right]$$
(11)

where \(\beta^d\) is the desired value of the pitch angle and \({\hat \beta }^d\) is obtained by filtering it with a first order filter with τ = 0.08 s.

Generator and rotor speed sensors. For the detection of generator and rotor speed sensor faults of type b (scaled measurement), excluding faults of type a (stuck measurement), the following vector is first applied:

$$x = \left[ {\matrix{{|\hat \omega _p^{m1}({t_j}) - \hat \omega _p^{m2}({t_j})|} \cr {|\hat \omega _p^{m1}({t_j}) - \hat \omega _p^{m1}({t_{j - 1}})|} \cr {|\hat \omega _p^{m2}({t_j}) - \hat \omega _p^{m2}({t_{j - 1}})|} \cr } } \right],\quad p = g,r.$$
(12)

In a second step, isolation between sensors 1 and 2 is done using the following vector

$$x = \left[ {\matrix{{{{{{\widehat\tau }^m}_g \times {{\hat \omega }_{p,m1}}({t_j})} \over {\hat P_g^m}}} \cr {{{{{\widehat\tau }^m}_g \times {{\hat \omega }_{p,m2}}({t_j})} \over {\hat P_g^m}}} \cr } } \right],\quad p = g,r$$
(13)

where \(P_g^m\) is the measured power of the generator. The measurements are filtered using \(\tau = 6 \times T_s\) s for the estimation of faults of \({{\hat \omega }_g}\) and using \(\tau = 60 \times T_s\) s for \({{\hat \omega }_r}\).

4.1.3 Actuators

Offset in the generator torque actuator. The generator torque actuator fault (fault 4) is assumed to be an offset in the benchmark, which is comparable to scaled measurements. Therefore, comparison to the desired value is considered:

$$x = \left[ {\matrix{{|\tau _g^d - \hat \tau _g^m|} \cr {{\lambda _2} \times |\hat \omega _g^d - {{\omega _g^{m1} + \omega _g^{m2}} \over 2}|} \cr } } \right].$$
(14)

In the first line, the measured generator torque (\(\tau _g^m\)) is compared to the desired one (\(\tau_g^d\)), and in the second line, the desired generator speed is compared to the mean of the measured values from the two sensors \(\omega_{g,mi}\). The desired generator speed is calculated from the desired generator torque \(\tau_g^d\) using (7), which gives

$$\omega _g^d = {{P_g^d} \over {\tau _g^d{\eta _g}}}$$

with \(P_g^d\) the desired generator power. \(\tau_g^d\) is filtered using a first order filter with time constant \(\tau = 2 \times T_s\) s. The objective of this filter is to take into account the dynamics of the control system (the time necessary for \(\tau_g^m\) to reach \(\tau_g^d\) according to (6)). The factor \({\lambda _2} = {10^{ - 10}} \times \nu _{{\rm{wind}}}^6\) in the second component of x is used to take into account the wind speed, with a kind of normalization with respect to the first term. The kernel variance is tuned at σ = 10.
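The two components of vector (14) can be computed as in the following sketch. The function and argument names are hypothetical, the filtering of the signals is assumed to have been done beforehand, and the numerical values in the example are arbitrary.

```python
def torque_fault_features(tau_d, tau_m_filt, p_d, omega_m1, omega_m2,
                          v_wind, eta_g=0.98):
    """Feature vector (14) for the generator torque actuator offset.

    tau_d: desired torque, tau_m_filt: filtered measured torque,
    p_d: desired power, omega_m1, omega_m2: the two generator speed
    measurements, v_wind: wind speed used in the scaling factor lambda_2.
    """
    omega_d = p_d / (tau_d * eta_g)       # desired speed, from (7)
    lambda_2 = 1e-10 * v_wind ** 6        # wind-dependent scaling
    mean_speed = 0.5 * (omega_m1 + omega_m2)
    return [abs(tau_d - tau_m_filt),
            lambda_2 * abs(omega_d - mean_speed)]


# Hypothetical sample: torque offset of 10 and speed deviation of 2 at 10 m/s
x_example = torque_fault_features(tau_d=100.0, tau_m_filt=90.0, p_d=4900.0,
                                  omega_m1=48.0, omega_m2=48.0, v_wind=10.0)
```

The sixth power of the wind speed makes the second component negligible at low wind and comparable to the first at high wind, which is the normalization role described above.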

Scaled pitch position actuator. Pitch position actuator faults (fault 5) might be due to an abrupt change in the hydraulic system or to a high air content in the oil, which appears at a slower rate. Both types of faults are modelled by varying \(\omega_n\) and ζ in (5) either abruptly or more smoothly over 30 s.

As can be seen, (5) is a second order linear relation between β and \(\beta^d\). In the case of an alteration of parameters \(\omega_n\) and ζ, the stationary state does not change, but the transient dynamics change. In order to estimate the dynamics, the transient behavior resulting from different operating conditions should be included in the data for SVM training, which might considerably increase the data volume. Based on a few simulations, as for scaled sensor faults, the vector proposed for the detection of the pitch position actuator fault is designed to take into account the difference between the real and desired pitch positions. Note that due to measurement noise and control dynamics, this difference will be subject to oscillations even under normal operation. The difference between both sensors is also considered in order to distinguish this fault from sensor faults. Finally, the generator speed measurement is considered since a fault in the pitch position will affect the rotation of the rotor and therefore, at a secondary level, that of the generator.

$$x = \left[ {\matrix{{\omega _g^{m1} - \omega _g^{m2}} \cr {\beta _k^{m1} - \beta _k^{m2}} \cr {{{\hat \beta }^d} - \beta _k^{m1}} \cr {{{\hat \beta }^d} - \beta _k^{m2}} \cr } } \right]$$
(15)

where \({\hat \beta }^d\) is the response of the second order system given by (5) to the input \(\beta^d\). A Gaussian variance σ = 10 is used.
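The construction of the vector in (15) can be sketched as below. This is an illustrative sketch under stated assumptions: the second order model is taken in the standard form \(\ddot\beta + 2\zeta\omega_n\dot\beta + \omega_n^2\beta = \omega_n^2\beta^d\), it is integrated with a semi-implicit Euler scheme, and the sampling period of 0.01 s is hypothetical. The nominal values 11.11 rad/s and 0.6 are those quoted in the text.

```python
def second_order_response(beta_d, ts, omega_n=11.11, zeta=0.6):
    """Response of the assumed second order pitch model to the set-point
    sequence beta_d, starting at rest at beta_d[0] (semi-implicit Euler)."""
    beta, beta_dot = beta_d[0], 0.0
    out = []
    for u in beta_d:
        beta_ddot = omega_n ** 2 * (u - beta) - 2.0 * zeta * omega_n * beta_dot
        beta_dot += ts * beta_ddot
        beta += ts * beta_dot
        out.append(beta)
    return out


def feature_vector(omega_g_m1, omega_g_m2, beta_m1, beta_m2, beta_hat_d):
    """Four components of x in (15) for one sampling instant."""
    return [
        omega_g_m1 - omega_g_m2,   # disagreement between generator speed sensors
        beta_m1 - beta_m2,         # disagreement between pitch sensors
        beta_hat_d - beta_m1,      # model response minus first pitch sensor
        beta_hat_d - beta_m2,      # model response minus second pitch sensor
    ]


TS = 0.01                                   # sampling period in s (assumed)
beta_d = [0.0] + [5.0] * 500                # hypothetical 5 deg set-point step
beta_hat_d = second_order_response(beta_d, TS)

# Hypothetical fault-free measurements at the last instant
x = feature_vector(150.0, 150.0, 5.0, 5.0, beta_hat_d[-1])
print(x)
```

In the fault-free steady state all four components are close to zero, so any persistent deviation in the last two components points to the actuator rather than to a single sensor.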

Since it is difficult to obtain comprehensive training data that include transient states, and since the mechanical model is known, it is natural to develop a model-based method for fault detection of the pitch position actuators. Therefore, before showing the simulation results of SVM, a Kalman-like observer will be developed for fault detection and isolation of this fault.

4.2 Kalman-like observer for pitch position actuator

As mentioned previously, the pitch position actuator fault (fault 5) is simulated by deviating the natural frequency \(\omega_n\) and the damping factor ζ from their nominal values. Therefore, for fault detection, an observer is developed to estimate these parameters and evaluate their drift from the nominal values. The difference between the estimated value of either parameter and its nominal value can be used as a residual, with a threshold to be defined as a function of the noise.

Considering (5) and applying the change of coordinates \(x_1 = \beta_k\), \(x_2 = {\dot \beta}_k\), \(u = \beta^d\), k = 1, 2, 3, one gets the following state equations:

To estimate the parameters \(\omega_n\) and ζ, we construct the following augmented system, with \({x_3} = 2\zeta {\omega_n}\) and \({x_4} = \omega_n^2\):

(16)

Necessary conditions for the observability of \(x_3\) and \(x_4\) in system (16) are:

1) \({{\dot x}_1} \ne 0\) and \({{\dot x}_2} \ne 0\) (non null dynamics, β k ≠ 0 and \({{\dot \beta }_k} \ne 0\)), which requires that β r ≠ 0;

2) Regularly persistent inputs (\(\beta^d\)) are required;

3) \(({{\dot x}_1} + \dot u){x_2} \ne ({x_1} + u){{\dot x}_2}\).

Under these conditions, a Kalman-like observer [43] can be developed for system (16) as follows:

$$\left\{ {\matrix{{\dot{\hat x} = A(u)\hat x + G(u) - {R^{ - 1}}{C^{\rm{T}}}(C\hat x - y)} \cr {\dot R = - \theta R - {A^{\rm{T}}}(u)R - RA(u) + {C^{\rm{T}}}C} \cr } } \right.$$
(17)

Matrix R is initialized at identity. The observer is tuned using \(\theta = 3\). \(s = {{\dot x}_1}\) is the filtered output derivative signal, which should be bounded and non null. A first order filter with \(\tau = T_s\) s is employed on \(\beta_k\). Note however that no filtering was applied to either the output \(\beta _k^{mi}\) or its set-point \(\beta^d\). Indeed, filtering these quantities differently would create a static error (between the output and the set-point), which would affect the observability of the unknown parameters.

Also note that since the system is not observable if β d = 0, the observer gain is set to zero in Zone II and is activated only in Zone III. Moreover, when β d = 0, the estimated values are reinitialized at their nominal values.
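One integration step of an observer of the form (17), including the gain freeze and reset at zero set-point, might be sketched as follows. Several points are assumptions and not taken from the paper: the augmented model is written in a state-affine form where the measured output y and the filtered derivative s enter the matrix A (so that \(\dot x_2 = -x_3 s - x_4 (y - u)\) and G = 0), the integration uses explicit Euler with a hypothetical step of 0.01 s, and on reset the first state is set to the measurement and the second to zero.

```python
import numpy as np

C = np.array([[1.0, 0.0, 0.0, 0.0]])                 # only x1 (pitch angle) is measured
# Nominal states [x1, x2, x3, x4] with x3 = 2*zeta*omega_n and x4 = omega_n**2
X_NOMINAL = np.array([0.0, 0.0, 2 * 0.6 * 11.11, 11.11 ** 2])


def a_matrix(u, y, s):
    """Assumed state-affine form: x1' = x2, x2' = -x3*s - x4*(y - u), x3' = x4' = 0."""
    a = np.zeros((4, 4))
    a[0, 1] = 1.0
    a[1, 2] = -s
    a[1, 3] = -(y - u)
    return a


def observer_step(x_hat, r, u, y, s, theta=3.0, dt=0.01, u_min=0.0):
    """One explicit-Euler step of the Kalman-like observer (G = 0 assumed)."""
    if abs(u) <= u_min:
        # Zero set-point: parameters are not observable, so the gain is frozen
        # (R unchanged) and the parameter estimates are reset to nominal values.
        x_reset = X_NOMINAL.copy()
        x_reset[0] = y
        return x_reset, r
    a = a_matrix(u, y, s)
    innovation = C @ x_hat - y                       # C*x_hat - y, shape (1,)
    gain = np.linalg.solve(r, C.T)                   # R^{-1} C^T, shape (4, 1)
    x_dot = a @ x_hat - gain @ innovation
    r_dot = -theta * r - a.T @ r - r @ a + C.T @ C
    return x_hat + dt * x_dot, r + dt * r_dot


# Example call with hypothetical values: set-point 2 deg, measured pitch 1 deg
x_hat, r = X_NOMINAL.copy(), np.eye(4)               # R initialized at identity
x_hat, r = observer_step(x_hat, r, u=2.0, y=1.0, s=0.5)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of R explicitly, which is usually preferable numerically.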

5 Results and discussion

5.1 SVM results

SVM models were applied to fault detection and isolation of all the discussed sensors and actuators. All faults of type stuck measurement could be detected within 2 sampling periods when introduced at different instants, and therefore under different dynamics (e.g., control phases). The detection time (and robustness) for scaled measurement faults depended on the scaling factor and on the system behaviour at the fault time. Based on a few simulations, an average value \(n_s\) was calculated, as reported in Table 1.

When one fault occurs at a time, fault detection and isolation is ensured for all the considered faults. For simultaneous faults, fault isolation is ensured if the estimation is based on a different vector x. For instance, one of the scenarios will show the efficient isolation of simultaneous scaled faults in \(\omega_r\) and \(\omega_g\). Another scenario will show the efficient isolation of simultaneous faults of different types (stuck/scaled) in two different sensors measuring the pitch position.

5.1.1 Pitch position sensor faults

In a first scenario, two simultaneous faults were considered in the pitch positions \(\beta _1^{m1}\) and \(\beta _2^{m2}\), of types stuck and scaled measurement, respectively. Moreover, two scaling factors were compared: 1.2 and 1.8. The faults could be detected and isolated in both cases. Fig. 3 shows the detection results of the fault of type stuck measurement in \(\beta _1^{m1}\). Figs. 4 and 5 show the effect of the scaling factor on the FDI results of \(\beta _2^{m2}\). The scaled measurement fault occurs in \(\beta _2^{m2}\) at 2800 s, where \(\beta _2^{m2}\) holds a high value (around 10 deg), and finishes at 2900 s, where it decreases almost to zero. It can be seen that the detection results depend on the measured value of β, since the offset is proportional to this measurement through the scaling factor. Therefore, the smaller the value of β, the smaller the offset, and at β = 0 the fault is effectively eliminated. For this reason, the fault is detected only intermittently if the scaling factor equals 1.2 (Fig. 5), while it is well detected with a scaling factor of 1.8 (Fig. 4).
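The dependence of the offset on the pitch value can be made explicit with a short calculation. The sketch assumes that a scaled fault multiplies the true value by the scaling factor k, so that the offset is (k − 1)β; the function name and the sample pitch values are illustrative.

```python
def scaled_measurement_offset(beta, scaling_factor):
    """Offset introduced by a scaled-measurement fault, assuming the faulty
    sensor reads scaling_factor * beta: offset = (scaling_factor - 1) * beta."""
    return (scaling_factor - 1.0) * beta


for k in (1.2, 1.8):
    for beta in (10.0, 1.0, 0.0):
        offset = scaled_measurement_offset(beta, k)
        print(f"k = {k}, beta = {beta:4.1f} deg -> offset = {offset:.2f} deg")
```

At β ≈ 10 deg the offset is about 2 deg for a factor of 1.2 and about 8 deg for a factor of 1.8, and it vanishes for both factors as β approaches zero.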

Fig. 3 Fault detection and isolation of stuck sensor \(\beta _1^{m1}\) (fault 1a). Fault duration: 2800–2900 s. The scale of the measured value is shown on the left and the residual’s scale is given on the right (as indicated by the arrows)

A threshold is attributed to each fault; the residual is compared to this threshold in order to announce fault occurrence. The threshold should be ⩾ 0 and its amplitude depends on the absolute value of the vector x used. For instance, in Figs. 3–5, the threshold can reasonably be set to 0.5, which means that the system is considered non faulty if the residual is less than 0.5. (The color figures in this paper can be found in the electronic version.)
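The thresholding step can be sketched as a function that turns a residual sequence into alarm intervals. This is only an illustration: the function name and the sample data are hypothetical, and no debouncing or persistence test is applied since none is described in the text.

```python
def alarm_intervals(times, residuals, threshold=0.5):
    """Return (start, end) time pairs during which residual >= threshold."""
    intervals = []
    start = None
    for t, r in zip(times, residuals):
        if r >= threshold and start is None:
            start = t                          # alarm raised
        elif r < threshold and start is not None:
            intervals.append((start, t))       # alarm cleared
            start = None
    if start is not None:                      # alarm still active at the end
        intervals.append((start, times[-1]))
    return intervals


# Hypothetical residual that is above the threshold only intermittently
times = [0, 1, 2, 3, 4, 5]
residuals = [0.0, 0.9, 1.0, 0.1, 0.8, 0.2]
print(alarm_intervals(times, residuals))
```

An intermittently detected fault then appears as several short intervals instead of a single one covering the whole fault duration.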

Fig. 4 Fault detection and isolation of pitch position \(\beta _2^{m2}\) between 2800–2900 s (fault 1b), scaling factor = 1.8

Fig. 5 Fault detection and isolation of pitch position \(\beta _2^{m2}\) between 2800–2900 s (fault 1b), scaling factor = 1.2

It can be concluded that fault isolation is robust to simultaneous faults in this case. Fault detection is robust for faults of type stuck measurements, but depends on the dynamics and the offset in scaled measurements.

5.1.2 Generator and rotor speed sensor faults

As for the pitch position case, stuck measurements could be detected within 2 sampling periods when introduced in the \(\omega_r\) or \(\omega_g\) sensor at different instants under different dynamics. Fig. 6 shows the detection results for the rotor speed sensor \(\omega_r^{m1}\), where the measurement was stuck at 1.2 rad/s.

Fig. 6 Fault detection and isolation of the rotor speed (fault 2a), fault duration: 2400–2500 s

Concerning scaled measurement faults, fault detection results were more robust than those of the pitch position, which is due to the lower noise level with respect to the absolute values of \(\omega_g\) and \(\omega_r\). An example of simultaneous faults occurring in both \(\omega_r^{m2}\) and \(\omega_g^{m1}\) with a scaling factor of 1.2 for both sensors is shown in Figs. 7 and 8. This leads to \(\Delta\omega_r \approx 0.35\) rad/s and \(\Delta\omega_g \approx 30\) rad/s. Based on a number of simulations, it can be concluded that fault isolation of these sensors is efficient for both fault types (stuck/scaled). Again, the threshold can be set to 0.5 for both types of faults related to these sensors.

Fig. 7 Simultaneous faults in \(\omega _r^{m2}\) and \(\omega _g^{m1}\), both of them scaled by a factor of 1.2 (faults 2b and 3b), fault duration: 3600–3700 s

Fig. 8 Simultaneous faults in \(\omega _r^{m2}\) and \(\omega _g^{m1}\), both of them scaled by a factor of 1.2 (faults 2b and 3b), fault duration: 3600–3700 s

In Fig. 9, the fault was introduced in \(\omega _r^{m2}\) at 100 s, where the absolute value of \(\omega_r\) was around 0.6 rad/s (while \(\omega_r \approx 1.7\) rad/s at 3560 s, Fig. 7). Therefore, with a comparable scaling factor of 0.8, the fault leads to a lower \(\Delta\omega_r\) (\(\Delta\omega_r \approx 0.1\) rad/s). The fault could nevertheless be detected, but the detection results were intermittent.
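The quoted rotor speed offsets are consistent with a simple estimate. Assuming that a scaled fault multiplies the true value by the scaling factor k, the offset is \(\Delta\omega = |k - 1|\,\omega\), which gives for the two operating points above:

$$\Delta {\omega _r} \approx |1.2 - 1| \times 1.7 = 0.34\;{\rm{rad/s}},\qquad \Delta {\omega _r} \approx |0.8 - 1| \times 0.6 = 0.12\;{\rm{rad/s}}$$

The same relative scaling error of 20% therefore produces an offset roughly three times smaller at the lower rotor speed, which explains the intermittent detection.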

Fig. 9 Fault in the rotor speed \(\omega _r^{m2}\), when scaled by a factor of 0.8 (fault 2b), fault duration: 100–200 s

5.1.3 Torque actuator faults

As for scaled faults, the detection of torque actuator faults depends on the offset. Fig. 10 shows that the detection is very fast and free of oscillations or false alarms with an offset of 200 N·m (i.e., 1.3% of \(\tau_g\) at the time of fault occurrence). This fault could be detected in both controller zones (I and II). With an offset of 100 N·m, the residual showed some oscillations (not shown in the figure), which indicates the detection limit (about 0.7%).

Fig. 10 Fault in the converter torque actuator (fault 4a), with an offset = 200 N·m, fault duration: 2000–2100 s

5.2 Pitch position actuators (by SVM and observer)

In this section, FDI results for the pitch position actuators are discussed for two approaches: SVM and the Kalman-like observer. An observer is needed for this fault because only the transient measurements, which appear to be fast, are affected by drifts in ζ and \(\omega_n\). Training an SVM on transients would require a number of simulations under different conditions. As specified in the observer development section, this fault cannot be detected by the observer in Zone II (while \(\beta^d = 0\)). Therefore, the fault scenarios are realised mainly, but not exclusively, in Zone III. The observer gain is set to zero in Zone II by testing the value of \(\beta^d\), and the estimated states are reinitialised at their nominal values when \(\beta^d = 0\). No test on the zones is done for SVM. Fig. 11 shows the desired pitch position \(\beta^d\), which presents high oscillations. These oscillations, which are due to measurement noise and control dynamics, are expected to create oscillations in the observer estimates. The estimates are therefore filtered using a first order filter with \(\tau = 20\,T_s\) s.

Fig. 11 Desired pitch position \(\beta^d\)

Figs. 12–15 show a first scenario where an abrupt drift occurs in \(\omega_n\) and ζ from their nominal values [11.11 rad/s, 0.6] to [6, 0.3]. The fault occurs at 2460 s and lasts 100 s. This fault is assumed to be due to a severe failure in the hydraulic system. Fig. 12 shows that the estimate of \(\omega_n\) by the Kalman-like observer is clearly affected by the fault but is noisy; the same holds for ζ (Fig. 13). Note that the observer gain is set to zero when \(\beta^d = 0\), as for instance between 2000 s and 2200 s.

Fig. 12 Kalman-like observer results. Estimation of \(\omega_n\) when a fault occurs in the pitch actuator \(\beta_2\) at 2460 s (fault 5), where [\(\omega_n\), ζ] drift suddenly to [6, 0.3] over 100 s

Fig. 13 Kalman-like observer results. Estimation of ζ when a fault occurs in the pitch actuator \(\beta_2\) at 2460 s (fault 5), where [\(\omega_n\), ζ] drift suddenly to [6, 0.3] over 100 s

Fig. 14 FDI results using the Kalman-like observer. Detection of actuator fault at 2460 s in \(\beta_2\) (fault 5) where [\(\omega_n\), ζ] drift suddenly to [6, 0.3] over 100 s

Fig. 15 FDI results using SVM. Detection of actuator fault at 2460 s in \(\beta_2\) (fault 5) where [\(\omega_n\), ζ] drift suddenly to [6, 0.3] over 100 s

Based on the estimates of \(\omega_n\) and ζ, a residual can be calculated for the pitch actuator fault, using the difference between \(\omega_n\) or ζ and their nominal values. In Fig. 14, the residual is set to one if both of the following conditions are satisfied: \(|\Delta\zeta| > 0.2\) and \(|\Delta\omega_n| > 2\). With this choice, the detection time by the observer is approximately 4 s. Fig. 15 shows the residual obtained by SVM, for which the detection time is estimated to be 3.94 s. However, the SVM residual oscillates: soon after the first detection it goes back to zero and continues oscillating, but without false alarms. Note that false alarms cannot be completely avoided with the observer, since heavy filtering would delay the detection.
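The residual rule can be sketched as follows, together with the recovery of the physical parameters from the augmented states \(x_3 = 2\zeta\omega_n\) and \(x_4 = \omega_n^2\). The function names are illustrative; the thresholds and nominal values are those given in the text.

```python
import math

OMEGA_N_NOM = 11.11   # nominal natural frequency in rad/s (from the text)
ZETA_NOM = 0.6        # nominal damping factor (from the text)


def parameters_from_states(x3, x4):
    """Invert x3 = 2*zeta*omega_n and x4 = omega_n**2 (requires x4 > 0)."""
    omega_n = math.sqrt(x4)
    zeta = x3 / (2.0 * omega_n)
    return omega_n, zeta


def pitch_actuator_residual(omega_n_hat, zeta_hat, d_omega_max=2.0, d_zeta_max=0.2):
    """Return 1 if both parameter deviations exceed their thresholds, else 0."""
    both_exceeded = (abs(omega_n_hat - OMEGA_N_NOM) > d_omega_max
                     and abs(zeta_hat - ZETA_NOM) > d_zeta_max)
    return 1 if both_exceeded else 0


# Estimated states corresponding to the faulty values [6, 0.3] of the first scenario
omega_n_hat, zeta_hat = parameters_from_states(x3=2 * 0.3 * 6.0, x4=6.0 ** 2)
print(pitch_actuator_residual(omega_n_hat, zeta_hat))
```

Requiring both conditions at once reduces the sensitivity to noise on a single estimate, at the cost of missing faults that affect only one of the two parameters.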

It can be concluded in this example that the observer FDI results are more precise than those of SVM. Moreover, the estimates of ω n or ζ can be useful for fault reconfiguration.

In a second scenario, the values of \(\omega_n\) and ζ are supposed to drift smoothly (over 30 s) from their nominal values [11.11, 0.6] to [7, 0.4], which is taken to represent high air content in the oil of the hydraulic system (Figs. 16 and 18). This fault is assumed to be less severe than the first scenario. The estimates of \(\omega_n\) and ζ are shown in Figs. 16 and 17, respectively. Due to oscillations and the fact that the fault (air content in the oil) appears relatively quickly (over 30 s), it cannot be distinguished from a sudden fault (Figs. 12–15). The final values of \(\omega_n\) and ζ nevertheless seem to be well estimated for both fault types, which can be useful for fault reconfiguration.

Fig. 16 Kalman-like observer results. Estimation of \(\omega_n\) when a fault occurs in the pitch actuator \(\beta_3\) at 3300 s (fault 5), where [\(\omega_n\), ζ] drift suddenly to [7, 0.4] over 100 s

Fig. 17 Kalman-like observer results. Estimation of ζ when a fault occurs in the pitch actuator \(\beta_3\) at 3300 s (fault 5), where [\(\omega_n\), ζ] drift suddenly to [7, 0.4] over 100 s

Fig. 18 FDI results using the Kalman-like observer. Detection of actuator fault at 3300 s in \(\beta_3\) (fault 5) where [\(\omega_n\), ζ] drift suddenly to [7, 0.4] over 100 s

Fig. 18 shows the residual obtained by the observer (based on the conditions \(|\Delta\zeta| > 0.2\) and \(|\Delta\omega_n| > 2\)) and Fig. 19 the residual obtained from SVM in this scenario. Both residuals oscillate, with some false alarms for the observer. This slow drift fault is therefore slightly more difficult to detect, which indicates that the small-offset fault present at the beginning of the drift cannot be detected.

Fig. 19 FDI results using SVMs. Detection of actuator fault at 3300 s in \(\beta_3\) (fault 5) where [\(\omega_n\), ζ] drift suddenly to [7, 0.4] over 100 s

6 Conclusions

Wind energy is profitable if turbine technology is optimized and supervised online. In view of the large number of components in the system and the large number of frequent but noisy measurements, in addition to the system disturbances, a good supervision method should be used for fault detection and isolation. In this work, sensor faults were treated by SVM, which was found to be a good method for pattern recognition and well suited to online implementation. For detection of sensors stuck at some measurement, the derivative of the measurement is used in the training data. For detection of scaled measurements, the training data contain the difference between the measurement and the set-point. Offset in the actuator torque was also learned by SVM, based on the difference between the desired and real values.

For the pitch position actuator, two methods were used: SVM and a Kalman-like observer. Both methods give comparable results, with the observer having a higher sensitivity to the fault but more false alarms. The observer is however of clear interest for fault reconfiguration, since it provides estimates of the faulty parameters.