Introduction

Over the last few decades, wind energy has become one of the most promising clean, inexhaustible, and sustainable energy sources. From 2011 to 2020, its production capacity expanded from 220 GW to 733 GW [1]. Unfortunately, wind energy systems are complex and therefore prone to a variety of failures, which reduce their efficiency and reliability. Faults most commonly occur in generators [2, 3], gearboxes [4, 5], power converters [6, 7], and blades [8, 9]. In this regard, several methods have been investigated in the literature to ensure the safety, integrity, and performance of such systems [6, 10]. The study in [11] proposed two hybrid numerical weather prediction and artificial neural network models for wind power forecasting over extremely complicated terrain: the first model predicts the energy output of each wind turbine directly, while the second first forecasts the wind speed and then converts it to power using a fitted power curve. Using a distribution static compensator (DSTATCOM) based on an artificial neural network (ANN), the authors of [12] proposed a new control strategy to enhance the power quality in wind energy conversion systems (WECS). Mansouri et al. [13] employed a detection and diagnosis strategy for diverse incipient faults of the WECS under various states. In [14], the authors proposed an advanced fault detection and diagnosis (FDD) approach for wind energy conversion (WEC) systems based on reduced Gaussian process regression-based random forest (RGPR-RF). In this context, the scientific community has paid close attention to ensemble learning (EL) approaches, which combine several machine learning models to create the best possible predictive model.
The success of ensemble models can be attributed to a variety of factors, including statistical, computational, and representation learning [15], bias-variance decomposition [16], and strength-correlation [17]. Numerous surveys in the literature review ensemble learning, covering ensemble models for classification problems [18,19,20,21], regression problems [22, 23], and clustering problems [24]. For instance, an effective neural network-based ensemble technique was employed in [25], whose authors used bagging, boosting, and random subspace combination approaches together with an ensemble classifier constructed using neural network techniques. The work in [26] combined the benefits of the support vector machine, K-nearest neighbor, and decision tree in an improved ensemble learning (EL)-based intelligent fault diagnosis paradigm that aims to guarantee the high efficiency of grid-connected photovoltaic (GCPV) systems. The initial step in data mining is known as preprocessing [27]; it entails cleaning and arranging the dataset to suit the input requirements of the subsequent stages.

Accordingly, one potential pre-processing step is feature selection (FS), a method for keeping a subset of features that can accurately represent the data without outliers or redundancies [27]. FS has been used in numerous applications, such as data classification [28,29,30], data clustering [31,32,33], image processing [34,35,36], and text categorization [37, 38]. Several distinct versions of the sine-cosine optimization algorithm (SCOA) have been proposed to address the FS problem [39,40,41,42,43,44,45,46]. In addition, various optimization algorithms, for instance the genetic algorithm (GA) [47, 48], the backtracking search algorithm (BSA) [49], the coral reef optimization (CRO) [50], the particle swarm optimization (PSO) [51], and the fruit fly optimization algorithm (FOA) [52], have been introduced to select appropriate parameters for artificial intelligence (AI) methodologies.

This work proposes an improved and effective ensemble learning approach for fault detection and diagnosis in wind energy conversion systems. The contribution of this paper is threefold. Firstly, the data are pre-processed. Secondly, a sine-cosine optimization algorithm is applied to discard redundant features and select only the most relevant ones from the entire feature set. Finally, the selected features are fed to an ensemble learning algorithm to improve the classification performance and enhance the WECS model’s reliability and its ability to distinguish between the diverse operating modes. To this end, we injected frequent and diverse types of failures (wear-out, open-circuit, and short-circuit faults) at different locations (grid and generator sides) in order to examine the reliability of the developed strategy compared to state-of-the-art methods, including artificial neural network (ANN), K-nearest neighbor (KNN), cascade forward neural network (CFNN), feed forward neural network (FFNN), generalized regression neural network (GRNN), and support vector machine (SVM). The rest of this paper is arranged as follows.

The suggested ensemble learning-based sine-cosine optimization algorithm strategy is presented in the “Methods” section, where the concepts of each employed technique are described. The proposed technique is tested on wind energy conversion systems in the “Results and discussion” section, where the obtained results are analyzed and summarized. The “Conclusions” section concludes the paper.

Methods

EL-SCOA approach

The proposed strategy involves three major steps: data processing, feature optimization and selection, and fault detection and diagnosis (FDD). The main goal of the technique, called the ensemble learning-based sine-cosine optimization algorithm (EL-SCOA), is to improve the fault diagnosis capability and efficiency of WECS. Unlike conventional diagnosis methods, which use the raw data directly, the proposed approach selects the most descriptive features from the original dataset and feeds them as inputs to the classifier for diagnosis. The classifier uses bagging and boosting as ensemble techniques and an ANN as the baseline classifier in order to identify and discriminate between the various states that may occur in the WECS.

The block diagram in Fig. 1 illustrates the main steps of the proposed approach for FDD.

Fig. 1 The steps of the proposed approach for fault diagnosis

Algorithm 1. EL-SCOA algorithm

The EL-SCOA procedure is divided into two major phases, training and testing. The details are given in Algorithm 1.

Concept theoretical framework of artificial neural network (ANN)

Artificial neural networks (ANNs) are computational models inspired by the biological neural networks of the human brain, and they continue to attract considerable attention [13, 48, 53]. An ANN uses a network pattern to generate decisions. Its structure consists of three levels: the input, hidden, and output layers, each made up of a set of nodes. The input layer receives the information, the hidden layer processes it, and the output layer provides the network response. The number of neurons in the input layer corresponds to the number of inputs; similarly, the number of neurons in the output layer corresponds to the number of ANN outputs. For instance, in our work the ANN classifier is trained using WECS measurement variables (\({x}_{1}...{x}_{m}, m = 12\)) as inputs and (\(N=7\)) labels as the corresponding desired outputs, as depicted in Fig. 2. The number of neurons in the hidden layer, however, is determined experimentally by running several experiments with different numbers of hidden neurons (10 hidden layers are employed in this study). A simple ANN architecture, in contrast to a complex one, can provide accurate predictions. Moreover, a connection of weight \({w}_{ij}\) links every pair of neurons in successive layers. Each neuron processes the information through an activation function (\(f\)) and transfers it to the neurons in the next layer. The most frequently employed activation is the sigmoid function, since it is nonlinear and differentiable [54]. This logistic function, whose range is 0 to 1, is given by:

Fig. 2 Structure of the ANN with \(m\) WECS inputs and the outputs corresponding to their \(N\) labels

$$f=\frac{1}{1+{e}^{-x}}$$
(1)

The equations for the weights, the weight adjustment, the prediction error, and the network output are given in [55].
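As an illustration of Eq. (1), the following minimal Python sketch evaluates the sigmoid activation and the output of a single neuron. It is not the implementation used in this study: the function names and the bias term are assumptions introduced only for the example, and the derivative is included because differentiability is the property that motivates the choice of this activation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation of Eq. (1); the result lies between 0 and 1."""
    # Two algebraically equivalent branches avoid overflow in exp()
    # for inputs of large magnitude.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_derivative(x: float) -> float:
    """Derivative f'(x) = f(x) * (1 - f(x)) of the logistic function."""
    s = sigmoid(x)
    return s * (1.0 - s)


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through the sigmoid.

    The bias term is an assumption of this sketch; the text above only
    mentions the connection weights w_ij.
    """
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


if __name__ == "__main__":
    print(sigmoid(0.0))             # midpoint of the logistic curve
    print(sigmoid_derivative(0.0))  # slope at the midpoint
    print(neuron_output([1.0, 2.0], [0.5, -0.25]))
```

A full layer would apply `neuron_output` once per neuron, each with its own weight vector.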

Ensemble learning theory

The ensemble learning methodology combines several individual models in order to generate one optimal predictive model, thereby improving the performance and classification results of FDD techniques. Boosting and bagging are the most well-known and widely used strategies in the literature.

Boosting strategy

The boosting methodology, often known as a sequential ensemble [56], is used to improve the generalization of learning models that generalize weakly [57]. In boosting, the predictors are created sequentially instead of independently: each subsequent predictor learns from the errors of its predecessors, so the resulting predictions become more accurate.
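The idea that each predictor learns from the errors of its predecessors can be sketched with the sample reweighting step of AdaBoost. This is only an illustration: the text does not state which boosting variant is used, so AdaBoost is an assumption here, and the function name `reweight` is hypothetical.

```python
import math


def reweight(weights, correct):
    """One AdaBoost-style reweighting round (illustrative assumption).

    weights: current sample weights, summing to 1.
    correct: booleans, True where the current predictor was right.
    Returns (alpha, new_weights), where alpha is the vote given to the
    current predictor and new_weights emphasise the misclassified samples.
    """
    # Weighted error of the current predictor.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Increase the weight of misclassified samples, decrease the others.
    scaled = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]


if __name__ == "__main__":
    # Four equally weighted samples, the last one misclassified.
    alpha, new_w = reweight([0.25] * 4, [True, True, True, False])
    print(alpha, new_w)
```

After the update, the misclassified samples carry half of the total weight, so the next predictor in the sequence is pushed to concentrate on them.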

Bagging strategy

One of the most common techniques for generating ensemble-based algorithms is bagging [58], also named bootstrap aggregating. Bagging is deployed to enhance the performance of an ensemble classifier. Its main objective is to generate a series of independent samples with the same size and distribution as the raw dataset and, from them, to build an ensemble predictor that is more precise than a single predictor trained on the raw dataset. Bagging involves two tasks: the first is the creation of bagged observations and the transfer of each bag to the base models, and the second is the merging of the predictions of the various predictors. How the outputs of the base predictors are combined depends on the task: majority voting is used for classification problems and averaging for regression problems.
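The two bagging tasks described above (creating the bags and merging the predictions) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names are hypothetical and the base models themselves are left out.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) observations with replacement (one 'bag')."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


def majority_vote(predictions):
    """Combine class labels predicted by the base models (classification)."""
    # most_common(1) returns the label with the highest count.
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Combine numeric outputs of the base models (regression)."""
    return sum(predictions) / len(predictions)


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed for repeatability
    bag = bootstrap_sample(list(range(10)), rng)
    print(bag)                       # same size as the original data
    print(majority_vote(["C1", "C5", "C1"]))
    print(average([1.0, 2.0, 3.0]))
```

In a complete pipeline, one base classifier would be trained on each bag and `majority_vote` would be applied to their predictions for every test sample.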

Concept of the sine-cosine optimization algorithm (SCOA)

The SCOA is a swarm-based optimization methodology that was first suggested by Mirjalili in 2016 [57, 58]. It is based on the periodic behavior of the sine and cosine functions and is motivated by transcendental function theory. Like other optimization techniques, SCOA performs optimization through mathematical rules: a set of random initial candidate solutions fluctuates either away from or toward the final position (optimal solution). Dynamic and randomized parameters balance exploration and exploitation of the search space at different stages of the optimization [59]. The two-phase position update equation used by the SCOA is as follows:

$${p}_{ij}^{k+1}=\left\{\begin{array}{ll}{p}_{ij}^{k}+{r}_{1}\times \mathrm{sin}\left({r}_{2}\right)\times \left|{r}_{3}\times {P}_{j}^{k}-{p}_{ij}^{k}\right|, & {r}_{4}<0.5\\ {p}_{ij}^{k}+{r}_{1}\times \mathrm{cos}\left({r}_{2}\right)\times \left|{r}_{3}\times {P}_{j}^{k}-{p}_{ij}^{k}\right|, & {r}_{4}\ge 0.5\end{array}\right.$$
(2)

Where \({p}_{ij}^{k+1}\) denotes the position of the \(i\mathrm{th}\) individual in the \(j\mathrm{th}\) dimension at the \((k+1)\mathrm{th}\) iteration, and \({P}_{j}^{k}\) denotes the global best position in the \(j\mathrm{th}\) dimension at the \(k\mathrm{th}\) iteration. The parameter \({r}_{1}\) decreases linearly over the iterations in order to balance exploration and exploitation. It is given by

$${r}_{1}=\sigma -k\frac{\sigma }{\overline{k} }$$
(3)

Where \(\overline{k }\) indicates the maximum number of iterations and \(\sigma\) is a constant. \({r}_{2}\) is a random number uniformly distributed in \(\left[0, 2\pi \right]\), \({r}_{3}\) is a random number uniformly distributed in \(\left[0, 2\right]\), and \({r}_{4}\) is a random number in \(\left[0, 1\right]\) that is used to switch with equal probability between the sine and cosine functions. When \({r}_{3}>1\), the exchange of information between \({P}_{j}^{k}\) and \({p}_{ij}^{k}\) increases, whereas when \({r}_{3}<1\) the influence between \({P}_{j}^{k}\) and \({p}_{ij}^{k}\) is reduced. Figure 3 depicts the SCOA search mode diagram.
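Equations (2) and (3) can be sketched in a few lines of Python. The following is a minimal illustration of a single position update for one individual, not the implementation used in this study. The function names are hypothetical, and the default value \(\sigma = 2\) is an assumption, since the text only states that \(\sigma\) is a constant.

```python
import math
import random


def r1_schedule(k, k_max, sigma=2.0):
    """Eq. (3): r1 decreases linearly from sigma (k = 0) to 0 (k = k_max).

    sigma = 2.0 is an assumed default; the text only calls it a constant.
    """
    return sigma - k * sigma / k_max


def sca_update(p, best, k, k_max, rng, sigma=2.0):
    """Eq. (2): move one individual p relative to the global best position.

    p, best: lists of floats with one entry per dimension j.
    rng: a random.Random instance supplying r2, r3 and r4.
    """
    r1 = r1_schedule(k, k_max, sigma)
    new_p = []
    for p_ij, best_j in zip(p, best):
        r2 = rng.uniform(0.0, 2.0 * math.pi)  # argument of sin / cos
        r3 = rng.uniform(0.0, 2.0)            # weight on the best position
        r4 = rng.random()                     # switches sine / cosine branch
        trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
        new_p.append(p_ij + r1 * trig * abs(r3 * best_j - p_ij))
    return new_p


if __name__ == "__main__":
    rng = random.Random(0)
    position = [0.3, -1.2, 4.0]
    best = [1.0, 1.0, 1.0]
    print(sca_update(position, best, k=1, k_max=100, rng=rng))
```

Because \({r}_{1}\) reaches zero at the last iteration, the step size shrinks over the run and the final update leaves the position unchanged, which is how the schedule shifts the search from exploration to exploitation.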

Fig. 3 Diagram of the SCOA search mode

System description

In this research, a variable-speed wind turbine based on a squirrel cage induction generator (SCIG) is considered, as displayed in Fig. 4.

Fig. 4 Variable speed wind turbine based on SCIG and converter topology

The model of the employed system comprises two major subsystems: the squirrel cage induction generator (SCIG), which is monitored and controlled by the stator-side AC/DC converter, and the grid-side DC/AC converter. This structure permits an infinitely variable speed operation. Additionally, regardless of the machine’s rotation speed, the voltage is converted into direct current and voltage. Furthermore, in this structure the generator-side converter is based on insulated-gate bipolar transistors (IGBTs), and its configuration is the same as that of the grid-side converter. Table 1 lists the properties of the wind turbine.

Table 1 Properties of wind turbine

The grid converter and the generator converter are the two levels of the power conversion topology used in the wind chain. Each converter has three arms, and each arm is made up of a high and a low IGBT, as shown in Fig. 5.

Fig. 5 Variable speed wind turbine based on SCIG and converter topology

Results and discussion

Data collection

In this study, we use data obtained from a healthy WECS, into which several fault scenarios are then injected: open-circuit (OC), short-circuit (SC), and wear-out (WO). In other words, we first considered how the system behaves in a healthy condition and then injected each fault scenario independently to examine how it affects the system’s behavior. The transient regime that appears when switching from the healthy state to a faulty condition is not taken into account. These faults are described in Table 2. The last fault (WO) is represented by an internal resistance of two ohms. Each operating mode is described by 2000 10-time-lagged samples recorded over a one-second duration at a sampling frequency of 20 kHz.

Table 2 Description of the diverse labeled failures injected in the WEC system

Seven operating modes of the WECS are used in this study: one healthy case (designated C1) and six faulty states (C2 to C7). The twelve measurement variables used to represent these seven scenarios are detailed in Table 3.

Table 3 Labeling, description and ranges of the measured and monitored system variables

The generator variables \({i}_{\mathrm{sd}}, {i}_{\mathrm{sq}}\), and the grid variables \({i}_{\mathrm{sd}}, {i}_{\mathrm{sq}}, {i}_{\mathrm{sar}}, {i}_{\mathrm{sbr}}\) can be obtained using the Park transformation with angle \(\theta\) (rad):

$$\left[\begin{array}{c}{i}_{sd}\\ {i}_{sq}\end{array}\right]=\frac{2}{3}\left[\begin{array}{ccc}\mathrm{cos}\theta & \mathrm{cos}\left(\theta -\frac{2\pi }{3}\right) & \mathrm{cos}\left(\theta +\frac{2\pi }{3}\right)\\ \mathrm{sin}\theta & \mathrm{sin}\left(\theta -\frac{2\pi }{3}\right) & \mathrm{sin}\left(\theta +\frac{2\pi }{3}\right)\end{array}\right]\left[\begin{array}{c}{i}_{sa}\\ {i}_{sb}\\ {i}_{sc}\end{array}\right]$$
(4)
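Equation (4) can be evaluated directly, as in the following minimal Python sketch. It follows the sign convention of Eq. (4) as written; the function name `park_transform` is hypothetical, and the balanced three-phase currents in the usage example are synthetic values chosen for illustration, not measurements from the studied system.

```python
import math


def park_transform(i_sa, i_sb, i_sc, theta):
    """Eq. (4): map three-phase currents to (i_sd, i_sq) for angle theta (rad)."""
    k = 2.0 / 3.0
    shift = 2.0 * math.pi / 3.0
    i_sd = k * (i_sa * math.cos(theta)
                + i_sb * math.cos(theta - shift)
                + i_sc * math.cos(theta + shift))
    i_sq = k * (i_sa * math.sin(theta)
                + i_sb * math.sin(theta - shift)
                + i_sc * math.sin(theta + shift))
    return i_sd, i_sq


if __name__ == "__main__":
    # Synthetic balanced currents of unit amplitude, aligned with theta.
    theta = 0.7
    shift = 2.0 * math.pi / 3.0
    currents = (math.cos(theta),
                math.cos(theta - shift),
                math.cos(theta + shift))
    print(park_transform(*currents, theta))
```

For balanced currents of unit amplitude aligned with \(\theta\), the transformation returns a direct component of one and a quadrature component of zero (up to rounding), which is a convenient sanity check.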

Table 4 summarizes the distinct operating scenarios. The observations were split equally, with 50% used in the training phase and 50% in the testing phase.

Table 4 Construction of database for fault detection and diagnosis system

Certain electrical and mechanical variables under various faulty situations are displayed in Figs. 6, 7, 8, 9 and 10.

Evaluation metrics

Different metrics, often known as performance metrics or evaluation metrics, are used to assess the effectiveness or quality of the model. These metrics enable us to evaluate how well our model performed on the supplied data, so that we can improve the model’s performance by tuning the hyperparameters. The adopted criteria are as follows. Accuracy (%) denotes the proportion of samples that are correctly predicted over the total number of observations. Recall (%) denotes, for the pertinent class, the proportion of positive samples that are correctly predicted. Precision (%) denotes the number of positive samples correctly predicted divided by the total number of predicted positive observations. Computation time (CT(s)) represents the time required to carry out the algorithm.

Fig. 6 Mechanical torque for different operating modes

Fig. 7 Generator speed for different operating modes

Fig. 8 Generator current for different operating modes

Fig. 9 Bus voltage for different operating modes

Fig. 10 Grid current for different operating modes

$$\mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$$
(5)
$$\mathrm{Recall}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$
(6)
$$\mathrm{Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$$
(7)

Where \(\mathrm{TP}\) (true positive) is the number of observations that are correctly identified, \(\mathrm{TN}\) (true negative) represents the number of observations that are correctly dismissed, \(\mathrm{FP}\) (false positive) is the number of observations that are incorrectly identified, and \(\mathrm{FN}\) (false negative) is the number of observations that are incorrectly dismissed.
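For a multi-class problem, Eqs. (5) to (7) can be evaluated per class from a confusion matrix by treating that class as positive and all others as negative. The sketch below assumes that rows hold the true classes and columns the predicted classes; this convention, the function name, and the two-class matrix in the usage example are assumptions made for illustration and are not results from this study.

```python
def per_class_metrics(cm, c):
    """Accuracy, recall and precision of class c (Eqs. 5-7), one-vs-rest.

    cm[r][p] counts observations of true class r predicted as class p
    (rows = true, columns = predicted is an assumed convention).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]                                  # correctly identified
    fn = sum(cm[c]) - tp                           # class c missed
    fp = sum(cm[r][c] for r in range(n)) - tp      # others labelled as c
    tn = total - tp - fn - fp                      # correctly dismissed
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, recall, precision


if __name__ == "__main__":
    # Hypothetical two-class matrix, not taken from the paper's tables.
    toy_cm = [[8, 2],
              [1, 9]]
    print(per_class_metrics(toy_cm, 0))
```

Averaging the per-class values over all classes gives a single summary figure when the classes are equally important.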

Discussion

To demonstrate the effectiveness of the presented approach in terms of diagnostic recall, precision, accuracy, and computation time, a number of methods, including CFNN, FFNN, ANN, GRNN, KNN, and SVM, have been employed for comparison. These existing methods are modeled and tested in a MATLAB toolbox. To evaluate the overall effectiveness of the strategies, the accuracy was calculated using 10-fold cross-validation. For the FFNN, CFNN, GRNN, and ANN, 10 hidden layers with a total of 50 hidden neurons in each layer were chosen. A sigmoid function is used in the hidden layers in order to introduce non-linearity. The \(K\) and \(C\) parameters of the SVM are set to the values that give the lowest RMSE, and the \(K\) value for KNN is equal to 3. For the SCOA, the maximum number of iterations is 100 and the number of candidate solutions is 10.

The comparison in Table 5 shows that the proposed EL strategy, which combines the bagging and boosting strategies with the ANN classifier, outperforms the ANN and the other methods in terms of accuracy (98.88%). Although the EL approach achieves better classification accuracy than conventional techniques, it still suffers from a difficult training phase and a high time complexity. To address this drawback, we employed a sine-cosine optimization algorithm (SCOA) to select the most descriptive features, which reduces the computation time (a significant challenge in the fault diagnosis domain) and accelerates the learning and classification tasks. As a result, the computation time decreases from 23.74 s to 12.00 s, with only a minor difference in accuracy (0.53%). The poor classification results of KNN and SVM are attributable to the direct use of raw data, which demonstrates the benefit of selecting the more significant features before performing the classification task. The EL-SCOA strategy selected six features (out of 12), as shown in Table 6.

Table 5 Performance evaluations of various classification methods
Table 6 The selected features and the performance evaluations of the evolved classification strategies

To further demonstrate the effectiveness of the proposed methodology, Table 7 presents the testing classification outcomes of the different classes as a confusion matrix (CM). The matrix displays both the samples that were correctly assigned to the healthy condition (C1) and to the various faulty operating states (C2 to C7) and the samples that were incorrectly labeled. Specifically, the X and Y axes represent the true classes and the predicted classes, respectively.

Table 7 Confusion matrix for the EL-based SCOA in the testing phase

Table 7 shows that the EL-based SCOA strategy correctly identifies all 2000 of the 2000 observations for the operating modes C2, C3, and C7, so these modes are classified without any misclassification. However, there are misclassifications involving the healthy state (C1) and faulty modes 3 (C4), 4 (C5), and 5 (C6): 142 observations from the healthy class are classified as class C5, 9 observations from class C4 as class C5, and 4 samples from class C6 as class C1.

Conclusions

This paper developed an enhanced fault detection and diagnosis approach, called the ensemble learning-based sine-cosine optimization algorithm (EL-SCOA), for wind energy conversion (WEC) systems. In the presented methodology, the sine-cosine algorithm is used to select the most informative features from the raw data, and the selected features are then fed to the classification technique for diagnosis. The classification method incorporates bagging and boosting as ensemble methods and an ANN as the baseline classifier. The proposed paradigm was used to discriminate between various operating states (short-circuit, open-circuit, and wear-out faults) introduced at various locations (generator and grid sides). Compared with existing methods, including ANN, KNN, CFNN, FFNN, GRNN, and SVM, the experimental outcomes show that the suggested strategy performs very well. These results motivate us to further examine its computation time and memory storage in future research. In order to simultaneously improve diagnosis accuracy and decrease the WEC system execution time, a strategy that combines data size reduction with the aforementioned technique will be proposed.