1 Introduction

Wind energy is one of the most important sources of renewable energy. Over the past years, the contribution of wind energy rose to 24.6% in Germany [1] and is still increasing. To keep growing, wind energy must remain competitive and stay within a reasonable price range. Some predictions assume that the Levelized Cost of Energy (LCOE) will be between 3.8 and 7 €ct/kWh by 2035 [2]. The LCOE is the accumulated sum of the investment costs and the annual expenses, normalized by the annual energy production (AEP). Therefore, an operator should consider every cost factor. For wind turbines in operation, the modifiable cost factor is operation and maintenance (O&M). Proper monitoring of the turbine allows reducing the duration of scheduled and unscheduled maintenance and the resulting downtime.

Condition monitoring of technical assets aims to detect changes and trends representing deviations from normal operational behavior, thus indicating a developing fault. Condition Monitoring Systems (CMS) inspect rotating systems such as the drivetrain and main bearing. Commonly, an operator can financially recover from one main bearing fault during the lifetime of a wind turbine. Therefore, the operator should thoroughly monitor the main bearing to reduce and prevent the consequences of a single fault.

Conventional CMS require a certain number of sensors, leading to high investment costs. Since 2006, each wind turbine has been mandatorily equipped with a Supervisory Control and Data Acquisition (SCADA) system [3]. SCADA monitors various information about the wind turbine, such as produced power, wind speed, oil pressure, and temperature, among others. Commonly, SCADA tracks that information with a ~1 Hz sampling rate and subsequently condenses it to 10-min sampling. The time signal of 600 samples is reduced to its mean, maximum, minimum, and standard deviation. This reduction hides dynamics such as oscillations.
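The condensing step can be illustrated with a minimal sketch. It assumes a plain list of ~1 Hz samples and a population standard deviation; the function name `condense` and the synthetic signals are illustrative only and are not taken from any SCADA system.

```python
import math
import statistics

def condense(samples, window=600):
    """Reduce a ~1 Hz signal to mean, max, min, and standard deviation per window."""
    rows = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rows.append({
            "mean": statistics.fmean(chunk),
            "max": max(chunk),
            "min": min(chunk),
            # Assumption: population standard deviation; a real system may differ.
            "std": statistics.pstdev(chunk),
        })
    return rows

# Two synthetic 10-min windows with the same mean: one constant, one oscillating.
constant = [50.0] * 600
oscillating = [50.0 + 5.0 * math.sin(2 * math.pi * k / 60) for k in range(600)]
stats = condense(constant + oscillating)
```

Both windows yield the same mean. The oscillation survives only as a larger standard deviation and a wider min/max range, while its frequency can no longer be recovered from the four values.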

Nonetheless, the 10-min dataset allowed the authors to determine the individual history of each wind turbine. The primary benefit of a CMS based on 10-min averaged SCADA data is an additional CMS at low cost that uses installed equipment without any added hardware costs. Based on data observations, it is possible to obtain an overview of the loads each turbine has experienced, thus indicating the remaining useful lifetime.

Multiple publications have investigated the possibility of using data mining in wind energy and SCADA analysis as an alternative CMS. An overview of some of those publications is given in the following. Kusiak and Zhang used several machine learning techniques to show that a wind turbine’s vibrations have a negative impact on performance [4]. In their study, the neural networks outperformed the other methods like a conventional CMS. Kusiak and Verma used a different approach to identify and predict status patterns from SCADA data. They derived association rules from historical data to identify common status patterns. Subsequently, Kusiak and Verma trained random forests to predict these status patterns in SCADA data of wind turbines [5]. Zhang and Kusiak also developed methods for detecting anomalies in SCADA data to identify wind turbines’ critical vibrations [6]. They grouped the data using k-means clustering to divide the data into abnormal and normal. For predicting abnormal drivetrain and tower vibrations, support vector machines (SVM), neural networks, and random forests were used [6]. Astolfi et al. analyzed SCADA data from four wind turbine generators (WTGs) in complex terrain in Italy to identify the terrain’s role in influencing the wake of WTGs, which was confirmed by experimental evidence [7]. Godwin determined possible pitch faults using the RIPPER algorithm, a modification of decision trees. He identified a variety of rules to indicate failures in advance [8]. Zhang used SCADA data from a wind farm in Norway with 17 WTGs and an artificial neural network (ANN) to identify deviations from the main bearing’s normal behavior and generate early warnings [9]. However, he experienced false predictions due to long downtimes. Butler developed a method to determine the main bearing’s remaining useful life using a radial basis function and a degradation index [10].

Many methods have been investigated and explored. The most common ones proved to be decision trees and ANNs. These methods yield promising results if all requirements are met, e.g., computational power, database, and expert knowledge. However, previous publications have not addressed the usability for operators so far. Small operators do not have access to some of those methods. Therefore, this paper examines the most used methods as a possible SCADA-based main bearing CMS and ranks them on various key performance indicators.

2 Methodology

The paper aims to detect the main bearing fault before the damage becomes critical. The required workflow consisted of three main steps:

2.1 Collection of SCADA time series

This paper’s underlying data covers ~400 wind turbines with more than 17 × 10^6 operating hours. Some of those wind turbines experienced a main bearing fault. The database’s average wind turbine has a rated power of 3.5 MW with a 120 m rotor diameter and a hub height of 90 m. To not violate any non-disclosure agreements, all presented data are normalized.

2.2 Preparation of time series

The data had to be processed as it contained faulty inputs and non-plausible values. The authors applied two preparation steps before any analysis took place. The first step was to determine all missing time steps in the SCADA time series. Every missing time step was assumed to be downtime of a wind turbine and marked as such. The second step was to filter the data by plausibility. If a signal was outside its plausible range, it was set to NaN at that point. Those data points were excluded from any data mining process. Table 1 displays the range of reasonable boundaries.

Table 1 Boundaries of a plausibility check
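The two preparation steps can be sketched as follows. This is a minimal illustration, assuming 10-min records keyed by timestamp; the signal name `bearing_temp`, the function `prepare`, and the numeric range in `BOUNDS` are hypothetical placeholders and do not reproduce the boundaries of Table 1.

```python
import math
from datetime import datetime, timedelta

STEP = timedelta(minutes=10)
# Hypothetical plausibility range; the actual boundaries are those of Table 1.
BOUNDS = {"bearing_temp": (-40.0, 120.0)}

def prepare(records, start, end):
    """records: dict mapping timestamp -> dict of signal values."""
    prepared = []
    t = start
    while t <= end:
        row = records.get(t)
        if row is None:
            # Missing time step: assumed to be downtime and marked as such.
            clean = {"time": t, "downtime": True}
            clean.update({name: math.nan for name in BOUNDS})
        else:
            clean = {"time": t, "downtime": False}
            for name, (low, high) in BOUNDS.items():
                value = row.get(name, math.nan)
                # Values outside the plausible range are replaced by NaN.
                clean[name] = value if low <= value <= high else math.nan
        prepared.append(clean)
        t += STEP
    return prepared

t0 = datetime(2019, 1, 1, 0, 0)
records = {t0: {"bearing_temp": 45.0}, t0 + 2 * STEP: {"bearing_temp": 999.0}}
out = prepare(records, t0, t0 + 2 * STEP)
```

In this example, the first record is kept, the missing second time step is marked as downtime, and the implausible third value is set to NaN.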

2.3 Data mining process

The underlying data mining is divided into three approaches, varying in their complexity. Depending on the prior knowledge of a wind turbine operator, they can select the most suitable method.

The first method was a statistical comparison within the wind farm. The authors assumed that all wind turbines in a wind farm experience similar loading conditions during a given time. At each point in time, the main bearing temperatures of all wind turbines were compared. They showed a normal distribution, which means that median and mean are equal. At every time step, the mean temperature and the standard deviation were determined. Every main bearing temperature that differed by more than ±3 σ from the reference mean was labeled as out of bounds. The ±3 σ boundary proved to be the best fitting threshold in ~70% of all wind farms. For each wind turbine, the out-of-bounds labels were accumulated over a given time frame (e.g., four weeks). The accumulated values were normalized by the number of data points inside the time frame to compare the wind turbines. Wind turbines with a higher occurrence of out-of-bounds behavior proved to be more likely to fail. The method was further improved by excluding downtimes longer than one hour. During those downtimes, the main bearing cools down, and this cool-down distorts the mean and standard deviation.
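A minimal sketch of this farm-wide comparison is given below. It assumes that every turbine, including the one under test, enters the reference mean and a population standard deviation; the function name `out_of_bounds_share` and the synthetic temperatures are illustrative only.

```python
import statistics

def out_of_bounds_share(temps, k=3.0):
    """temps: dict turbine -> list of main bearing temperatures on common time steps.
    Returns, per turbine, the share of time steps outside mean ± k·σ of the farm."""
    names = list(temps)
    n_steps = len(temps[names[0]])
    counts = {name: 0 for name in names}
    for i in range(n_steps):
        values = [temps[name][i] for name in names]
        mean = statistics.fmean(values)
        sigma = statistics.pstdev(values)
        for name in names:
            if abs(temps[name][i] - mean) > k * sigma:
                counts[name] += 1
    # Normalize by the number of data points inside the time frame.
    return {name: counts[name] / n_steps for name in names}

# Synthetic farm of twelve turbines over four time steps; WT12 runs hot at two of them.
temps = {f"WT{n:02d}": [40.0, 40.0, 40.0, 40.0] for n in range(1, 12)}
temps["WT12"] = [40.0, 60.0, 40.0, 60.0]
shares = out_of_bounds_share(temps)
```

Here `shares["WT12"]` is 0.5 and all other turbines stay at 0.0. Note that if each turbine is included in its own reference mean and σ (as assumed here), a deviation beyond 3 σ is mathematically possible only with more than ten turbines, which is why the synthetic example uses twelve.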

The second method was to use a classification learner. The underlying learner was a random forest (RF). The random forest is a classifier that operates with multiple decision trees. A single decision tree is a simple classifier that distinguishes between two classes (e.g., smaller or bigger than 4 m/s). During the training of such decision trees, the questions and their thresholds are determined. The questions are defined such that, when answered, they clearly separate the data into distinct classes.

Such decision trees can be easily trained and used. In a random forest, multiple decision trees are defined with different questions. The mean of all decision trees is used as the output and as the predicted value. The inputs to the random forest are listed below. Multiple input sets were tried, and those listed performed best.

  • Nacelle temperature

  • Ambient temperature

  • Transmission Bearing Temperature

  • Generator Bearing Temperature

  • Oil temperature

  • Reactive Power

  • Power

  • Wind speed

The training period was defined as the first two years of operation. This period proved to be an acceptable range, as most events had already occurred at least once during this period. Pagitsch determined a similar period [11]. In this contribution, the training period worked for 86% of the tested wind turbines. The trained random forest achieved an average accuracy of 90.1%. The features shown here should be checked for plausibility and adapted to the operator’s own system.
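The idea that the forest output is the mean of many simple trees can be sketched with one-question trees (stumps) trained on bootstrap samples. This is not the authors' implementation: it uses a single input, regression stumps, and synthetic data, and the names `fit_stump` and `fit_forest` are hypothetical.

```python
import random
import statistics

def fit_stump(xs, ys):
    """Find the threshold on one input that best splits ys into two groups."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        lm, rm = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:
        # Only one distinct input value: fall back to a constant prediction.
        const = statistics.fmean(ys)
        return lambda x: const
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.fmean([tree(x) for tree in trees])

# Synthetic data: the temperature steps up once the wind speed exceeds 4 m/s.
wind = [0.5 * i for i in range(40)]
temp = [30.0 if w < 4.0 else 50.0 for w in wind]
forest = fit_forest(wind, temp)
```

Each stump learns a slightly different threshold from its bootstrap sample, and the forest averages them. A production setup would more likely rely on an established library and on all inputs listed above.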

The last approach was a modified ANN. Properly trained, the ANN can predict the main bearing temperature. However, the hyperparameters and structure of the ANN are directly linked to its performance. Several setups (number of hidden layers, number of nodes, activation functions) have been tested and compared. The best performing ANN had:

  • 2 hidden layers

  • 15 neural nodes

  • Sigmoid activation function

  • Past six time steps of the SCADA time series

The autoregressive neural network has a setup similar to a normal ANN, with delayed time steps extracted from SCADA data. Additionally, the prediction of the previous main bearing temperature is fed back into the input. The autoregressive time-delayed input was assumed to influence the predictions heavily [12].
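How the delayed SCADA inputs and the fed-back prediction form one input vector can be sketched as follows. The trained network is replaced by a hypothetical stand-in `toy_model`; the function name `predict_closed_loop` and the choice of the window t-6 to t-1 are assumptions for illustration.

```python
def predict_closed_loop(model, scada_rows, first_temp, n_lags=6):
    """scada_rows: list of per-time-step input lists.
    model: callable taking a flat feature list and returning a temperature."""
    predictions = []
    previous = first_temp
    for t in range(n_lags, len(scada_rows)):
        # Delayed inputs: the SCADA rows of the past n_lags time steps, flattened.
        features = [v for row in scada_rows[t - n_lags:t] for v in row]
        # Autoregressive feedback: the previous prediction, delayed by one time step.
        features.append(previous)
        previous = model(features)
        predictions.append(previous)
    return predictions

# Hypothetical stand-in for the trained network: averages the fed-back
# prediction and the most recent SCADA value.
toy_model = lambda f: 0.5 * f[-1] + 0.5 * f[-2]

rows = [[40.0]] * 8
preds = predict_closed_loop(toy_model, rows, first_temp=30.0)
```

With this stand-in, `preds` is `[35.0, 37.5]`: each prediction depends on the previous one, so the output approaches the input level gradually, which mimics the lag of a temperature signal.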

Without the delayed and fed-back inputs, an ANN does not consider the dimension of time. The available data is interpreted as a sequence independent of its capture time. For outputs such as the temperature, which exhibit lags and a strong dependency on their previous states, it is highly beneficial to consider their past states correctly. Therefore, the feedback input has an important influence on the NN prediction. The setup was improved by introducing a missing data parameter to model discontinuities in the data. These discontinuities ranged from a couple of hours to weeks. Usually, such missing data lead to large errors once the availability is reestablished. The introduction of the missing data parameter reduced these errors drastically. The trained system is based on the first two years of operation and achieved an accuracy of 95.1%.

All those methods have been used to analyze a failing main bearing in the database. The paper displays some of those results, determined in a wind farm with five wind turbines. The available dataset spans from 2013 till late 2019. In mid-2019, wind turbine number four (WT04) had a main bearing fault.

3 Results and discussion

All three methods have been used to identify the faulty wind turbine number four. Each method was capable of detecting the fault before its repair. Therefore, the methods must be compared and evaluated. The methods are assessed using a weighted average of multiple key performance indicators (KPI), given in Table 2. An internal survey of the project committee determined the weighting. For every KPI, each method is ranked, allowing one to choose the most suitable method.

Table 2 Key performance indicators and their corresponding weights
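The weighted assessment can be sketched as a weighted average of per-KPI ratings. The KPI names follow those discussed in the text, but all weights and ratings below are hypothetical placeholders and do not reproduce Table 2 or Table 3.

```python
def weighted_score(ratings, weights):
    """Weighted average of KPI ratings; higher is better in this sketch."""
    total = sum(weights.values())
    return sum(ratings[kpi] * weights[kpi] for kpi in weights) / total

# Hypothetical weights and ratings, for illustration only.
weights = {"accuracy": 3, "computational effort": 1}
method_a = {"accuracy": 1, "computational effort": 3}
method_b = {"accuracy": 3, "computational effort": 1}

score_a = weighted_score(method_a, weights)
score_b = weighted_score(method_b, weights)
```

With these placeholder numbers, `score_a` is 1.5 and `score_b` is 2.5, so the method that is stronger on the heavily weighted KPI ranks first.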

3.1 Statistical comparison

Fig. 1 displays an output of the statistical method. The x-axis shows the timeline and the y-axis the wind turbines of one wind farm. Each rectangle represents how often a wind turbine was outside the ±3 σ boundary during the one-month observation period. A value of 25% means that the wind turbine had a too high or too low main bearing temperature relative to the other wind turbines for one week. A zero entry (black rectangle) represents either no out-of-bounds behavior or no entries during this period. The authors assumed that during downtime, the main bearing temperature does not behave untypically.

Fig. 1 The output of the statistical method between 2013 and 2019

In an ideal scenario without any issues, the out-of-bounds behavior is distributed equally. For example, in a wind farm with five wind turbines, each wind turbine accounts for approximately 20% of the out-of-bounds behavior. Slight differences might occur due to power curtailments, short downtimes, or the violation of the assumption that all wind turbines experience the same wind speed.

At the end of Q2 in 2019, WT 04 shows an increased occurrence of out-of-bounds behavior. This identification aligns with the maintenance reports: during that time, the lubrication system was repaired. The statistical approach proved to be promising as it identified a change in the system. However, in Q2 and Q3 of 2016, WT 01 and WT 03 also had an increased occurrence, which shows a likelihood of false predictions. A direct fault indication can only be concluded if the relative occurrence remains high over a certain period. In this publication, the period was six months; it varies with site and wind farm.

The statistical method had the benefit of an intuitive setup and fast computation. It did not require any prior knowledge of data mining. Another advantage of this method was that the effects of ambient temperature could be neglected, as the wind turbines are located inside the same wind farm. Logically, operators cannot use this method across multiple wind farms at different locations.

A drawback of this method is its dependence on the size of the wind farm. The smaller the wind farm, the more likely the distribution is distorted. This distortion can increase the false positive rate. Another drawback arises if multiple wind turbines fail simultaneously, which leads to a distribution shift. Consequently, the method is not stable and is not recommended as an alternative to a conventional CMS. The procedure should be used as an indicator for further inspections.

3.2 Random forest

The second method was based on an RF to predict the upcoming main bearing temperature. The RF was trained over two years of operation. By varying the inputs listed earlier, the authors determined the importance of each input. Wind speed, produced power, and ambient temperature received the highest importance for predicting the main bearing temperature. The accuracy dropped by at least 10% when those inputs were missing.

Fig. 2 displays an output of the RF for WT 03 and WT 04. The time is given along the x-axis, and the prediction error is displayed on the y-axis. The prediction error is the difference between the predicted temperature and the actual measured temperature. A positive value represents an overprediction, and a negative value an underprediction. In 2015, both wind turbines showed a high peak, which is due to cold ambient temperatures. The training set did not contain those ambient temperatures, and the RF over-predicted the actual bearing temperature. In mid-2018, the prediction error of WT 04 started to drop.

Fig. 2 Prediction error of main bearing temperature according to RF

To clearly state a possible error, two thresholds were defined: a warning and an error threshold. In this publication, ±1.5 σ is set as the warning threshold and ±3 σ as the error threshold. Consequently, the system highlighted the peak in 2015, a drawback that led to false indications. This issue can be resolved by a detailed inspection of the time series, but such an examination would lead to additional costs. Therefore, a warning signal on all wind turbines simultaneously should be treated with suspicion. In 2018, the warning indication of WT 04 occurred with a higher frequency, thereby indicating the impending failure. Meanwhile, the prediction error of WT 03 oscillates around zero.
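The two thresholds can be applied to a series of prediction errors as in the following sketch. It assumes that σ is the standard deviation of the prediction errors observed during training; the function name `classify_errors` and the numbers are illustrative.

```python
import statistics

def classify_errors(errors, sigma, warn=1.5, error=3.0):
    """Label each prediction error as 'ok', 'warning' (> 1.5 σ), or 'error' (> 3 σ)."""
    labels = []
    for e in errors:
        if abs(e) > error * sigma:
            labels.append("error")
        elif abs(e) > warn * sigma:
            labels.append("warning")
        else:
            labels.append("ok")
    return labels

# Assumption: σ is taken from the prediction errors of the training period.
training_errors = [-1.0, 1.0, -1.0, 1.0]
sigma = statistics.pstdev(training_errors)

labels = classify_errors([0.5, -2.0, 3.5], sigma)
```

For these synthetic values, σ equals 1.0 and `labels` is `["ok", "warning", "error"]`. Counting the warnings per turbine over time would then show the increased frequency described above.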

The RF is very dependent on its training database. It can only predict temperatures within the range contained in the database. The random forest cannot handle unseen events, which results in a high offset in the prediction. Additionally, if the database is biased, the classifier tends to predict the most represented class.

Setting up an RF requires a database and knowledge of how to define hyperparameters. Additionally, the training phase requires some computational effort. However, after the initialization and training, an RF can be used with low computational costs and satisfying accuracy.

3.3 Autoregressive neural network

The last method investigated was an autoregressive NN. The constructed autoregressive NN uses the SCADA input of at least six previous time steps, which means that the previous hour is considered as input. Thus, the cool-down due to downtime is integrated. Additionally, the prediction itself is used as the autoregressive input with a time delay of one time step. Fig. 3 shows the prediction error of the autoregressive NN. The time is given along the x-axis and the prediction error along the y-axis. As in the previous figure, the prediction error is defined as the difference between the prediction and the actual value. It can be noted that the offsets are smaller than those of the RF. Small spikes occur around 2015. However, they are smaller than 0.5 σ of the training results. In 2019, the autoregressive NN indicated a deviation between prediction and actual value. Next to that spike, the autoregressive NN returned NaN values as output, which cannot be displayed. The frequency of NaN values increased after the spikes, indicating an unknown situation and a possible fault. The offset occurs one month after that of the RF. However, there is less noise during the forecast, meaning that the warning and error thresholds can be set at lower values.

Fig. 3 Prediction error of main bearing temperature according to autoregressive NN

This model’s advantage is that the output is isolated from the influence of anomalies in the monitored component. Such anomalies usually occur during downtimes, while the main bearing cools down. The previous prediction was the most critical input feature in the autoregressive model due to the immense dependency of temperatures on their last state [13].

3.4 Comparison

All methods showed promising results. However, the results shown are based on the given database and the developed model. The authors did not fully achieve a direct transfer to other wind turbines and other sites. Therefore, the setup was reevaluated and trained for all wind farms within the database, while the methodology remained unchanged. This last section shows the overall best-performing results of the database.

Concerning the KPIs computational effort, database, and required knowledge, the statistical approach is the preferred solution. It returns the fastest results and does not require a high initialization effort.

However, concerning accuracy and false predictions, the statistical method was not satisfactory. To compare the methods, the receiver operating characteristic (ROC) was used. The ROC is a performance measure applied in classification tasks for different thresholds. ROC curves are typically used to assess the effectiveness of outlier detection algorithms [14]. The ROC plots the true positive rate over the false positive rate for various threshold values between 0 and 1. An ideal performance curve is a rectangular curve that reaches into the upper left corner: the true positive rate takes on the value 1 (all alarms are identified correctly), and the false positive rate takes on the value 0 (no data is falsely classified as an alarm). If the line goes directly from the origin to the upper right corner, the model can be interpreted as a random guess.
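How the points of a ROC curve arise from varying the threshold can be sketched as follows. The alarm scores and labels are synthetic, and the function name `roc_points` is illustrative; each sample is flagged as an alarm if its score reaches the threshold.

```python
def roc_points(scores, labels, thresholds):
    """Return (false positive rate, true positive rate) for each threshold.
    labels: 1 for a real fault, 0 for normal operation."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for th in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Synthetic, perfectly separable example.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]
points = roc_points(scores, labels, thresholds=[0.0, 0.5, 1.0])
```

For this separable example, the thresholds 0.0, 0.5, and 1.0 yield the points (1.0, 1.0), (0.0, 1.0), and (0.0, 0.0), so the curve passes through the upper left corner as in the ideal case.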

Fig. 4 displays the ROC of the tested methods. The statistical approach performs better than a random guess but leaves room for improvement. Meanwhile, the RF reaches a true positive rate of 0.9 relatively early with the mentioned threshold. The best performing method proved to be the autoregressive NN. Its ROC is close to the ideal scenario.

Fig. 4 ROC curve of used methods based on database [12]

Based on the shown ROC curve and the experience throughout this paper’s development, the KPI rating can be found in Table 3. The autoregressive NN outperforms the RF and the statistical method, even though the RF also showed satisfying results. It needs to be mentioned that the RF also had some false indications, as shown in Fig. 2. Consequently, the autoregressive NN is the best method for a SCADA-based CMS. However, the other methods proved to be fair competitors and should be considered depending on the focus of the SCADA-based CMS. All in all, the methodology provided a similar and partially earlier fault indication compared to the conventional CMS.

Table 3 KPI of investigated methods

The results are based on the given database and the developed model, and a direct transfer to other wind turbines and sites was not fully achieved. Therefore, the setup must be reevaluated and trained for any given configuration. However, the methodology remains unchanged.

4 Conclusion

Throughout this publication, it was shown that SCADA data can be used as an additional CMS. The publication focused on the perspective of the operator. The required effort largely depends on the focus of the operator. The models used ranged from a statistical method to data mining methods. Even with a simple comparison, it was possible to detect faults in advance while keeping initial efforts low. The autoregressive method outperformed all other methods. The prediction error was relatively low, and no false predictions were made. Nonetheless, this method requires expert knowledge and an extensive database. Integrating such an approach into the operation of wind turbines could decrease the severity of faults and maintenance costs.