Introduction

Renewable energy plays a major role in addressing global energy and environmental challenges. Among all the renewable energy sources, wind power is the most promising, as the technology is commercially mature, economically viable and environmentally clean. As a result, the global wind power capacity reached 651 GW by the end of 2019 [1]. If this trend continues, it could be possible to meet 20% of worldwide electricity needs with wind energy by 2030 [2]. This underlines the importance of wind in the future clean energy scenario.

The performance of wind turbines at a prospective site is vital information for the efficient design and successful management of wind energy projects. It is derived from the power response of the turbines at different wind velocities. The ideal velocity–power relationship of a wind turbine is given by its power curve, which every manufacturer provides for their turbine. The manufacturer’s power curve (MPC) is developed following the procedure given in the IEC standards. Under the IEC standard, the wind velocity at the hub height is measured with a mast placed at a distance of 2–4 times the rotor diameter of the turbine, and the corresponding power production is recorded. This means that for an 80-m-diameter turbine, the velocity is measured at a minimum distance of 160 m from the rotor. The measured velocities and power are first filtered and calibrated for sea-level conditions. The 10-min averaged observations are then binned for plotting the power against the corresponding wind velocities. These guidelines for developing the power curves of wind turbines are described in detail in IEC 61400-12-2:2013/COR1:2016: Power performance of electricity-producing wind turbines based on nacelle anemometry [3].
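The binning step can be illustrated with a minimal Python sketch. The 0.5 m/s bin width, the edge-aligned bins and the function name `binned_power_curve` are assumptions made for illustration; the filtering and sea-level calibration steps are omitted.

```python
from collections import defaultdict

def binned_power_curve(records, bin_width=0.5):
    """Average (velocity, power) pairs within wind velocity bins.

    records: iterable of (velocity_m_s, power_kw) 10-min averages.
    bin_width: assumed bin width in m/s (hypothetical default).
    Returns a list of (mean_velocity, mean_power), sorted by velocity.
    """
    bins = defaultdict(list)
    for v, p in records:
        # Bin index from the lower bin edge (edge alignment is an assumption).
        bins[int(v // bin_width)].append((v, p))
    curve = []
    for key in sorted(bins):
        pts = bins[key]
        mean_v = sum(v for v, _ in pts) / len(pts)
        mean_p = sum(p for _, p in pts) / len(pts)
        curve.append((mean_v, mean_p))
    return curve
```

Each point of the resulting curve is the mean velocity and mean power of one bin, which is the form in which a measured power curve is usually plotted.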

To explain the different operational zones in the power curve, a typical MPC of a 2-MW pitch-controlled wind turbine is shown in Fig. 1. The turbine starts generating power at its cut-in velocity VI. At velocities higher than VI, power increases with the wind velocity and reaches its maximum at the rated velocity VR. Though this increase in power from VI to VR should theoretically be cubic, for practical turbines it can follow different exponents or even combinations of them, depending on the design features of the turbine [4]. From VR to the cut-out velocity VO, the output from the turbine is kept constant. At velocities higher than VO, the turbine is shut down for safety reasons. Hence, the velocity–power performance of the turbine can be divided into the dynamic region between VI and VR and the deterministic region between VR and VO.
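As an illustration, the operating zones can be written as a piecewise function. This is only a sketch: the cubic rise between VI and VR is the theoretical form, which real turbines may not follow, and the default values are the velocities and rating quoted later in the text for the 2-MW turbine. The function name `ideal_power_kw` is hypothetical.

```python
def ideal_power_kw(v, v_in=4.5, v_rated=12.0, v_out=25.0, p_rated=2000.0):
    """Idealized power (kW) of a pitch-controlled turbine at wind velocity v (m/s)."""
    if v < v_in or v > v_out:
        # Below cut-in there is no production; above cut-out the turbine is shut down.
        return 0.0
    if v >= v_rated:
        # Deterministic region: output held constant between VR and VO.
        return p_rated
    # Dynamic region: assumed cubic rise from zero at VI to rated power at VR.
    return p_rated * (v**3 - v_in**3) / (v_rated**3 - v_in**3)
```

Comparing such an idealized curve with site observations makes the scatter in the dynamic region easy to see.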

Fig. 1 Typical manufacturer’s power curve (MPC) of a 2-MW wind turbine and its comparison with the observed power at the site. The range of observed power corresponding to 10 m/s wind velocity is highlighted with the yellow line as an example

Though the MPC gives an indication of the velocity–power response of the turbine, the turbine may behave differently under real field conditions. For example, the MPC of the 2-MW turbine is compared with the power observed at the site in Fig. 1. The observed power is highly scattered and the real production from the turbine varies significantly from the MPC. These deviations are more prominent between VI and VR, which is an important performance region of the turbine. Reasons for these variations are critically analyzed in [5,6,7,8,9,10,11]. The major reason highlighted is that the real-site environment under which the turbine operates could be significantly different from the standard conditions put forth in the IEC guidelines. For example, wind turbines are often installed in complex terrain, where the wind field can be much more turbulent than under ideal conditions. Nearby obstructions and structures as well as neighboring turbines can also influence the turbulence intensity. Mechanical and control system settings may also cause performance variations. The density of air and the turbine height (and thereby the wind shear) could vary from site to site. Another factor which alters the velocity–power relationship is the age and condition of the turbine and associated equipment. Performance models based on the MPC cannot completely capture the dynamics in the velocity–power relationship caused by these factors. Despite these limitations, MPCs give useful insights into the performance of turbines under varying wind conditions and are widely used by the industry, for example, in the planning of wind energy projects.

With the increasing number of wind farms that have been operational for several years and the wider implementation of Supervisory Control and Data Acquisition (SCADA) systems for monitoring and managing these projects, a large amount of site performance data from turbines is becoming available. For applications like wind power forecasting and wind turbine prognostics and health management (PHM), a more realistic velocity–power relationship can be developed by applying artificial intelligence (AI) to these site performance data instead of using the MPC. Being data driven, such models are more capable of capturing the turbine- and site-specific dynamics of wind-to-power conversion.

Several recent studies apply AI methods for defining wind turbine performance. Methods popularly used in these studies are artificial neural networks (ANN) [12,13,14,15,16], K-nearest neighbor (KNN) [15, 17, 18] and support vector machines (SVM) [15, 19,20,21]. All these studies showed that the data-driven methods are superior to the conventional approaches based on MPCs and MPC-based parametric models in capturing the velocity–power variation pattern of wind turbines. Nevertheless, most of these studies employ only one or two of these methods and the reported accuracies vary from case to case. Further, the models are trained and tested with data sets from the same turbines, working under the same site environment. Two interesting research questions in this modelling context are: (1) Among the different AI methods, which one gives the best performance when applied to a given turbine working under a specific site environment? (2) How would an AI-based wind power modelling method developed for a given turbine and site condition perform when it is applied to a different turbine, working under an entirely different site environment?

This paper addresses these questions, first by comparing the performances of four AI-based models developed for a 2-MW onshore wind turbine. The best performing modelling method, among these, is then applied to a 3.6-MW offshore wind turbine to demonstrate its applicability under different turbine and site environments.

The paper is arranged as follows. The machine learning methods used in this analysis are briefly explained after this section. These methods are then applied to the site performance data of a 2-MW wind turbine. The accuracies of these models are evaluated and compared with that of the MPC. A site-specific performance curve for the turbine, using the best performing model, is also presented as an example. Replicability of the method for a 3.6-MW offshore turbine is then demonstrated. Finally, the use of the AI-based site-specific performance models for applications like wind power forecasting and the prognostics and health management of wind turbines is briefly discussed.

Data-driven models

Performance data from a 2-MW pitch-controlled wind turbine are considered for developing the data-driven models. The cut-in, rated and cut-out velocities of the turbine are 4.5 m/s, 12 m/s and 25 m/s, respectively. The turbine is installed at a site 29 m above mean sea level. SCADA data on the velocity and power from the turbine, averaged over 10-min intervals, were used for the modeling. The data were randomly divided into two groups in the ratio of 2:1 for model training and testing, respectively, keeping the statistical properties of the groups similar. The training data set, on which the models are developed, was cleaned to eliminate possible noise. This included the removal of outliers; for example, data corresponding to instances where the turbine’s production was recorded as zero because it was under maintenance, though the corresponding wind velocity was in the productive region of the turbine.
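The data preparation can be sketched as follows. This is a simplified illustration, not the procedure actually used: a plain random shuffle does not by itself guarantee similar statistical properties in both groups, only the zero-production outlier rule mentioned above is implemented, and the function name and the fixed seed are assumptions.

```python
import random

def split_and_clean(records, v_in=4.5, v_out=25.0, seed=0):
    """Split 10-min (velocity_m_s, power_kw) records 2:1 and clean the training part.

    Returns (cleaned_training_records, test_records).
    """
    rng = random.Random(seed)          # fixed seed only for reproducibility
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = (2 * len(shuffled)) // 3  # 2:1 training-to-testing ratio
    train, test = shuffled[:n_train], shuffled[n_train:]
    # Remove training records with zero production although the wind
    # velocity lies in the productive range (e.g. turbine under maintenance).
    cleaned = [(v, p) for v, p in train
               if not (v_in <= v <= v_out and p <= 0.0)]
    return cleaned, test
```

Only the training set is cleaned here, as in the text; the test set is left untouched.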

The AI methods considered for comparison are artificial neural networks (ANN), support vector machines (SVM), K-nearest neighbor (KNN) and multivariate adaptive regression splines (MARS). The models were developed using the training data set, and the power corresponding to the velocities in the test data set was predicted using these models. The predicted power values were then compared with the corresponding power records in the test data set for error analysis, and the accuracies of the different AI models were compared. Features of these models are briefly explained below.

Artificial neural networks (ANN)

ANN is a distributed and adaptive machine learning method in which several identical and interconnected artificial neurons (or nodes) are arranged in input, hidden and output layers. The neurons receive input signals, calculate their weighted sum and pass it to a nonlinear function to arrive at the desired output signal. ANN is widely used in the predictive analytics of wind energy systems, for example [22, 23]. Among the different algorithms applied in ANN, the resilient back propagation algorithm with weight backtracking (RPROP+) [24] was used for the present study, as it gave better accuracy with the current dataset. With this model, the velocity V and the corresponding power P can be correlated as

$$P_{p} = f\left\{ {w_{0p} + \sum\limits_{k = 1}^{m} {f\left[ {v_{0k} + \sum\limits_{i = 1}^{n} {V_{i} v_{ik} } } \right]w_{kp} } } \right\},$$
(1)

where \(P_{p}\) is the predicted power output of the pth output unit, n is the number of input units and m is the number of hidden units. Here, \(w_{{{\text{0p}}}}\) and \(v_{{{\text{0k}}}}\) are the biases on the pth output unit and kth hidden unit, respectively. Similarly, \(w_{{{\text{kp}}}}\) is the weight from the kth hidden neuron to the pth output and \(v_{{{\text{ik}}}}\) is the weight from the ith input unit to the kth hidden node. The weights were updated using the relationship

$$w_{{\text{p}}}^{{\left( {t + 1} \right)}} = w_{{\text{p}}}^{\left( t \right)} - \eta_{{\text{p}}}^{\left( t \right)} \cdot {\text{sign}}\left( {\frac{{\partial E^{\left( t \right)} }}{{\partial w_{{\text{p}}}^{\left( t \right)} }}} \right),$$
(2)

where \(w_{p}^{\left( t \right)}\) is the pth weight at the tth iteration step, \(\eta_{p}^{\left( t \right)}\) is the learning rate of the pth weight at the tth iteration step and \(\frac{{\partial E^{\left( t \right)} }}{{\partial w_{p}^{\left( t \right)} }}\) is the partial derivative of the error function with respect to that weight at the tth iteration step.
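The sign-based update of Eq. 2 can be illustrated with a short Python sketch. It is deliberately simplified: the step-size adaptation factors (1.2 and 0.5) and bounds are commonly used defaults assumed here, and the weight backtracking step of RPROP+ is omitted, so this is not the exact algorithm used in the study. The function names are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def rprop_minimize(grad, w, n_iter=100, eta0=0.1, eta_plus=1.2,
                   eta_minus=0.5, eta_min=1e-6, eta_max=50.0):
    """Minimize an error function from its gradient using sign-based updates.

    grad: callable returning the list of partial derivatives dE/dw_p.
    w: initial weights (list of floats).
    """
    w = list(w)
    eta = [eta0] * len(w)    # one step size per weight
    prev = [0.0] * len(w)    # gradient from the previous iteration
    for _ in range(n_iter):
        g = grad(w)
        for p in range(len(w)):
            if g[p] * prev[p] > 0:
                # Same gradient sign as before: enlarge the step (assumed factor).
                eta[p] = min(eta[p] * eta_plus, eta_max)
            elif g[p] * prev[p] < 0:
                # Sign change means a minimum was passed: shrink the step.
                eta[p] = max(eta[p] * eta_minus, eta_min)
            # Eq. 2: only the sign of the derivative sets the direction.
            w[p] -= eta[p] * sign(g[p])
            prev[p] = g[p]
    return w
```

Because only the sign of the derivative is used, the update is insensitive to the magnitude of the gradient.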

The developed network had a single input, the wind velocity, and the corresponding modeled power was computed in the output layer. Based on iterations and error minimization, two hidden layers, with one and two nodes, were chosen for the model. The final architecture of the model is shown in Fig. 2.

Fig. 2 The architecture of the ANN model

Support vector machine (SVM)

In SVM, the input vector V is mapped into a D-dimensional feature space by a nonlinear mapping operator \(\phi\), and a linear regression is then performed in this feature space [25]. This transformation is done using the kernel function:

$$k\left( {V_{i} ,V} \right) = \sum\limits_{j = 1}^{D} {\phi_{j} \left( {V_{i} } \right)\phi_{j} \left( V \right)} .$$
(3)

The nonlinear regression in the low-dimensional input space is thereby converted to a linear regression in the high-dimensional feature space by

$$f\left( {V,w} \right) = \sum\limits_{i = 1}^{D} {w_{i} } \phi_{i} \left( V \right) + b,$$
(4)

where \(\left\{ {\phi_{i} \left( V \right)} \right\}_{i = 1}^{D}\) are the features, and \(\left\{ {w_{i} } \right\}_{i = 1}^{D}\) and b are coefficients which are estimated from the data by minimizing the function:

$$S[w] = \frac{C}{N}\sum\limits_{i = 1}^{N} {\left| {P_{i} - f\left( {V_{i} ,w} \right)} \right|_{\varepsilon } } + \frac{1}{2}\left\| w \right\|^{2} .$$
(5)

The cost function represented by

$$\left| {P_{i} - f\left( {V_{i} ,w} \right)} \right|_{\varepsilon } = \left\{ \begin{gathered} \left| {P_{i} - f\left( {V_{i} ,w} \right)} \right| - \varepsilon ,\quad {\text{for}}\,\left| {P_{i} - f\left( {V_{i} ,w} \right)} \right| \ge \varepsilon \hfill \\ 0,\quad {\text{otherwise}} \hfill \\ \end{gathered} \right.$$
(6)

is known as Vapnik’s \(\varepsilon\)-insensitive loss function. The function that minimizes Eq. 5 can be represented by

$$f\left( {V,\alpha ,\alpha^{ * } } \right) = \sum\limits_{i = 1}^{N} {\left( {\alpha_{i} - \alpha_{i}^{ * } } \right)} k\left( {V_{i} ,V} \right) + b,$$
(7)

with \(\alpha_{i} \alpha_{i}^{ * } = 0,\,\;\alpha_{i} ,\alpha_{i}^{ * } \ge 0\), \(i = 1, \ldots ,N\), where the kernel function \(k\left( {V_{i} ,V} \right)\) defines the inner product in the D-dimensional feature space. Some of the most commonly used kernels are linear, polynomial, radial, Gaussian and sigmoidal. To estimate w and b, positive slack variables \(\xi_{i}\) and \(\xi_{i}^{ * }\) are introduced and the following problem is solved:

Minimize

$$S\left( w \right) = C\sum\limits_{i = 1}^{N} {\left( {\xi_{i} + \xi_{i}^{ * } } \right)} + \frac{1}{2}\left\| w \right\|^{2}$$
(8)

subject to

$$\left\{ \begin{gathered} P_{i} - f\left( {V_{i} ,w} \right) \le \varepsilon + \xi_{i} \hfill \\ f\left( {V_{i} ,w} \right) - P_{i} \le \varepsilon + \xi_{i}^{ * } \hfill \\ \xi_{i} ,\;\xi_{i}^{ * } \ge 0 \hfill \\ \end{gathered} \right..$$
(9)

The slack variables account for the data points that the function f cannot fit within the precision \(\varepsilon\). With Lagrange multipliers and the Karush–Kuhn–Tucker conditions, Eq. 8 can be rewritten in the following dual form:

Minimize

$$\begin{gathered} S(\alpha^{ * } ,\alpha ) = \varepsilon \sum\limits_{i = 1}^{N} {\left( {\alpha_{i}^{ * } + \alpha_{i} } \right)} - \sum\limits_{i = 1}^{N} {P_{i} \left( {\alpha_{i} - \alpha_{i}^{ * } } \right)} + \hfill \\ \frac{1}{2}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {\left( {\alpha_{i}^{ * } - \alpha_{i} } \right)} } \left( {\alpha_{j}^{ * } - \alpha_{j} } \right)k\left( {V_{i} ,V_{j} } \right) \hfill \\ \end{gathered}$$
(10)

subject to

$$\sum\limits_{i = 1}^{N} {\left( {\alpha_{i}^{ * } - \alpha_{i} } \right)} = 0,\;0 \le \alpha_{i} ,\alpha_{i}^{ * } \le C.$$
(11)

The data points for which the coefficients \(\alpha_{i} ,\alpha_{i}^{ * }\) are nonzero are called support vectors. The parameters C and \(\varepsilon\) are to be decided depending on the application.

For the current analysis, after iterations with different kernel functions, it was found that the radial kernel with optimized parameters gave the best fit with the lowest errors.
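To make the building blocks of the formulation concrete, the \(\varepsilon\)-insensitive loss (Eq. 6), a radial kernel and the prediction function (Eq. 7) can be sketched in Python. The kernel width parameter `gamma`, the function names and the way the trained coefficients are passed in are assumptions for illustration; the optimization that yields the coefficients is not shown.

```python
import math

def eps_insensitive(p_obs, p_est, eps):
    """Vapnik's epsilon-insensitive loss (Eq. 6): errors below eps cost nothing."""
    return max(0.0, abs(p_obs - p_est) - eps)

def rbf_kernel(v1, v2, gamma):
    """Radial kernel, here assumed as exp(-gamma * (v1 - v2)^2)."""
    return math.exp(-gamma * (v1 - v2) ** 2)

def svr_predict(v, support_v, dual_coef, b, gamma):
    """Eq. 7: weighted sum of kernels centred on the support vectors, plus b.

    support_v: velocities V_i of the support vectors.
    dual_coef: the corresponding differences (alpha_i - alpha_i*).
    """
    return sum(c * rbf_kernel(vi, v, gamma)
               for vi, c in zip(support_v, dual_coef)) + b
```

Only the support vectors enter the prediction, which keeps the evaluation of a trained model inexpensive.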

K-nearest neighbor (KNN)

KNN is a non-parametric method widely used for solving classification and regression problems. In this instance-based learning, the value of an unknown instance (P in this case) is predicted from the known parameter of that instance (V) by mapping the data into a multidimensional space and identifying the k training instances at minimum distance from it. The distance is usually computed as the Euclidean distance.

For a given training dataset G with n instances of P and V, represented as \(\left\{ {P,V} \right\}_{i = 1}^{n}\), the distance between P and V is given by

$$d(P,V) = \left( {\sum\limits_{i = 1}^{n} {\left| {P_{i} - V_{i} } \right|^{c} } } \right)^{1/c} ,$$
(12)

where c is the order of the Minkowski distance, which is adopted for the current study. With c = 2 [26], it reduces to the Euclidean distance. The weight \(w_{i}\) for individual instances at this distance is then computed using the equation

$$w_{i} = \left[ {\left( {\frac{{2\left( {d + 4} \right)}}{d + 2}} \right)^{{\left( {\frac{d}{d + 4}} \right)}} k} \right],$$
(13)

where k is the kernel function. Of the two options, optimal and rectangular, the optimal kernel is chosen for this analysis to minimize the errors. The average of the k nearest neighbor instances selected from the training data is then computed, which gives the predicted value for the unknown instance, P.
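The prediction step can be illustrated with a minimal sketch. For simplicity it uses an unweighted (rectangular) average rather than the optimal kernel adopted in the study, and a single input variable, so the distance reduces to the absolute velocity difference. The function name and the default k are assumptions.

```python
def knn_predict(v, train, k=5):
    """Predict power at velocity v as the mean power of the k nearest records.

    train: list of (velocity_m_s, power_kw) training records.
    """
    # With one input, Minkowski distance of any order is |V_i - v|.
    nearest = sorted(train, key=lambda vp: abs(vp[0] - v))[:k]
    # Rectangular kernel: all k neighbours get equal weight (simplification).
    return sum(p for _, p in nearest) / len(nearest)
```

For example, with training records at 4, 5, 6 and 10 m/s and k = 3, a query at 5 m/s averages the power of the first three records.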

Multivariate adaptive regression splines (MARS)

In MARS models, the input data space is divided into different regions, for each of which a regression equation is developed [27]. These regression relationships are constructed from a set of coefficients and basis functions, which are based entirely on the input data in the respective regions. As the relationship between the independent and dependent variables is not pre-assumed in this method, it can even be non-monotone in nature. These characteristics make MARS models particularly attractive for several data mining applications.

The MARS model considered here for establishing the P–V relationship of the turbines can be expressed in its general form:

$$P = \beta_{0} + \sum\limits_{j = 1}^{n} {\sum\limits_{b = 1}^{B} {\left[ \begin{gathered} \beta_{jb} \left( + \right)\max \left( {0,V_{j} - H_{bj} } \right) + \hfill \\ \beta_{jb} \left( - \right)\max \left( {0,H_{bj} - V_{j} } \right) \hfill \\ \end{gathered} \right]} } ,$$
(14)

where n and B are the numbers of predictor variables and basis functions, respectively, and the H values refer to the “hinges” or “knots”. In building the model, certain spline basis functions are first chosen with a forward stepwise algorithm, and the best set is then identified with a backward stepwise algorithm by deleting the unfit basis functions. Further improvement in the continuity of the model is achieved by smoothing techniques.
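How a fitted model of the form of Eq. 14 is evaluated can be sketched for the single-predictor case. The knots and coefficients in the usage note are hypothetical values chosen only to show the mechanics; they are not fitted to any data, and the forward and backward selection steps are not implemented.

```python
def hinge(x):
    """Hinge basis function max(0, x)."""
    return max(0.0, x)

def mars_predict(v, beta0, terms):
    """Evaluate a single-predictor MARS model at velocity v.

    terms: list of (knot, beta_plus, beta_minus); each knot H contributes
    beta_plus * max(0, v - H) + beta_minus * max(0, H - v).
    """
    return beta0 + sum(bp * hinge(v - h) + bm * hinge(h - v)
                       for h, bp, bm in terms)
```

With the illustrative terms `[(4.5, 250.0, 0.0), (12.0, -250.0, 0.0)]` and `beta0 = 0.0`, the model is zero below 4.5 m/s, rises linearly up to 12 m/s and is flat beyond it, showing how two hinges can reproduce the broad shape of a power curve.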

Comparison of the models

The actual power developed by the turbine at different wind velocities is compared with the corresponding power estimated using the different data-driven models in Fig. 3. The solid line indicates the modelled power whereas the scattered points represent the site observations. All the models could capture the general trend in the power variations. To further quantify and compare the accuracies of the models, the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) between the model output and the observations are computed.

Fig. 3 Comparison between the observed power and the power estimated using different models

MAE is calculated as

$${\text{MAE}}\,\, = \,\,\frac{1}{N}\sum\limits_{i = 1}^{N} {\left| {P_{{{\text{Ve}}}} - P_{{{\text{Vo}}}} } \right|}$$
(15)

and RMSE by

$${\text{RMSE}}\,\, = \,\,\sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {\left( {P_{{{\text{Ve}}}} - P_{{{\text{Vo}}}} } \right)}^{2} \,} ,$$
(16)

where N is the total number of data points, \(P_{{{\text{Ve}}}}\) is the power estimated using the models at a given velocity V and \(P_{{{\text{Vo}}}}\) is the corresponding observed power. Errors in the power estimates using the MPC are also calculated. The results of the error analysis are shown in Table 1. In comparison with the MPC, all the data-driven models performed well in capturing the velocity–power relationship of the turbine under varying wind conditions. Among the models, the SVM-based approach showed the minimum error. Compared to the estimates based on the MPC, MAE and RMSE could be reduced by 54% and 25%, respectively, using the SVM-based model.

Table 1 Comparison of errors of the data-driven models and MPC
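The two error measures of Eqs. 15 and 16 can be computed with a few lines of Python. The function names are chosen here for illustration; both inputs are sequences of equal length in the same unit (for example kW).

```python
import math

def mae(estimated, observed):
    """Mean Absolute Error between estimated and observed power (Eq. 15)."""
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(observed)

def rmse(estimated, observed):
    """Root Mean Square Error between estimated and observed power (Eq. 16)."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed))
                     / len(observed))
```

Since RMSE squares the errors before averaging, it penalizes large deviations more strongly and is never smaller than MAE for the same data.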

In general, ANN is expected to yield better solutions than other machine learning methods in predictive analytics. However, in this case, the SVM-based model performed better. Every learning algorithm is functionally distinct and suits different data sets to different degrees. One of the reasons for the better performance of SVM could be its inherent strength of Structural Risk Minimization (SRM), in contrast with ANN, which uses Empirical Risk Minimization (ERM). With ERM, the model may become strongly tailored to the particularities of the training data set and, as a result, generalize poorly to (new) test data. The SVM addresses this problem by balancing the model's complexity against its success at fitting the training data. Hence, SVM models are less prone to overfitting [28,29,30].

Another factor could be that ANN training can converge to one of multiple local minima, whereas SVM training leads to a global minimum, which could have affected the accuracy of the optimal solution [31,32,33]. Further, with the relatively small number of features, SVM models can give good results even with less tuning. It should also be noted that SVM models are less computationally demanding than ANN models, which is particularly advantageous for applications like short-term wind power forecasting, where the available computational time could be limiting, as the outputs from the Numerical Weather Prediction (NWP) models are to be accessed and integrated with the wind turbine’s power performance models within a limited time window.

SVM-based site-specific performance curve

As the modelling approach based on SVM showed the highest accuracy among the methods compared, an SVM-based site-specific performance curve for the 2-MW turbine is developed, which is shown in Fig. 4. The observed power from the test data set is also plotted and is represented by the scattered points. The coefficient of determination (\(R^{2}\)) between the modelled curve and the observations is 0.91. As the performance zone between VI and VR is important in the modelling, model performance in this region was critically analyzed further. The MAE and RMSE between the power predicted by the SVM model and that observed in this zone are 164.2 kW and 243.7 kW, respectively. The \(R^{2}\) value for this region is 0.88. This indicates that such site-specific performance curves can be a better choice than the MPC for applications like wind power forecasting and turbine health monitoring.

Fig. 4 SVM-based site-specific performance curve for the 2-MW turbine
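The coefficient of determination can be computed as sketched below. The text does not state which definition was used; this sketch assumes the common form of one minus the ratio of the residual to the total sum of squares, and the function name is hypothetical.

```python
def r_squared(estimated, observed):
    """Coefficient of determination, assumed as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares between observations and model estimates.
    ss_res = sum((o - e) ** 2 for e, o in zip(estimated, observed))
    # Total sum of squares of the observations about their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

A value of 1 corresponds to a perfect fit, while a value of 0 means the model does no better than predicting the mean observed power.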

Adaptability of the SVM model

To demonstrate the replicability of the proposed SVM model, it has been applied to another wind turbine which differs in its size, design features and working environment. The turbine, with its 120 m rotor diameter, has a rated capacity of 3.6 MW and works in an offshore environment. The cut-in, rated and cut-out wind velocities of the turbine are 3–5 m/s, 12–13 m/s and 25 m/s, respectively. As in the previous case, an SVM model with a radial kernel function and optimal parameters was adopted for this turbine. With these features, the model was developed and tested with the training and testing data sets corresponding to this turbine.

The performance of the SVM model in the case of the 3.6-MW turbine is shown in Fig. 5. Close agreement between the predicted and observed power can be seen in this case as well. The RMSE and MAE between the modelled and observed power for this turbine are 251.9 kW and 128.7 kW, respectively. These model errors are relatively low, considering the rated capacity of the turbine and the observed production values. The \(R^{2}\) between the modelled and observed power is 0.96. These high accuracies show the capability of the developed SVM model in defining the site performance of turbines working under different environments.

Fig. 5 SVM model applied to the 3.6-MW offshore turbine

Conclusions

The velocity–power relationship of wind turbines, which is conventionally represented by the MPC, is vital information for the successful design and efficient management of wind power projects. However, the performance of turbines at actual sites differs significantly from the MPCs, which are derived under the ideal IEC test conditions. For projects with sufficient SCADA data on the turbine’s performance, AI-based models can give better insights into the velocity–power variations of the turbines. In this paper, we apply four such AI models to a 2-MW land-based wind turbine to compare their accuracies. Data-driven models based on ANN, KNN, SVM and MARS are developed for the turbine, and its field performance under varying wind velocities is compared with the modeled performance. Compared to the MPC, all the AI models performed well in defining the performance of the turbine at the specific site. Among the models, the SVM-based approach showed the highest accuracy.

A site-specific performance curve for the 2-MW turbine, based on the SVM model, is developed and presented. To establish the adaptability of the proposed SVM model for wind turbines working under different conditions, it is applied to a 3.6-MW offshore turbine. The SVM model could predict the performance of this turbine with high accuracy as well. This is because, in contrast with MPCs, the features of the site and the corresponding flow conditions at different wind velocities are embedded in such data-driven models. This indicates that the proposed method can be a better choice than MPCs for defining the velocity–power variations of wind turbines at a given site.

Such site-specific performance models have several applications in the better management of wind power systems, where MPCs are conventionally used. For example, in short-term wind power forecasting, physical models are extensively used by the industry [34]. In this method, MPC-based parametric models are often used to predict the power developed by individual turbines at different forecasted wind velocities. Being site specific and more accurate, the SVM method can be a better choice than MPCs for such applications.

Another application of the proposed SVM model is in the condition monitoring and prognostics of wind turbines. Once the expected velocity–power characteristics of the turbines at a specific site are more precisely defined through these models, they can be used as a benchmark for the normal performance of the turbine. This benchmark performance can be compared with the production status of the system in real time, and any deviations can be interpreted as an indication of the health condition of the turbines using suitable techniques [35]. Hence, the proposed models have several potential applications in the management of existing wind energy systems, where sufficient performance data are available for successful model development.
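The benchmark comparison can be illustrated with a very simple residual check. This is only a sketch: the fixed tolerance, the function name and the use of a plain threshold are assumptions, and practical condition monitoring would rely on the more suitable techniques referred to above.

```python
def flag_deviations(records, benchmark, tolerance_kw):
    """Return the records whose power deviates from the benchmark model.

    records: list of (velocity_m_s, observed_power_kw).
    benchmark: callable mapping velocity to expected power in kW,
        e.g. a trained site-specific performance model.
    tolerance_kw: hypothetical allowed absolute deviation in kW.
    """
    return [(v, p) for v, p in records
            if abs(p - benchmark(v)) > tolerance_kw]
```

Records returned by such a check would be candidates for closer inspection rather than confirmed faults.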