Introduction

In recent years, wind energy has drawn increasing attention from researchers because of its advantages over fossil fuel-based energy sources. However, one of the main drawbacks of wind energy is that its generation depends strongly on weather conditions. Accurate forecasting of wind power is therefore of great importance for the exploitation of this renewable resource. Many researchers have worked on developing wind power forecasting methods, which can be split into two categories: physical and statistical approaches. Physical approaches, such as numerical weather prediction (NWP) models, use the physical laws that govern the behavior of the atmosphere and rely on weather data to predict the local wind speed and direction [1]. Statistical approaches use a large amount of historical data and optimize model parameters to minimize the error between the forecasted and observed values [2]. Because of the high data acquisition and computation costs associated with numerical weather prediction, statistical models are the most appropriate option for short-term wind power forecasting [3, 4]. A wide range of statistical approaches has been proposed for short-term wind power forecasting. Early methods focused on time series techniques, including the Kalman filter [5], the autoregressive moving average (ARMA) model [6], and the autoregressive integrated moving average (ARIMA) model [7]. However, these methods have several significant limitations, one of which is their insufficient accuracy [8, 9].

In recent years, conventional machine learning methods such as the artificial neural network (ANN) [10], decision tree (DT) [11], random forest (RF) [12], and adaptive neuro-fuzzy inference system (ANFIS) [13] have been widely used in wind power forecasting. Conventional machine learning methods are usually computationally simple, but their ability to extract deep features is limited and their prediction accuracy is typically poor [14].

The support vector regression (SVR) [15] model is considered one of the most powerful conventional machine learning tools for solving complex nonlinear regression problems, and it has therefore been widely employed in short-term wind power forecasting [16, 17]. SVR is based on statistical learning theory and the structural risk minimization principle, which enable it to generalize well and to balance empirical risk against the confidence range when only limited data sets are available [15]. SVR has been shown to be more effective than other traditional machine learning methods [16]. Despite these characteristics, SVR still has limitations, mainly in selecting appropriate hyper-parameters, such as the kernel function parameters, the insensitivity coefficient, and the penalty factor, which affect its accuracy and speed [18, 19]. These parameters are often selected by experts based on experience, which can result in poor model performance. To overcome the low accuracy caused by an unsuitable choice of hyper-parameters, several optimization methods have been proposed. Conventional methods such as grid search [20] and gradient descent [21] are usually used to optimize the hyper-parameters of SVR. However, these methods cannot perform large-scale calculations with high precision [22]. To deal with this problem, numerous metaheuristic optimization algorithms have been proposed.

For instance, an approach based on SVR and an enhanced particle swarm optimization (e-PSO) algorithm was developed for short-term wind power forecasting [23], in which the e-PSO algorithm optimized the hyper-parameters of SVR to increase the forecast accuracy. However, the performance of this method is limited because the efficiency of PSO is sensitive to its several particle parameters [24]. Li et al. [22] proposed a short-term wind power prediction model based on data mining technology and an improved SVR model, in which a cuckoo search (CS) algorithm optimizes the kernel function parameter and the penalty factor of SVR. Their results show that the proposed model outperforms the ARIMA and back-propagation neural network (BPNN) models. According to [25], the CS algorithm has three main shortcomings, related to the initialization process, parameter tuning, and boundary handling. In [26], an improved dragonfly algorithm (IDA), which introduces adaptive learning factors and a differential evolution strategy, was combined with SVR to forecast short-term wind power, with the IDA determining the optimal parameter settings of SVR. Data from the La Haute Borne wind farm in France were used to confirm the effectiveness of that model. Li et al. [27] developed a hybrid model that combines SVR with a hybrid improved cuckoo search (HICS) algorithm for short-term wind power forecasting based on wind speed and wind direction input vectors. Compared with GA–SVR, IDA–SVR, and CS–SVR, the hybrid HICS–SVR showed a better ability to predict short-term wind power output. However, HICS–SVR also has disadvantages, such as the complex structure of the HICS algorithm [27].

Bald eagle search (BES) [28] is a recent nature-inspired algorithm that has attracted the attention of researchers since it was proposed. In this paper, a new short-term wind power forecasting model based on SVR and the BES algorithm is presented, in which BES fine-tunes the hyper-parameters of SVR to enhance its forecasting accuracy. Data gathered from the Sotavento Galicia wind farm in Spain are used to evaluate the performance of the developed forecaster. The results are compared with those of other forecasting models, namely decision tree (DT), random forest (RF), traditional SVR, hybrid SVR and grey wolf optimizer (SVR–GWO), and hybrid SVR and manta ray foraging optimizer (SVR–MRFO). The comparison reveals that the proposed model performs better than all of these models.

The main contributions of this paper are summarized as follows:

  • An improved SVR model is introduced in which the BES algorithm optimizes the hyper-parameters of SVR to enhance prediction accuracy.

  • The proposed hybrid artificial intelligence tool (SVR–BES) is used to forecast short-term wind power from historical datasets gathered from the Sotavento Galicia wind farm in Spain.

  • The predictive capability of the proposed SVR–BES model is assessed using the root-mean-square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and correlation coefficient (R).

  • The proposed model is compared with other machine learning techniques, including decision tree (DT), random forest (RF), SVR, SVR–GWO, and SVR–MRFO, to assess its accuracy.

The remainder of this paper is organized as follows. Section 2 presents background on the methods used. Section 3 describes the proposed forecasting model. Section 4 presents the simulation results and compares the proposed model with other approaches. Finally, Section 5 draws conclusions.

Methods

Support vector regression (SVR)

SVR is a powerful machine learning tool for solving nonlinear regression problems, and because its training problem is convex, it yields a globally optimal solution. It maps samples from the input space into a high-dimensional feature space through a nonlinear transformation. The network structure of the SVR model is shown in Fig. 1. The regression function of the SVR model is defined as follows [29]:

$$F={\omega}^T\varphi (x)+b$$
(1)
Fig. 1

The network structure of the SVR model

where F is the forecasted output, ω is the weight vector, b is the bias, and φ(x) is the nonlinear mapping of the input vector x into the high-dimensional feature space.

The ω and b coefficients can be calculated by minimizing the risk function R(F) given by:

$$R(F)=\frac{1}{2}{\left\Vert \omega \right\Vert}^2+C\frac{1}{n}\sum \limits_{i=1}^n{L}_{\varepsilon}\left({y}_i,{F}_i\right)$$
(2)

where C is the penalty parameter that determines the trade-off between model complexity and training losses, ε is the width of the insensitive tube (deviations smaller than ε are not penalized), and Lε(yi, Fi) = max(0, |yi − Fi| − ε) is the ε-insensitive loss function.
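As a minimal sketch (assuming NumPy, with hypothetical function names), the ε-insensitive loss in its standard form max(0, |y − F| − ε) and the risk function of Eq. (2) can be written as follows. Deviations inside the ε-tube contribute nothing to the risk, while larger deviations are penalized linearly and weighted by C.

```python
import numpy as np


def eps_insensitive_loss(y, f, eps):
    """L_eps(y, F) = max(0, |y - F| - eps): errors inside the eps-tube cost nothing."""
    return np.maximum(0.0, np.abs(np.asarray(y, dtype=float) - np.asarray(f, dtype=float)) - eps)


def regularized_risk(w, y, f, C, eps):
    """Risk function of Eq. (2): 0.5 * ||w||^2 + C * (1/n) * sum of eps-insensitive losses."""
    w = np.asarray(w, dtype=float)
    return 0.5 * float(w @ w) + C * float(np.mean(eps_insensitive_loss(y, f, eps)))


# Usage: the first error (0.05) lies inside the tube and is ignored.
losses = eps_insensitive_loss([1.0, 2.0, 3.0], [1.05, 2.5, 2.0], eps=0.1)
risk = regularized_risk([0.5, 0.5], [1.0, 2.0, 3.0], [1.05, 2.5, 2.0], C=10.0, eps=0.1)
```

Increasing C makes training errors more expensive relative to the norm of ω, which is why C acts as the complexity/loss trade-off parameter.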

Equation (2) can be transformed into the following form [29]:

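$$\operatorname{minimize}\ \frac{1}{2}{\left\Vert \omega \right\Vert}^2+C\sum \limits_{i=1}^n\left({\xi}_i+{\xi}_i^{\ast}\right)\quad \text{subject to}\quad \left\{\begin{array}{l}{y}_i-{\omega}^T\varphi \left({x}_i\right)-b\le \varepsilon +{\xi}_i\\ {\omega}^T\varphi \left({x}_i\right)+b-{y}_i\le \varepsilon +{\xi}_i^{\ast}\\ {\xi}_i,{\xi}_i^{\ast}\ge 0,\quad i=1,\dots, n\end{array}\right.$$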
(3)

where ξi and ξi* are non-negative slack variables that measure deviations above and below the ε-tube.

As described below, the optimization problem in (3) is easier to solve when expressed in its dual formulation.

$$\operatorname{maximize}\ \left[-\frac{1}{2}\sum \limits_{i,j=1}^n\left({\alpha}_i-{\alpha}_i^{\ast}\right)\left({\alpha}_j-{\alpha}_j^{\ast}\right)\left(\varphi \left({x}_i\right)\cdot \varphi \left({x}_j\right)\right)-\varepsilon \sum \limits_{i=1}^n\left({\alpha}_i+{\alpha}_i^{\ast}\right)+\sum \limits_{i=1}^n{y}_i\left({\alpha}_i-{\alpha}_i^{\ast}\right)\right]\quad \text{subject to}\quad \sum \limits_{i=1}^n\left({\alpha}_i-{\alpha}_i^{\ast}\right)=0,\quad 0\le {\alpha}_i,{\alpha}_i^{\ast}\le C$$
(4)

where αi and αi* are the Lagrange multipliers.

Solving the dual maximization problem in (4) gives the SVR function F [27]:

$$F\left(x\right)=\sum \limits_{i=1}^n\left({\alpha}_i-{\alpha}_i^{\ast}\right)k\left({x}_i,x\right)+b$$
(5)

where k is the kernel function, which can be linear, polynomial, sigmoid, or radial basis. The radial basis function (RBF) kernel is commonly used because of its simplicity and reliability [30]. It is expressed as follows:

$$k\left({x}_i,{x}_j\right)=\exp \left(-\frac{{\left\Vert {x}_i-{x}_j\right\Vert}^2}{2\gamma}\right)$$
(6)

where γ is the bandwidth parameter of the RBF kernel.
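The kernel of Eq. (6) and the regression function of Eq. (5) can be sketched in NumPy as below. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the kernel is written in the standard RBF form with a negative exponent, and `dual_coef` is assumed to already hold the differences αi − αi* obtained by solving the dual problem. The helper `to_sklearn_gamma` reflects the fact that scikit-learn parameterizes its RBF kernel as exp(−g‖xi − xj‖²), so g = 1/(2γ) under the convention used here.

```python
import numpy as np


def rbf_kernel(xi, xj, gamma):
    """RBF kernel: exp(-||xi - xj||^2 / (2 * gamma)), gamma being the bandwidth."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-(diff @ diff) / (2.0 * gamma)))


def svr_predict(x, support_vectors, dual_coef, b, gamma):
    """SVR output F(x) = sum_i (alpha_i - alpha_i^*) * k(x_i, x) + b.

    dual_coef[i] is assumed to hold alpha_i - alpha_i^* for support vector i.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coef)) + b


def to_sklearn_gamma(gamma):
    """Convert the bandwidth gamma to scikit-learn's RBF parameter g = 1 / (2 * gamma)."""
    return 1.0 / (2.0 * gamma)
```

Only the support vectors (samples with nonzero αi − αi*) contribute to the sum, which is what keeps SVR predictions sparse.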

Bald eagle search (BES) algorithm

BES is a nature-inspired optimization algorithm developed by Alsattar et al. [28]. It mimics the hunting behavior of bald eagles, which are known for their clever hunting strategies. As shown in Fig. 2, the BES algorithm is modeled mathematically in three stages: (i) selecting space, (ii) searching in space, and (iii) swooping. In the first stage, the eagle selects the space with the most prey. In the second stage, it moves within this space to search for prey. In the third stage, the eagle swoops from the best position identified in the second stage towards the prey [29]. The mathematical model of the BES algorithm is summarized as follows:

Fig. 2

Steps of bald eagle hunting in order

Selecting space

The selection of the appropriate space is expressed by Eq. (7) [28]:

$${P}_{new,i}={P}_{best}+\alpha \times r\left({P}_{mean}-{P}_i\right)$$
(7)

where Pbest is the best position found so far, which defines the search space selected by the eagles; α is a control parameter in the interval [1.5, 2]; r is a random number in the range [0, 1]; Pmean is the mean position of all eagles, which indicates that the eagles have used all the information from the previous points; and Pi is the current position of the i-th eagle.
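A minimal NumPy sketch of the selecting-space update in Eq. (7) follows. The function name is hypothetical, and the random number r is assumed to be drawn once per eagle and shared across dimensions; drawing it per dimension is another plausible reading of the equation.

```python
import numpy as np


def select_space(pop, best, alpha=2.0, rng=None):
    """Selecting-space stage: P_new,i = P_best + alpha * r * (P_mean - P_i).

    pop   : (n_eagles, dim) array of current positions P_i
    best  : (dim,) array, best position found so far (P_best)
    alpha : control parameter taken from [1.5, 2]
    """
    rng = np.random.default_rng() if rng is None else rng
    p_mean = pop.mean(axis=0)              # P_mean: mean position of all eagles
    r = rng.random((pop.shape[0], 1))      # one r in [0, 1) per eagle (assumption)
    return best + alpha * r * (p_mean - pop)
```

Each new position is centered on P_best and shifted along the direction from the eagle towards the population mean, so the whole flock explores around the most promising area found so far.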

Searching in space

In this stage, the eagle updates its position based on Eq. (8) [28]:

$${P}_{new,i}={P}_i+y(i)\times \left({P}_i-{P}_{i+1}\right)+x(i)\times \left({P}_i-{P}_{mean}\right)$$
(8)

where

$$x(i)= xr(i)/\max \left(\left| xr\right|\right)$$
(9)
$$y(i)= yr(i)/\max \left(\left| yr\right|\right)$$
(10)
$$xr(i)=r(i)\times \sin \left(\theta (i)\right)$$
(11)
$$yr(i)=r(i)\times \cos \left(\theta (i)\right)$$
(12)
$$\theta (i)=a\times \pi \times \mathit{\operatorname{rand}}$$
(13)
$$r(i)=\theta (i)\times R\times \mathit{\operatorname{rand}}$$
(14)

where a and R are coefficients that take values in the ranges [5, 10] and [0.5, 2], respectively.
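The searching-in-space update of Eqs. (8)–(14) can be vectorized over the whole population as sketched below. This is a hedged illustration with a hypothetical function name; it follows Eq. (14) exactly as written in this section, and because Eq. (8) leaves P_{i+1} undefined for the last eagle, the sketch assumes a wrap-around to the first eagle.

```python
import numpy as np


def search_space(pop, a=10.0, R=1.5, rng=None):
    """Searching-in-space stage, Eqs. (8)-(14), for a (n_eagles, dim) population."""
    rng = np.random.default_rng() if rng is None else rng
    n = pop.shape[0]
    theta = a * np.pi * rng.random(n)                  # Eq. (13): spiral angle
    r = theta * R * rng.random(n)                      # Eq. (14) as written above
    xr, yr = r * np.sin(theta), r * np.cos(theta)      # Eqs. (11)-(12)
    x = xr / np.max(np.abs(xr))                        # Eq. (9): scale to [-1, 1]
    y = yr / np.max(np.abs(yr))                        # Eq. (10)
    p_mean = pop.mean(axis=0)
    p_next = np.roll(pop, -1, axis=0)                  # P_{i+1}; last eagle wraps to first (assumption)
    # Eq. (8): spiral move relative to the neighbor and to the population mean
    return pop + y[:, None] * (pop - p_next) + x[:, None] * (pop - p_mean)
```

The normalization by the population maximum keeps every step within the span of the current population differences, so a controls the spiral shape rather than the step size.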

Swooping

The swooping strategy of eagles can be described by [28]:

$${P}_{new,i}=\mathit{\operatorname{rand}}\times {P}_{best}+{x}_1(i)\times \left({P}_i-{c}_1\times {P}_{mean}\right)+{y}_1(i)\times \left({P}_i-{c}_2\times {P}_{best}\right)$$
(15)

where c1, c2 ∈ [1, 2] are coefficients, and x1(i) and y1(i) are computed as:

$${x}_1(i)= xr(i)/\max \left(\left| xr\right|\right)$$
(16)
$${y}_1(i)= yr(i)/\max \left(\left| yr\right|\right)$$
(17)
$$xr(i)=r(i)\times \sinh \left(\theta (i)\right)$$
(18)
$$yr(i)=r(i)\times \cosh \left(\theta (i)\right)$$
(19)
$$\theta (i)=a\times \pi \times \mathit{\operatorname{rand}}$$
(20)
$$r(i)=\theta (i)$$
(21)
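The swooping stage of Eqs. (15)–(21) can be sketched in the same style. The function name and default coefficient values (a = 10, c1 = c2 = 2, chosen from the ranges stated in the text) are assumptions. The hyperbolic sinh/cosh terms replace the sin/cos terms of the search stage, and the normalization by the population maximum scales x1(i) and y1(i) to at most 1 in magnitude.

```python
import numpy as np


def swoop(pop, best, a=10.0, c1=2.0, c2=2.0, rng=None):
    """Swooping stage, Eqs. (15)-(21): hyperbolic-spiral dive towards P_best."""
    rng = np.random.default_rng() if rng is None else rng
    n = pop.shape[0]
    theta = a * np.pi * rng.random(n)                  # Eq. (20)
    r = theta                                          # Eq. (21)
    xr, yr = r * np.sinh(theta), r * np.cosh(theta)    # Eqs. (18)-(19)
    x1 = xr / np.max(np.abs(xr))                       # Eq. (16)
    y1 = yr / np.max(np.abs(yr))                       # Eq. (17)
    p_mean = pop.mean(axis=0)
    rand_best = rng.random((n, 1))                     # the "rand" factor on P_best in Eq. (15)
    return (rand_best * best
            + x1[:, None] * (pop - c1 * p_mean)        # pull relative to the mean position
            + y1[:, None] * (pop - c2 * best))         # pull relative to the best position
```

With a = 10, θ can reach about 31.4, so sinh and cosh grow to roughly 10^13; this is still far from floating-point overflow, and the normalization brings the factors back to order one.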

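Using these three stages, an initial randomly generated set of candidate solutions is improved over successive iterations until a stopping criterion, such as the maximum number of iterations, is reached; the best solution found is then returned as the approximate global optimum.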

Proposed forecasting model

This section presents the proposed hybrid SVR–BES model for wind power forecasting. As mentioned above, building an effective SVR model with good predictive ability requires properly choosing three hyper-parameters: the penalty parameter (C), the insensitivity coefficient (ε), and the bandwidth of the Gaussian RBF kernel (γ) [31]. The proposed model estimates wind power by learning from historical data. In the training process, historical measurements of wind speed (m/s), wind direction (degrees), and wind power (MW) from a wind farm are used to train the SVR–BES model, as shown in Fig. 3. The BES algorithm optimizes the hyper-parameters of SVR, and this process is repeated until the number of iterations (t) reaches its maximum value. The root mean squared error (RMSE), given in Eq. (22), is used as the fitness function (f) to evaluate each candidate set of hyper-parameters.

$$f=\sqrt{\left(\frac{1}{n}\sum \limits_{i=1}^n{\left({y}_a-{y}_p\right)}^2\right)}$$
(22)
Fig. 3

The framework of the SVR–BES wind power forecasting model

where ya and yp are the actual and forecasted outputs, respectively, and n is the number of training samples.
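A sketch of the fitness evaluation of Eq. (22) is given below, assuming scikit-learn's `SVR` class as the SVR solver (the section does not name a specific library). Each BES candidate is a vector (C, γ, ε), and γ is converted to scikit-learn's `gamma` via g = 1/(2γ), following the RBF form exp(−‖xi − xj‖²/(2γ)). As the text states, the RMSE is computed over the n training samples; evaluating it on a held-out window instead is a common alternative that reduces the risk of overfitting. The function name and synthetic data are hypothetical, and the usage line plugs in the hyper-parameter values reported in the forecasting results only to show the call format; those values were tuned for the Sotavento data, not for this toy data.

```python
import numpy as np
from sklearn.svm import SVR


def fitness(params, X_train, y_train):
    """Fitness f of Eq. (22) for one BES candidate params = (C, gamma, eps)."""
    C, gamma, eps = params
    # scikit-learn's RBF kernel is exp(-g * ||xi - xj||^2), so g = 1 / (2 * gamma)
    model = SVR(kernel="rbf", C=C, gamma=1.0 / (2.0 * gamma), epsilon=eps)
    model.fit(X_train, y_train)
    y_p = model.predict(X_train)                            # forecasts on the training samples
    return float(np.sqrt(np.mean((y_train - y_p) ** 2)))    # RMSE


# Usage on synthetic data (two inputs standing in for wind speed and direction).
rng = np.random.default_rng(0)
X = rng.random((60, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * X[:, 1]
print(fitness((53.255, 0.982845, 0.001), X, y))
```

Inside the BES loop, each eagle's position would be passed to this function, and a lower RMSE marks a better eagle.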

To describe the remaining operations and facilitate the implementation of SVR–BES, the pseudo-code of its complete algorithm is provided in Algorithm 1.

Algorithm 1

Hybrid SVR–BES algorithm

Results and discussion

Description of the data set

The performance of the proposed hybrid SVR–BES model was evaluated using the real power generated by the Sotavento Galicia wind farm in Spain (Fig. 4). This wind farm has a line of 24 wind turbines of 5 different technologies. Its nominal power is 17.56 MW, and its average annual generation is 38.5 GWh [32]. The proposed model was used to forecast Sotavento wind power 48 h ahead, using the previous 3 weeks (504 h) of data, consisting of wind speed, wind direction, and the measured output power of the corresponding wind generator, at a time interval of 1 h. The overall simulation period is as follows:

  • Training period: 17 November 2020–07 December 2020

  • Evaluation period: 08 December 2020–09 December 2020
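The window lengths above can be checked with a short Python sketch. Since the text does not state the exact hours, it assumes that each window starts at 00:00 and ends at 23:00 on its last day; under that assumption, the training window holds exactly the 504 hourly samples (3 weeks) and the evaluation window holds 48. The helper name is hypothetical.

```python
from datetime import datetime, timedelta


def hourly_window(start, end_exclusive):
    """Return hourly timestamps in the half-open interval [start, end_exclusive)."""
    n_hours = int((end_exclusive - start) / timedelta(hours=1))
    return [start + timedelta(hours=h) for h in range(n_hours)]


train = hourly_window(datetime(2020, 11, 17), datetime(2020, 12, 8))   # 17 Nov - 07 Dec
test = hourly_window(datetime(2020, 12, 8), datetime(2020, 12, 10))    # 08 - 09 Dec
print(len(train), len(test))  # 504 48
```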

Fig. 4

The location of the Sotavento wind farm shown on Google Maps

The hourly historical wind speed and wind direction data gathered from the Sotavento wind farm are shown in Fig. 5a. Figure 5b shows the hourly wind power series, and Fig. 5c shows a three-dimensional plot of wind power versus wind speed and wind direction.

Fig. 5

Sotavento wind farm hourly data from November 17, 2020, to December 07, 2020. a Wind speed and direction profile, b wind power generation, and c wind power with respect to wind speed and direction

Evaluation procedure and comparative methods

The performance of the developed wind power forecasting model is evaluated based on the common statistical error criteria including root-mean-square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and correlation coefficient (R). The formulas of these statistical metrics are expressed as follows:

$$RMSE=\sqrt{\left(\frac{1}{n}\sum \limits_{i=1}^n{\left({y}_a-{y}_p\right)}^2\right)}$$
(23)
$$MAE=\frac{1}{n}\sum \limits_{i=1}^n\left|{y}_a-{y}_p\right|$$
(24)
$$MAPE=\frac{1}{n}\sum \limits_{i=1}^n\left|\frac{y_a-{y}_p}{y_a}\right|\times 100\%$$
(25)
$$R=\frac{\sum_{i=1}^n\left({a}_i-\overset{\_}{a}\right)\left({P}_i-\overset{\_}{P}\right)}{\sqrt{\sum_{i=1}^n{\left({a}_i-\overset{\_}{a}\right)}^2{\sum}_{i=1}^n{\left({P}_i-\overset{\_}{P}\right)}^2}}$$
(26)

where ya and yp (denoted a and P in Eq. (26)) are the actual and forecasted outputs, respectively; n is the number of hours in the testing period; and \(\overline{a}\) and \(\overline{P}\) are the means of the actual and forecasted values, respectively.
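Equations (23)–(26) translate directly into NumPy, as in the sketch below (the function name is hypothetical). One practical caveat: MAPE divides by the actual value, so hours in which the wind farm produces zero power make Eq. (25) undefined. The sketch does not handle this case, and the section does not state how such hours were treated.

```python
import numpy as np


def forecast_metrics(actual, forecast):
    """RMSE, MAE, MAPE (%) and Pearson correlation R, following Eqs. (23)-(26)."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(forecast, dtype=float)
    err = a - p
    rmse = float(np.sqrt(np.mean(err ** 2)))          # Eq. (23)
    mae = float(np.mean(np.abs(err)))                 # Eq. (24)
    mape = float(np.mean(np.abs(err / a)) * 100.0)    # Eq. (25); undefined if any a == 0
    da, dp = a - a.mean(), p - p.mean()
    r = float(np.sum(da * dp) / np.sqrt(np.sum(da ** 2) * np.sum(dp ** 2)))  # Eq. (26)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R": r}


m = forecast_metrics([2.0, 4.0, 6.0, 8.0], [1.0, 4.0, 7.0, 8.0])
```

RMSE penalizes large errors more heavily than MAE, which is why the two are usually reported together.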

The predictive ability of the proposed hybrid SVR–BES model is compared with the following five machine learning techniques:

  • Decision tree (DT): a supervised learning method used in statistics, data mining, and machine learning. It can be used as a forecasting model to draw conclusions from a set of observations. Its inherent simplicity and interpretability make DT one of the most popular machine learning algorithms.

  • Random forest (RF) algorithm: it is one of the most popular supervised machine learning techniques used for classification and regression. This technique uses several decision trees on various subsets of the dataset and takes the average to improve the predictive accuracy.

  • Conventional SVR: this technique has been described in “Support vector regression (SVR)” section.

  • Hybrid SVR–GWO: in this hybrid model, the SVR is tuned using the well-known grey wolf optimizer (GWO). GWO is based on the leadership hierarchy and hunting mechanism of grey wolves in the wild. Four types of grey wolves, namely alphas, betas, deltas, and omegas, are used to simulate the leadership hierarchy, and the optimization process involves three main steps: searching for prey, encircling prey, and attacking prey [33].

  • Hybrid SVR–MRFO: this technique uses the manta ray foraging optimization (MRFO) algorithm to optimize the hyper-parameters of SVR. MRFO is a bio-inspired optimization technique based on three foraging behaviors of manta rays: chain foraging, cyclone foraging, and somersault foraging. According to [34], MRFO converges faster than other optimizers and obtains competitive solutions with less computational effort for most engineering problems.

Forecasting results

The proposed model was trained and tested using the data collected from the Sotavento Galicia wind farm. As described previously, the SVR parameters (C, γ, and ε) are optimized using the BES algorithm, with the BES parameters listed in Table 1. Figure 6 shows the convergence curve of the BES algorithm, which reaches the optimal fitness value after 50 iterations. The optimal values found for C, γ, and ε are 53.255, 0.982845, and 0.001, respectively. The trained SVR model was then tested over 2 days (48 h), from 08 to 09 December 2020. The results of the proposed model are presented in Table 2 and Fig. 7 and compared with those of five other methods: DT, RF, SVR, SVR–GWO, and SVR–MRFO. These results show that the proposed SVR–BES model achieves the best forecasting performance, with the lowest deviation from the actual values.

Table 1 Parameters of the BES algorithm
Fig. 6

Convergence curve of BES algorithm

Table 2 Forecasting results of 08 and 09 December 2020
Fig. 7

Forecasting results of different algorithms

The proposed model is next evaluated against the aforementioned methods based on the correlation coefficient R. The R values of the proposed SVR–BES model and the five other methods (DT, RF, SVR, SVR–GWO, and SVR–MRFO) are shown in Fig. 8 and Table 3. The proposed model achieves the best R value, 0.9457, in the testing phase. The second-best method is SVR–MRFO, with R = 0.9443, followed by SVR–GWO with 0.9396, while the conventional SVR and DT models give the worst testing results, with correlation coefficients of 0.9107 and 0.8817, respectively. These results show that the proposed SVR–BES forecaster outperforms all five comparative techniques in terms of the correlation coefficient.

Fig. 8

Scatter plots of forecasted wind power for 48 h

Table 3 Goodness-of-fit results of SVR–BES versus other forecasting models

To further illustrate the advantages of the proposed model, the RMSE, MAE, and MAPE metrics are also used as comparative indicators. The results of the proposed model and of the five aforementioned methods are shown in Table 3 and Fig. 9. For example, in terms of RMSE, the forecasts of the proposed SVR–BES model were about 25.58%, 6.33%, 15.58%, 4.79%, and 1.89% better than those of the DT, RF, SVR, SVR–GWO, and SVR–MRFO models, respectively. Similarly, the MAE improvements of the proposed approach over these methods are 26.60%, 7.25%, 15.86%, 6.78%, and 2.99%, respectively. The proposed approach also gives the lowest MAPE among all methods. Overall, compared with the DT, RF, SVR, SVR–GWO, and SVR–MRFO prediction methods, the proposed hybrid SVR–BES model provides the lowest RMSE, MAE, and MAPE and the highest R. The proposed model can therefore be considered a new and efficient tool for short-term wind power forecasting.
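The section does not state how these improvement percentages are computed. A common definition is the relative error reduction (E_baseline − E_proposed)/E_baseline × 100, which the sketch below assumes. The function names are hypothetical, and the error values in the usage line are placeholders, not the entries of Table 3.

```python
def improvement_pct(baseline_error, proposed_error):
    """Relative error reduction (%) of the proposed model over one baseline model."""
    return (baseline_error - proposed_error) / baseline_error * 100.0


def improvements(baseline_errors, proposed_error):
    """Apply improvement_pct to every baseline in a {model name: error} mapping."""
    return {name: improvement_pct(err, proposed_error)
            for name, err in baseline_errors.items()}


# Hypothetical RMSE values (MW), for illustration only.
print(improvements({"DT": 2.0, "SVR": 1.6}, proposed_error=1.5))
```

Because the denominator is the baseline error, the same absolute gain appears larger against a weaker baseline such as DT than against a strong one such as SVR–MRFO.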

Fig. 9

Comparison of the RMSE, MAE, and MAPE of different wind power forecasters

Conclusions

This paper proposed a new hybrid model for short-term wind power forecasting that combines support vector regression (SVR) with the bald eagle search (BES) optimization algorithm. In the proposed model, the nature-inspired BES algorithm optimizes the SVR hyper-parameters C, γ, and ε to improve forecasting accuracy. To show the predictive ability of the proposed model in comparison with other machine learning techniques, such as DT, RF, traditional SVR, hybrid SVR and grey wolf optimizer (SVR–GWO), and hybrid SVR and manta ray foraging optimizer (SVR–MRFO), these models were tested on real power data from the Sotavento Galicia wind farm in Spain. Four statistical indicators (R, RMSE, MAE, and MAPE) were used as criteria for evaluating model performance. The simulation results show that the proposed hybrid SVR–BES model has the lowest RMSE, MAE, and MAPE and the highest R, which demonstrates its efficiency in short-term wind power forecasting.

The hybrid SVR–BES model also has some weaknesses that leave room for future work. Although fine-tuning the SVR parameters enhances its performance, finding optimal hyper-parameters with metaheuristic algorithms is time-consuming; the BES algorithm could be improved to address this problem. BES could also be combined with other machine learning methods to develop powerful tools for a wide range of engineering applications. Furthermore, other advanced optimization algorithms could be combined with the SVR model to enhance its performance.