1 Introduction

The use of renewable energy, mainly wind power generation, has attracted considerable attention in a large number of countries following the adoption of the Kyoto Protocol. Despite its significant environmental benefits, the intermittency and chaotic fluctuations of wind speed and other weather variables make the output power of wind power generation systems stochastic and different from that of conventional energy sources. Because of this uncertainty, connecting large quantities of wind power to a power system network poses several challenges. However, these challenges are not insurmountable. To enhance the economic competitiveness and acceptability of wind power, and to reduce the spot-market penalties arising from overestimation or underestimation of generation, accurate forecasting of wind power as well as wind speed is necessary. A reliable forecasting system can help distribution system operators and power traders make better decisions in critical situations.

Recently, several techniques have been developed to forecast wind power and wind speed. Existing techniques can be classified as statistical, physical and time series modeling techniques, based on the forecasting models they use [1]. Currently, researchers often combine statistical models with physical methods to obtain an approach that is applicable to longer prediction horizons. In these techniques, the statistical model plays an auxiliary role to the data collected by physical methods.

Although two major classes of methods have been recognized for wind power forecasting (comprehensive reviews of these techniques are presented in [2] and [3]), combinations of statistical and physical techniques are, as mentioned earlier, more common than either class alone [4, 5]. In addition, numerous spatial correlation methods have been proposed for short-term wind power prediction with the goal of attaining higher prediction accuracy [6]. Over time, more advanced and intelligent methods have been proposed, for instance: Artificial Neural Networks (ANN) in [7,8,9], ANN with Gaussian process approximation and adaptive Bayesian learning in [10], a combination of wavelet transform with ANN in [11], fuzzy logic methods in [5, 12], the Kalman filter in [13], support vector machines in [14], and the adaptive neuro-fuzzy inference system (ANFIS) in [15].

Despite the available research in this area, new forecasting approaches and techniques for input–output data manipulation are still in demand in order to enhance prediction accuracy and decrease the uncertainty in wind power forecasting, while keeping the computation time practically acceptable. This objective motivates the new double-stage hybrid approach proposed in this paper, which uses both statistical data (records of the wind farm supervisory control and data acquisition (SCADA) system) and physical data (meteorological variables from a numerical weather prediction (NWP) model) to obtain an effective and more accurate short-term wind power forecasting model.

In this paper, a new and effective short-term wind power forecasting approach based on a double-stage hierarchical ANFIS is proposed. The proposed approach uses the back-propagation (BP) algorithm to optimize the membership function parameters of the fuzzy inference system and thereby achieve a lower error.

The proposed methodology has two hierarchical ANFIS stages. In the first stage, an ANFIS is implemented to predict the wind speed at the hub height of the wind turbine at the wind farm site. In this stage, the meteorological variables forecasted by the NWP model (wind speed, wind direction, air pressure, air temperature and humidity) are used as inputs, and the actual wind speed measurements recorded by the wind farm SCADA system are used as the output, to train the ANFIS. In the second stage, an ANFIS is developed to map the wind speed vs. wind power characteristic of the wind turbine under real operating conditions. Actual wind speed and power measurements recorded by the wind farm SCADA system are used as input and output, respectively, to train the ANFIS in this stage. The wind speed forecasted by the first-stage ANFIS is then applied to the trained second-stage ANFIS in order to estimate the next-day wind power output of the wind farm.

The prediction results are presented for the next 24 h with 1-h time steps. The developed DSA (double-stage ANFIS) prediction approach is compared with the DSN (double-stage neural network), DSHGN (double-stage hybrid GA-NN, i.e., NN combined with a genetic algorithm), and DSHPN (double-stage hybrid PSO-NN, i.e., NN combined with particle swarm optimization) approaches to demonstrate its effectiveness in terms of short-term wind power prediction accuracy and computation time.

The paper is organized as follows. Section 2 discusses the proposed forecasting model and describes how the SCADA system and the NWP model are employed. Section 3 presents the ANFIS framework used as the prediction system and briefly explains its working principles. Section 4 provides the criteria used to evaluate the prediction accuracy. The numerical findings and prediction results for the real case study are provided in Section 5. Conclusions are drawn in Section 6.

2 Discussion

2.1 Short-term wind power forecasting model

2.1.1 Proposed wind power forecasting strategy

In this paper, short-term (24-h ahead) wind power forecasting using a double-stage ANFIS model is presented. The main data sources are the historical measurement records of the wind farm SCADA system database and the meteorological variables of the NWP model. The prediction system uses the meteorological predictions of the NWP model obtained in the vicinity of the Goldwind microgrid system wind farm in Yizhuang, Beijing, China, at 5 km resolution, together with the actual measurement records of the wind farm SCADA system database. The proposed approach has two hierarchical ANFIS stages. In the first stage, the wind turbine is modeled by an ANFIS black box that relates the predicted NWP meteorological variables (i.e., wind speed, wind direction, air pressure, air temperature and humidity) to the actual wind speed measurements recorded by the wind farm SCADA system. In the second stage, an ANFIS model is developed to map the wind speed vs. wind power characteristic of the wind turbine under real operating conditions. The wind speed forecasted by the first-stage ANFIS model is then applied to the trained second-stage ANFIS model in order to forecast the next-day wind power output of the wind farm.
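The chaining of the two stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two ANFIS models are replaced by simple stand-ins (linear least squares for stage 1, an interpolated empirical power curve for stage 2), and the data are synthetic. All names (`fit_stage1`, `fit_stage2`, `scada_speed`, `scada_power`) and the toy power curve are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's measurements).
n = 500
nwp = rng.uniform(0.0, 1.0, size=(n, 5))                          # 5 scaled NWP variables
scada_speed = 3.0 + 12.0 * nwp[:, 0] + rng.normal(0.0, 0.3, n)    # wind speed [m/s]
scada_power = 2500.0 * np.clip((scada_speed - 3.0) / 9.0, 0.0, 1.0) ** 3  # toy curve [kW]


def fit_stage1(nwp_features, measured_speed):
    """Stand-in for the stage-1 ANFIS: linear least squares, NWP -> wind speed."""
    X = np.column_stack([nwp_features, np.ones(len(nwp_features))])
    coef, *_ = np.linalg.lstsq(X, measured_speed, rcond=None)
    return lambda f: np.column_stack([f, np.ones(len(f))]) @ coef


def fit_stage2(measured_speed, measured_power):
    """Stand-in for the stage-2 ANFIS: interpolated empirical power curve."""
    order = np.argsort(measured_speed)
    s, p = measured_speed[order], measured_power[order]
    return lambda v: np.interp(v, s, p)


train = slice(0, 400)
stage1 = fit_stage1(nwp[train], scada_speed[train])          # trained on NWP + SCADA speed
stage2 = fit_stage2(scada_speed[train], scada_power[train])  # trained on SCADA speed + power

next_day_nwp = nwp[400:424]        # 24 hourly NWP forecasts for the next day
speed_fc = stage1(next_day_nwp)    # stage 1: forecast wind speed at the turbine
power_fc = stage2(speed_fc)        # stage 2: map forecast speed to power
```

The point of the sketch is the data flow: stage 2 is trained only on measured speed and power, and sees forecasted speed only at prediction time.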

The prediction performance of the wind power forecaster in this approach depends strongly on the quality of the NWP forecasts. In fact, the main focus of this study is on using the NWP data, which play an important auxiliary role in improving the accuracy of the short-term prediction. The prediction scheme is depicted in Fig. 1.

Fig. 1 ANFIS-based double-stage wind power prediction model

In the modeling process, a 1-year record of SCADA historical measurements and NWP/WRF historical weather forecasts is used to train an ANFIS that estimates the transfer function between the input and output patterns. BP is then applied to optimize the parameters of the ANFIS membership functions. This process continues until the prediction error reaches an acceptable value.

Real-time SCADA system

The SCADA system, as the central nervous system and an inseparable component of the wind farm, plays a key role in forecasting systems. A real-time SCADA system lets the operator manage the wind farm by supervising all of the wind turbines online and take relevant actions in critical situations based on 10-min or 1-h records of the turbines. In addition, it provides a comprehensive record of wind speed and power output as well as turbine operational availability, which serves as the foundation for short-term wind power prediction.

Numerical weather prediction (NWP) model

Wind data have a significant impact on wind power forecasting. There are several ways to obtain wind data: measurements or observations, data mining, and numerical weather simulations. The most direct and reliable method is on-site observation or measurement, but such data are not always available. Data mining is flexible; however, its potential to downscale meteorological data is limited. NWP models are based on physical conservation equations (such as conservation of energy), which permits a more realistic downscaling of the data. High-resolution (low-radius prediction sphere) NWP of wind therefore plays a vital role in power prediction.

Recently, given the availability of advanced computational systems, several wind power prediction studies have been conducted using weather data from NWP models. These studies use several NWP models, such as WRF, COSMO, MM5, and RAMS [16,17,18,19]. In addition, several extrapolation techniques, such as the wind shear power law and the logarithmic law, have been proposed to provide appropriate weather information at the hub height of a specific wind turbine using meteorological data collected at 10 m above the ground surface [20].
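The two extrapolation laws mentioned above can be sketched in Python as follows. The shear exponent `alpha = 1/7` and the roughness length `z0 = 0.03` m are common textbook defaults and the 80 m hub height is hypothetical; none of these values are given in this paper.

```python
import math


def power_law_speed(v_ref, h_ref, h_hub, alpha=1.0 / 7.0):
    """Wind shear power law: v(h) = v_ref * (h / h_ref) ** alpha."""
    return v_ref * (h_hub / h_ref) ** alpha


def log_law_speed(v_ref, h_ref, h_hub, z0=0.03):
    """Logarithmic law: v(h) = v_ref * ln(h / z0) / ln(h_ref / z0)."""
    return v_ref * math.log(h_hub / z0) / math.log(h_ref / z0)


# Extrapolate a 10 m wind speed of 5 m/s to a hypothetical 80 m hub height.
v_hub_power = power_law_speed(5.0, 10.0, 80.0)
v_hub_log = log_law_speed(5.0, 10.0, 80.0)
```

Both laws return the reference speed unchanged when the target height equals the reference height, and a higher speed above it; in practice `alpha` and `z0` depend on terrain and atmospheric stability.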

3 Method

3.1 Proposed structure for ANFIS

3.1.1 Adaptive neuro-fuzzy inference system (ANFIS)

A fuzzy logic system is capable of mapping nonlinear relationships between an input vector and a scalar output; moreover it can handle both numerical values and human-like linguistic knowledge or variables.

A fuzzy logic system consists of four main components: fuzzifier, rules, inference engine and defuzzifier. The fuzzifier converts a non-fuzzy (crisp) input variable into a fuzzy representation, where membership functions assign the degree to which the variable belongs to a specified attribute. Fuzzy rules are of the simple “if-then” type and can be obtained from numerical data relationships or from expert linguistic experience. Sugeno and Mamdani inference engines are the two main kinds of inference mechanisms used in fuzzy logic systems [15].

The Mamdani engine merges the fuzzy rules into a mapping from fuzzy input sets to fuzzy output sets and then applies defuzzification to the output fuzzy set to obtain crisp outputs, while the Takagi-Sugeno engine directly relates fuzzy inputs to crisp outputs using singleton-spike output membership functions. Defuzzification is the final stage of the fuzzy inference process; the defuzzifier converts the output fuzzy set into a crisp number using methods such as centroid of area, bisector of area, mean of maxima, or maximum criteria.
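As a small illustration of two of the defuzzification methods listed above, the following Python sketch computes the centroid of area and the mean of maxima for a sampled output fuzzy set. The discrete approximation on a uniform grid and the triangular example set are assumptions for illustration.

```python
import numpy as np


def centroid_defuzzify(x, mu):
    """Centroid of area on a uniform grid: sum(x * mu) / sum(mu)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    total = mu.sum()
    if total == 0.0:
        raise ValueError("output fuzzy set is empty (all memberships are zero)")
    return float((x * mu).sum() / total)


def mean_of_maxima(x, mu):
    """Mean of the x values at which the membership reaches its maximum."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(x[mu == mu.max()].mean())


# Hypothetical symmetric triangular output set on [0, 10] with its peak at 5.
x = np.linspace(0.0, 10.0, 101)
mu = 1.0 - np.abs(x - 5.0) / 5.0
crisp_centroid = centroid_defuzzify(x, mu)
crisp_mom = mean_of_maxima(x, mu)
```

For a symmetric set both methods give the same crisp value; they differ once the output set is skewed or has a flat top.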

ANNs have the advantage over fuzzy inference systems that knowledge is gained automatically during the training process by updating the connection weights between neurons [21]. However, this knowledge cannot be extracted from the trained network, which acts as a black box. Fuzzy inference systems, on the other hand, can be understood through their rules, but these rules are difficult to define when the system has many variables with complex relationships [22].

A hybrid of NNs and fuzzy inference systems combines the advantages of both and can perform better than either one alone. In a neuro-fuzzy system, NNs automatically extract fuzzy rules from the numerical data and, through the training process, the parameters of the membership functions are adaptively tuned.

ANFIS is a type of adaptive multi-layered feedforward network [23], applied to nonlinear prediction where past data samples are used to predict future data samples. ANFIS combines the self-learning ability of neural networks with the linguistic expression capability of fuzzy inference systems [24].

The ANFIS architecture is shown in Fig. 2. The ANFIS network considered is a Takagi-Sugeno fuzzy inference system mapped onto a neural network structure with five layers. Every layer contains a number of nodes characterized by a node function, described below. Let \( {O}_{j, i} \) denote the output of the i-th node in layer j.

Fig. 2 ANFIS architecture

In layer 1, each node i is an adaptive node with the following node function:

$$ {O}_{1, i}={\mu}_{A_i}(x),\quad i = 1,\ 2 $$
(1)

or

$$ {O}_{1, i}={\mu}_{B_{i-2}}(y),\quad i = 3,\ 4 $$
(2)

Here, either x or y is the input to the i-th node and \( A_i \) (or \( B_{i-2} \)) is a linguistic label associated with this node.

Hence, \( {O}_{1, i} \) is the membership value of a fuzzy set A (\( A_1 \), \( A_2 \), \( B_1 \), or \( B_2 \)) and it indicates the degree to which the specified input x (or y) satisfies the quantifier A. The membership functions for A and B are frequently described by generalized bell functions as follows:

$$ {\mu}_{A_i}(x)=\frac{1}{1+{\left|\frac{x-{r}_i}{p_i}\right|}^{2{q}_i}} $$
(3)

where \( p_i \), \( q_i \), and \( r_i \) are the parameters of the membership function. When the values of these parameters vary, the bell-shaped membership function changes accordingly, thus producing various forms of membership functions for the linguistic label \( A_i \).

Actually, any continuous and piecewise differentiable functions, like triangular-shaped functions, are also eligible candidates for the node function in this layer [25]. Parameters of this layer are called premise parameters.
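The generalized bell function of Eq. (3) can be written directly in Python. This is a minimal sketch; the function name `gbell` and the parameter values in the example are illustrative assumptions.

```python
import numpy as np


def gbell(x, p, q, r):
    """Generalized bell membership function, Eq. (3).

    r is the center, p controls the width and q the steepness of the shoulders.
    """
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.abs((x - r) / p) ** (2.0 * q))


# Hypothetical premise parameters: center at 8 m/s, half-width 3 m/s.
center_value = gbell(8.0, p=3.0, q=2.0, r=8.0)
crossover_value = gbell(11.0, p=3.0, q=2.0, r=8.0)
```

The membership is 1 at the center `r` and exactly 0.5 at `r ± p` for any `q`, which makes `p` a convenient half-width parameter.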

In layer 2, each node Π is a fixed node whose output, representing the firing strength of a rule, is the product of the incoming signals:

$$ {O}_{2, i}={w}_i={\displaystyle {\prod}_j{\mu}_j}={\mu}_{A_i}(x)\cdot {\mu}_{B_i}(y),\quad i = 1,\ 2 $$
(4)

In layer 3, every node N calculates the ratio of the i th rule’s firing strength to the total sum of all rules’ firing strengths (normalization):

$$ {O}_{3, i}={\overline{w}}_i=\frac{w_i}{{\displaystyle {\sum}_j{w}_j}}=\frac{w_i}{w_1+{w}_2}, i = 1,\ 2 $$
(5)

The results of this layer are referred to as normalized firing strengths.

In layer 4, every node is adaptive and determines the contribution of the i-th rule to the overall output:

$$ {O}_{4, i}={\overline{w}}_i{f}_i={\overline{w}}_i\left({a}_i x+{b}_i y+{c}_i\right) $$
(6)

where \( {\overline{w}}_i \) is the output of layer 3, and (\( a_i \), \( b_i \), \( c_i \)) is the parameter set. The parameters in this layer are called consequent parameters.

In layer 5, the single node calculates the final output by summing up all the incoming input signals to this layer:

$$ {O}_{5, 1}={\displaystyle \sum_i{\overline{w}}_i{f}_i}=\frac{{\displaystyle \sum_i{w}_i{f}_i}}{{\displaystyle \sum_i{w}_i}} $$
(7)

Hence, this adaptive network is functionally equivalent to a Sugeno-type fuzzy inference system.
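The five layers can be traced in a short Python sketch of the forward pass for the two-input, two-rule network described above, with layer 4 read as a first-order Sugeno consequent in both inputs x and y. The premise and consequent parameter values are hypothetical and chosen only so that the result is easy to check by hand.

```python
import numpy as np


def gbell(x, p, q, r):
    """Generalized bell membership function, Eq. (3)."""
    return 1.0 / (1.0 + np.abs((x - r) / p) ** (2.0 * q))


def anfis_forward(x, y, premise_a, premise_b, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS."""
    # Layer 1: membership degrees of x in A_i and of y in B_i.
    mu_a = np.array([gbell(x, *prm) for prm in premise_a])
    mu_b = np.array([gbell(y, *prm) for prm in premise_b])
    # Layer 2: rule firing strengths (product of incoming memberships).
    w = mu_a * mu_b
    # Layer 3: normalized firing strengths.
    w_bar = w / w.sum()
    # Layer 4: weighted first-order consequents f_i = a_i*x + b_i*y + c_i.
    a, b, c = consequent.T
    f = a * x + b * y + c
    # Layer 5: overall output as the sum of all incoming signals.
    return float(np.sum(w_bar * f)), w_bar, f


# Hypothetical parameters (p, q, r) per label and (a, b, c) per rule.
premise_a = [(2.0, 2.0, 0.0), (2.0, 2.0, 5.0)]
premise_b = [(2.0, 2.0, 0.0), (2.0, 2.0, 5.0)]
consequent = np.array([[1.0, 1.0, 0.0],
                       [2.0, -1.0, 1.0]])

output, w_bar, f = anfis_forward(1.0, 4.0, premise_a, premise_b, consequent)
```

With these values both rules fire equally, so the normalized strengths are 0.5 each and the output is the average of f1 = 5 and f2 = -1, i.e. 2.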

Optimization of ANFIS membership function parameters

In this paper, the two-stage hierarchical ANFIS networks use the BP algorithm to tune the parameters of the membership functions. The fuzzy membership functions considered in this paper are of the triangular type.
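A triangular membership function of the kind referred to here can be sketched as below. The parameterization by the three corner points (a, b, c) is the usual convention and is an assumption, as are the example values; the paper does not list its membership function parameters.

```python
import numpy as np


def trimf(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b (a < b < c)."""
    x = np.asarray(x, dtype=float)
    rising = (x - a) / (b - a)
    falling = (c - x) / (c - b)
    return np.maximum(np.minimum(rising, falling), 0.0)


# Hypothetical "medium wind speed" label on a 0 to 25 m/s range.
speeds = np.array([4.0, 8.0, 12.0, 16.0, 20.0])
medium = trimf(speeds, a=4.0, b=12.0, c=20.0)
```

The corner points (a, b, c) are the premise parameters that a training algorithm would adjust.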

As mentioned above, an ANFIS network is fundamentally a fuzzy inference system mapped onto a neural network structure whose membership function parameters are tuned with a BP algorithm based on a collection of input–output data. This allows the ANFIS network to learn. BP performs gradient descent in the parameter space, moving along the steepest descent direction of the error surface towards a minimum, ideally the global one.

BP learning algorithms are fast and thus suitable for wind power prediction in real-time applications such as energy management, dynamic dispatching and scheduling in large interconnected power systems or microgrids. The parameters of the ANFIS membership functions are the variables optimized by BP, and the mean squared error is used as the cost function. The objective of the proposed approach is to minimize this cost function. Figure 3 shows the general scheme of the forecasting system.
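The idea of tuning membership function parameters by gradient descent on the mean squared error can be sketched as follows. This is a simplified illustration, not the authors' training code: it uses a one-input, two-rule model with bell membership functions (Eq. 3) and fixed consequents, approximates the gradient by central finite differences instead of analytic back-propagation, and halves the step size whenever a step would not reduce the error. The data, initial parameters and function names are all assumptions.

```python
import numpy as np


def gbell(x, p, q, r):
    """Generalized bell membership function, Eq. (3)."""
    return 1.0 / (1.0 + np.abs((x - r) / p) ** (2.0 * q))


def predict(theta, x):
    """Two-rule, one-input Sugeno model; theta = [p1, q1, r1, p2, q2, r2]."""
    w1 = gbell(x, *theta[0:3])
    w2 = gbell(x, *theta[3:6])
    f1, f2 = 0.0, 1.0  # fixed (hypothetical) constant consequents
    return (w1 * f1 + w2 * f2) / (w1 + w2)


def mse(theta, x, target):
    """Mean squared error used as the cost function."""
    return float(np.mean((predict(theta, x) - target) ** 2))


def numerical_gradient(theta, x, target, eps=1e-6):
    """Central finite differences, standing in for analytic BP gradients."""
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        delta = np.zeros_like(theta)
        delta[k] = eps
        grad[k] = (mse(theta + delta, x, target) - mse(theta - delta, x, target)) / (2 * eps)
    return grad


def train(theta, x, target, lr=0.5, epochs=200):
    """Gradient descent with step halving so that the error never increases."""
    theta = theta.copy()
    loss = mse(theta, x, target)
    for _ in range(epochs):
        grad = numerical_gradient(theta, x, target)
        step = lr
        while step > 1e-8:
            candidate = theta - step * grad
            candidate_loss = mse(candidate, x, target)
            if candidate_loss < loss:  # accept only improving steps
                theta, loss = candidate, candidate_loss
                break
            step *= 0.5
    return theta, loss


# Synthetic target: a smooth transition from 0 to 1 (illustrative data only).
x = np.linspace(0.0, 1.0, 50)
target = 1.0 / (1.0 + np.exp(-10.0 * (x - 0.5)))
theta0 = np.array([0.4, 1.5, 0.1, 0.4, 1.5, 0.8])

initial_loss = mse(theta0, x, target)
theta_opt, final_loss = train(theta0, x, target)
```

Because only improving steps are accepted, the error decreases monotonically, but like any gradient method the procedure may stop in a local minimum.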

Fig. 3 Double-stage ANFIS wind power prediction algorithm

4 Wind power prediction accuracy evaluation

In order to evaluate the accuracy of the DSA wind power prediction approach, the mean absolute percentage error (MAPE), the sum squared error (SSE), the root mean squared error (RMSE), and the standard deviation of error (SDE) criteria are used. These performance criteria are computed as functions of the actual wind power that occurred, and are defined as follows.

The MAPE criterion is defined as:

$$ MAPE=\frac{100}{N}{\displaystyle \sum_{h=1}^N\left|\frac{P_h^a-{P}_h^f}{{\overline{P}}_h^a}\right|} $$
(8)
$$ {\overline{P}}_h^a=\frac{1}{N}{\displaystyle \sum_{h=1}^N{P}_h^a} $$
(9)

where \( {P}_h^a \) and \( {P}_h^f \) are respectively the actual and forecasted wind power at hour h, \( {\overline{P}}_h^a \) is the average actual wind power of the prediction horizon and N is the prediction horizon.

The SSE criterion is defined as:

$$ S S E={\displaystyle \sum_{h=1}^N{\left({P}_h^a-{P}_h^f\right)}^2} $$
(10)

The RMSE criterion is defined by:

$$ RMSE=\sqrt{\frac{1}{N}{\displaystyle \sum_{h=1}^N{\left({P}_h^a-{P}_h^f\right)}^2}} $$
(11)

The SDE criterion is defined by:

$$ S D E=\sqrt{\frac{1}{N}{\displaystyle \sum_{h=1}^N{\left({e}_h-\overline{e}\right)}^2}} $$
(12)
$$ {e}_h={P}_h^a-{P}_h^f $$
(13)
$$ \overline{e}=\frac{1}{N}{\displaystyle \sum_{h=1}^N{e}_h} $$
(14)

where \( {e}_h \) is the prediction error at hour h and \( \overline{e} \) is the average error over the prediction horizon.

The variability after fitting a prediction model is a measure of the uncertainty of a model, which can be measured through the evaluation of the variance of the prediction error. The prediction is more precise if this variance is smaller [15, 26]. From definition (12), daily error variance can be evaluated as:

$$ {\sigma}_{e, day}^2=\frac{1}{N}{\displaystyle \sum_{h=1}^N{\left(\left|\frac{P_h^a-{P}_h^f}{{\overline{P}}_h^a}\right|-\left({e}_{day}\right)\right)}^2} $$
(15)
$$ {e}_{day}=\frac{1}{N}{\displaystyle \sum_{h=1}^N\left|\frac{P_h^a-{P}_h^f}{{\overline{P}}_h^a}\right|} $$
(16)
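The criteria of Eqs. (8) to (16) translate directly into a few lines of Python. The sketch below follows the definitions above, including the normalization of the absolute error by the average actual power over the horizon; the function name and the three-hour example values are hypothetical.

```python
import numpy as np


def forecast_metrics(actual, forecast):
    """MAPE, SSE, RMSE, SDE and daily error variance, Eqs. (8)-(16)."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    e = a - f                       # Eq. (13): prediction error per hour
    p_bar = a.mean()                # Eq. (9): average actual power
    rel = np.abs(e) / p_bar         # normalized absolute error per hour
    return {
        "MAPE": float(100.0 * rel.mean()),                        # Eq. (8)
        "SSE": float(np.sum(e ** 2)),                             # Eq. (10)
        "RMSE": float(np.sqrt(np.mean(e ** 2))),                  # Eq. (11)
        "SDE": float(np.sqrt(np.mean((e - e.mean()) ** 2))),      # Eq. (12)
        "error_variance": float(np.mean((rel - rel.mean()) ** 2)),  # Eq. (15)
    }


# Hypothetical three-hour example (kW), not taken from the case study.
metrics = forecast_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

In this example the errors are -10, 10 and 0 kW, so the mean error is zero and the SDE coincides with the RMSE.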

5 Result

5.1 Case study and numerical Results

The DSA approach has been applied for short-term wind power forecasting in a microgrid wind farm in Beijing, China. This wind farm has a single wind turbine unit with a generation capacity of 2500 kW. NWP meteorological forecasts and historical wind speed and power data are the main data inputs for training. The influence of input parameter dependency on the accuracy of the prediction has been analyzed by dividing the input data set into different subsets.

The prediction horizon is 1 day with a time interval of 1 h. Time series of NWP weather forecasts, actual SCADA measurements of wind speed and actual SCADA measurements of wind power for the wind farm were recorded from 1 May 2014 to 30 April 2015. The forecasting results are given for 4 days corresponding to the four seasons of the year (July 21, 2015, October 15, 2015, January 4, 2016 and April 13, 2016). Days with particularly favorable wind power behavior were deliberately not selected. This results in an uneven distribution of accuracy throughout the year, which reflects reality.

Numerical results with the DSA approach are shown in Figs. 4, 5, 6 and 7 for the winter, spring, summer and fall days, respectively. Each figure shows the actual wind power recorded by the SCADA system together with the wind power forecasted by the proposed approach.

Fig. 4 Actual wind power vs. forecasted wind power for a winter day

Fig. 5 Actual wind power vs. forecasted wind power for a spring day

Fig. 6 Actual wind power vs. forecasted wind power for a summer day

Fig. 7 Actual wind power vs. forecasted wind power for a fall day

Table 1 gives the values of the criteria used to evaluate the accuracy of the DSA approach in predicting wind power. The first column shows the day, the second column gives the MAPE, the third column gives the square root of the SSE, the fourth column gives the RMSE, and the fifth column gives the SDE.

Table 1 Daily forecasting error statistical analysis

Table 2 presents a comparison between the DSA prediction approach and three other approaches (DSN, DSHGN, and DSHPN), with respect to the MAPE criterion.

Table 2 MAPE results comparison of different methods

The proposed forecasting approach gives better forecasting accuracy: the average MAPE is 8.1133%. The average MAPE improvement of the proposed approach with respect to the other three approaches (DSN, DSHGN and DSHPN) is 37.93, 34.26 and 20.73%, respectively.

In addition to implementing an effective wind power forecasting approach, analyzing the impact of input-data dependency (input-parameter selection) on the accuracy of a prediction model is highly important for developing a stable wind power forecasting model.

The influence of input-data dependency on the forecasting accuracy of the proposed approach is also analyzed by dividing the prediction input data set into five subsets: subset #1 consists of wind speed; subset #2 contains wind speed and wind direction; subset #3 contains wind speed, wind direction and air temperature; subset #4 contains wind speed, wind direction, air temperature and air pressure; and subset #5 contains wind speed, wind direction, air temperature, air pressure and humidity. Table 3 presents a comparison between input-data subset #5 and the four other subsets (input-data subsets #1 to #4) with respect to the MAPE criterion. The proposed forecasting approach with input-data subset #5 gives better forecasting accuracy: the average MAPE is 8.1133%. The average MAPE improvement of the proposed approach using input-data subset #5 with respect to subsets #1 to #4 is 3.20, 1.83, 1.37 and 0.92%, respectively.

Table 3 Comparison of MAPE results for different input-data subsets
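The five input-data subsets are nested, each adding one NWP variable to the previous one, which can be expressed compactly in Python. The feature names below are hypothetical column labels, not identifiers from the authors' data set.

```python
# Hypothetical column names for the five NWP variables, in the order used above.
FEATURES = ["wind_speed", "wind_direction", "air_temperature", "air_pressure", "humidity"]

# Subset #k contains the first k variables, so subset #5 is the full input set.
subsets = {k: FEATURES[:k] for k in range(1, len(FEATURES) + 1)}

for number, names in subsets.items():
    print(f"subset #{number}: {', '.join(names)}")
```

Training the same forecaster once per subset and comparing the resulting MAPE values gives the kind of comparison summarized in Table 3.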

The absolute values of the prediction errors normalized by the maximum capacity of the wind farm, for all the approaches, are shown in Figs. 8, 9, 10 and 11 for the winter, spring, summer and fall days, respectively.

Fig. 8 Normalized absolute values of forecast errors for a winter day

Fig. 9 Normalized absolute values of forecast errors for a spring day

Fig. 10 Normalized absolute values of forecast errors for a summer day

Fig. 11 Normalized absolute values of forecast errors for a fall day

The DSA approach provides smaller errors compared with the other approaches.
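The normalization used for Figs. 8 to 11 can be sketched as follows, taking the 2500 kW generation capacity of the case-study turbine as the normalizing value. The function name and the example values are assumptions for illustration.

```python
import numpy as np

RATED_CAPACITY_KW = 2500.0  # generation capacity of the case-study wind turbine


def normalized_abs_error(actual_kw, forecast_kw, capacity_kw=RATED_CAPACITY_KW):
    """Absolute forecast error divided by the maximum wind farm capacity."""
    actual = np.asarray(actual_kw, dtype=float)
    forecast = np.asarray(forecast_kw, dtype=float)
    return np.abs(actual - forecast) / capacity_kw


# Hypothetical hourly values in kW.
errors = normalized_abs_error([1000.0, 1500.0], [1250.0, 1500.0])
```

Normalizing by capacity rather than by the actual power keeps the error bounded and comparable across hours with very low generation.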

Besides the MAPE criterion, consistency of results is another vital factor to compare prediction approaches. Table 4 presents a comparison between the DSA approach and three other approaches (DSN, DSHGN, DSHPN), with respect to the daily prediction error variance.

Table 4 Daily prediction error variance

As shown in Table 4, the average forecasting error variance is smaller for the DSA approach, reflecting less uncertainty in the forecasts. The average error variance improvement of the proposed approach with respect to the other three approaches (DSN, DSHGN and DSHPN) is 123.88, 100 and 58.21%, respectively.

The DSA approach results in improved prediction accuracy, outperforming the other approaches. The proposed prediction approach with input data subset #5 has presented the best performance and enhanced accuracy over the other data subsets.

Furthermore, the average computation time is around 8 s, using MATLAB on a PC with an Intel Core i5-5200 CPU (2.20 GHz) and 4 GB RAM. Therefore, the proposed double-stage hierarchical forecasting strategy is both novel and effective for short-term wind power forecasting.

6 Conclusion

In this paper, a new hierarchical hybrid approach is proposed for short-term wind power forecasting using ANFIS. The approach has two hierarchical stages. The first ANFIS network models the relationship between the NWP meteorological parameters in the vicinity of the wind farm, at 5 km resolution, and the actual wind speed measurements at the wind farm. The second ANFIS network models the relationship between the actual wind speed and the output power recorded by the SCADA system. The wind speed predicted by the first stage is then applied to the second stage to forecast the wind power for the next day. The implementation of the proposed approach for wind power forecasting is both novel and effective. The average MAPE is 8.1133%, outperforming the other three prediction methods, while the average computation time is around 8 s. Therefore, the presented numerical results validate the effectiveness of the proposed approach for short-term wind power prediction.