Wind power has undergone rapid development in recent years. The large-scale integration of wind power challenges power grid operation and management [1]. Compared with conventional generation, one of the largest problems of wind power is its dependence on the volatility of the wind [2]. Unexpected variations of wind generation may increase operating costs, raise reserve requirements, and pose potential risks to system reliability. To schedule the spinning reserve capacity and manage grid operation, the persistence approach has commonly been used to predict changes of wind power production in the ultra-short term [3, 4].

Wind power is affected by wind speed, temperature, humidity, latitude, terrain topography, air pressure, and other factors [5]. Modeling the behavior of wind is challenging due to its stochastic nature. Existing forecasting methods can be classified mainly into physical approaches, statistical approaches, and combinations of both [6]. Physical models use physical considerations, such as meteorological information (numerical weather predictions, NWP), topological information (orography, roughness, obstacles), and technical characteristics of the wind turbines (hub height, power curve, thrust coefficient) [7]. Statistical models find the relationships between a set of explanatory variables, including NWP results, and online measured generation data [7]. Basic statistical approaches include time-series analysis and neural networks, such as Box-Jenkins ARMA(p, q) models, where p is the number of most recent wind speeds and q the number of most recent forecasting errors [8]. Neural network (NN) models have been widely applied in a variety of business fields including accounting, management information systems, marketing, and production management. Many researchers focus on improvements of NN, including recurrent NN, deep NN, and so on [9–14]. The Extreme Learning Machine (ELM) is based on a single-hidden-layer feed-forward neural network and only needs to calculate random weights between the input layer and the hidden layer. In numerical experiments, the performance of ELM is better than that of traditional NN. Furthermore, combined models have been widely used to improve wind power forecasting accuracy. A wind power forecasting method based on empirical mode decomposition (EMD) and support vector machine (SVM) was proposed to cope with the nonlinearity and non-stationarity of wind speed data; the combined approach can improve forecasting accuracy by 5–10 % compared to single statistical models [15].

Several wind power forecasting tools have been developed across the world. The Wind Power Prediction Tool (WPPT) predicts wind power on various time scales, from half an hour to 36 h ahead. This tool is based on adaptive recursive least squares estimation with an exponential forgetting factor [16]. The WPPT can forecast wind power production in relatively large geographical regions. For each individual wind farm, it uses statistical models to describe the relationship between observed power production and the weather predictions. Another tool, Prediktor, developed at Risø, mainly uses physical relations to transform the predicted wind into predicted power [17]. The Zephyr tool combines WPPT and Prediktor; its main goal is to merge the two to obtain synergy between the physical and the statistical approach [17]. The Sipreolico tool, developed by the University Carlos III of Madrid, consists of nine adaptive nonparametric statistical models that are recursively estimated with either the recursive least squares algorithm or a Kalman filter. The tool is based on Spanish HIRLAM forecasts, taking into account hourly SCADA data from 80 % of all Spanish wind turbines [18]. The EWIND model developed by TrueWind, Inc. applies a once-and-for-all parameterization of local effects by using the output of the ForeWind NWP model, and it uses either a multiple screening linear regression model or a Bayesian neural network to remove systematic errors [19]. The Advanced Wind Power Prediction Tool (AWPT), developed by ISET (Institut für Solare Energieversorgungstechnik), uses weather forecasts from the Lokalmodell (LM) of the Deutscher Wetterdienst (DWD) and predicts wind power with artificial neural networks [20].
École des Mines de Paris (ARMINES) and the Rutherford Appleton Laboratory (RAL) have developed models for short-term prediction based on fuzzy-neural networks [20].

An individual forecasting method cannot achieve high accuracy due to the intrinsic characteristics of wind speed and wind power. In this paper, a combined statistical approach for wind power forecasting is presented using an extreme learning machine and error correction. The ultra-short-term power forecast is obtained by applying the persistence method to the errors of the short-term forecasting results, and performance is evaluated in terms of the normalized root mean square error (NRMSE).


Extreme Learning Machine (ELM)

The ELM is applied here to predict wind power; it tends to provide good generalization performance at extremely fast learning speed in both theory and practical applications. ELM has the following advantages:

  1) The parameters of ELM can be set easily, and ELM can achieve good performance with an appropriate number of hidden nodes.

  2) The computation of ELM is efficient: it needs neither the many iterations of neural networks (NN) nor the complex quadratic optimization of support vector machines (SVM).

  3) ELM has good generalization performance. Experimental results show that ELM achieves good generalization in most cases and learns much faster than conventional feed-forward neural networks [21].

The ELM has now been widely used in several fields such as face recognition, image classification, and short-term wind prediction. Wind power forecasting can be regarded as an ELM problem, because factors such as wind speed, air condition, temperature and humidity, and wind turbine arrangement influence wind production, while how exactly they affect production is not clearly known [22]. An ELM model can be established from example data and used to predict the short-term power curve.

The ELM model is based on a single-hidden-layer feed-forward neural network (SLFN). The advantage of the ELM algorithm is that it assigns the weights and thresholds between the input layer and the hidden layer at random and does not need to adjust these random parameters during the learning process, so the training process completes extremely fast. Based on the above advantages, ELM is chosen as the predictor for day-ahead wind power on the short-term time scale. The structure of a standard ELM network is shown in Fig. 1.

Fig. 1

Architecture of Extreme Learning Machine

The main parameters of ELM are described as follows:

$$ \boldsymbol{\omega} = \begin{bmatrix} \omega_{11} & \omega_{12} & \cdots & \omega_{1n} \\ \omega_{21} & \omega_{22} & \cdots & \omega_{2n} \\ \vdots & \vdots & & \vdots \\ \omega_{l1} & \omega_{l2} & \cdots & \omega_{ln} \end{bmatrix}_{l \times n} \tag{1} $$

where ω is the network weight matrix between the input layer and the hidden layer, and ω<sub>ij</sub> is the weight between the j-th node of the input layer and the i-th node of the hidden layer; l is the number of hidden nodes and n is the number of input nodes.

$$ \boldsymbol{\beta} = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2m} \\ \vdots & \vdots & & \vdots \\ \beta_{l1} & \beta_{l2} & \cdots & \beta_{lm} \end{bmatrix}_{l \times m} \tag{2} $$

where β is the network weight matrix between the hidden layer and the output layer, and β<sub>ij</sub> is the weight between the i-th node of the hidden layer and the j-th node of the output layer; m is the number of output nodes.

$$ \boldsymbol{b} = \begin{bmatrix} b_1 & b_2 & \cdots & b_l \end{bmatrix}^{T}_{l \times 1} \tag{3} $$

where ‘b’ is the threshold of the hidden layer.

X is the input matrix; the historical data X are used to train the ELM network, where p is the number of training samples.

$$ \boldsymbol{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}_{n \times p} \tag{4} $$

The output matrix of the ELM network is defined as:

$$ \boldsymbol{T} = \begin{bmatrix} \boldsymbol{t}_1 & \boldsymbol{t}_2 & \cdots & \boldsymbol{t}_p \end{bmatrix}_{m \times p} \tag{5} $$

Based on Eqs. (1)–(4), each column of the output matrix can be written as:

$$ \boldsymbol{t}_j = \begin{bmatrix} t_{1j} \\ t_{2j} \\ \vdots \\ t_{mj} \end{bmatrix}_{m \times 1} = \begin{bmatrix} \sum_{i=1}^{l} \beta_{i1}\, g(\boldsymbol{\omega}_i \boldsymbol{X}_j + b_i) \\ \sum_{i=1}^{l} \beta_{i2}\, g(\boldsymbol{\omega}_i \boldsymbol{X}_j + b_i) \\ \vdots \\ \sum_{i=1}^{l} \beta_{im}\, g(\boldsymbol{\omega}_i \boldsymbol{X}_j + b_i) \end{bmatrix}_{m \times 1}, \quad j = 1, 2, \ldots, p \tag{6} $$


$$ \boldsymbol{\omega}_i = \begin{bmatrix} \omega_{i1} & \omega_{i2} & \cdots & \omega_{in} \end{bmatrix} \tag{7} $$
$$ \boldsymbol{X}_j = \begin{bmatrix} x_{1j} & x_{2j} & \cdots & x_{nj} \end{bmatrix}^{T} \tag{8} $$

where g(x) is the activation function of the hidden layer of the ELM.

The following equations can be obtained from Eqs. (5)–(8):

$$ \widehat{\boldsymbol{\beta}} = \boldsymbol{H}^{-1} \boldsymbol{T}^{T} \tag{9} $$


$$ \boldsymbol{H}\left(\boldsymbol{\omega}_1, \ldots, \boldsymbol{\omega}_l, b_1, \ldots, b_l, \boldsymbol{X}_1, \ldots, \boldsymbol{X}_p\right) = \begin{bmatrix} g(\boldsymbol{\omega}_1 \cdot \boldsymbol{X}_1 + b_1) & g(\boldsymbol{\omega}_2 \cdot \boldsymbol{X}_1 + b_2) & \cdots & g(\boldsymbol{\omega}_l \cdot \boldsymbol{X}_1 + b_l) \\ g(\boldsymbol{\omega}_1 \cdot \boldsymbol{X}_2 + b_1) & g(\boldsymbol{\omega}_2 \cdot \boldsymbol{X}_2 + b_2) & \cdots & g(\boldsymbol{\omega}_l \cdot \boldsymbol{X}_2 + b_l) \\ \vdots & \vdots & & \vdots \\ g(\boldsymbol{\omega}_1 \cdot \boldsymbol{X}_p + b_1) & g(\boldsymbol{\omega}_2 \cdot \boldsymbol{X}_p + b_2) & \cdots & g(\boldsymbol{\omega}_l \cdot \boldsymbol{X}_p + b_l) \end{bmatrix}_{p \times l} \tag{10} $$

where H<sup>−1</sup> denotes the Moore–Penrose pseudo-inverse of H. The ELM can be trained with the following algorithm:

Algorithm ELM: Given a training set {(X, T) | XR n×p, TR m×p }, an activation function g(x), a testing set \( \widehat{\boldsymbol{X}} \), and the hidden node number l.

Step one: Randomly assign input weight ω and bias b.

Step two: Calculate the hidden layer output matrix H.

Step three: Calculate the output weight matrix β.

Step four: Input the matrix \( \widehat{\boldsymbol{X}} \) and obtain the testing outputs by the transform (9).
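The four steps above can be sketched as follows (an illustrative NumPy implementation with synthetic data; the sigmoid activation and the sizes l, n, m, p here are assumptions for the example, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# n input nodes, m output nodes, l hidden nodes, p training samples
n, m, l, p = 1, 1, 3, 200
X = rng.uniform(0, 1, (n, p))             # training inputs (e.g. normalized wind speed)
T = np.sin(2 * np.pi * X) * 0.5 + 0.5     # synthetic training targets (stand-in for power)

# Step one: randomly assign input weights omega (l x n) and biases b (l x 1)
omega = rng.standard_normal((l, n))
b = rng.standard_normal((l, 1))

def g(z):
    """Sigmoid activation (assumed for this sketch)."""
    return 1.0 / (1.0 + np.exp(-z))

# Step two: hidden-layer output matrix H (p x l), one row per sample
H = g(omega @ X + b).T

# Step three: output weights via the Moore-Penrose pseudo-inverse, Eq. (9)
beta = np.linalg.pinv(H) @ T.T            # shape (l x m)

# Step four: apply the same transform to test inputs to get predictions
X_test = rng.uniform(0, 1, (n, 50))
T_hat = (g(omega @ X_test + b).T @ beta).T  # shape (m x 50)
```

Because only beta is solved for (in closed form), no iterative weight adjustment is needed, which is the source of ELM's speed advantage.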

Error correction

Based on the ELM forecasting results, an error correction model is applied to obtain the ultra-short-term forecast. The persistence method is used as a benchmark model to examine whether an advanced model performs well. In this model, the future wind power is assumed to be the same as that at the present time step:

$$ \widehat{P}_{t+k \mid t} = P_t \tag{11} $$

where \( \widehat{P}_{t+k \mid t} \) is the forecast made at time t for look-ahead time t + k, and P<sub>t</sub> is the measurement at time t.

Compared with wind power itself, the temporal characteristics of wind power forecasting errors are less discussed in the literature. However, by analyzing the state transition probabilities among different error levels, it is found that the forecasting error at the next time point tends to remain at the same level as at the present time point. Thus, the error for the next time point can be written as

$$ \widehat{e}_{t+1 \mid t} = e_t \tag{12} $$

where e<sub>t</sub> is the deviation between the forecasted and the measured wind power:

$$ e_t = p_t - \widehat{p}_{t \mid t-1} \tag{13} $$

The computed error is then added to the forecasted wind power for the next time point to obtain the corrected forecast:

$$ \tilde{p}_{t+1 \mid t} = \widehat{p}_{t+1 \mid t} + \widehat{e}_{t+1 \mid t} = \widehat{p}_{t+1 \mid t} + e_t \tag{14} $$
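The error correction step can be sketched as below (a minimal illustration of the persistence-based correction; the numeric series are synthetic, not the wind-farm data):

```python
import numpy as np

def correct_forecast(p_measured, p_forecast):
    """Apply persistence-based error correction.

    e_t = p_t - p_hat_{t|t-1} is the latest observed forecast error;
    the corrected forecast is p_tilde_{t+1|t} = p_hat_{t+1|t} + e_t.
    Returns the corrected forecast for time points 1..n-1.
    """
    p_measured = np.asarray(p_measured, dtype=float)
    p_forecast = np.asarray(p_forecast, dtype=float)
    e = p_measured - p_forecast        # forecast errors e_t at each step
    return p_forecast[1:] + e[:-1]     # shift the error one step ahead

p = np.array([10.0, 12.0, 15.0, 14.0])      # measured power (MW), synthetic
p_hat = np.array([9.0, 11.0, 13.0, 15.0])   # day-ahead forecast (MW), synthetic
p_tilde = correct_forecast(p, p_hat)        # -> [12.0, 14.0, 17.0]
```

Each corrected value simply carries the most recent error forward, which is what makes the correction usable for the 15-min-ahead (ultra-short-term) horizon.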

The flow chart of wind power forecasting procedure is shown in Fig. 2.

Fig. 2

Flow chart of wind power forecasting procedure

Data description and pre-processing

The proposed model is verified using measured data from a wind farm located in northern China over a period of about 15 months, from 24 February 2014 to 31 May 2015. The 41072 non-consecutive data points before 2 March 2015 are used for training the ELM models, whereas the consecutive time series of 66 days from 2 March 2015 to 31 May 2015 is used to verify the models' performance. The total installed capacity of the wind farm is 50 MW. The sampling interval of the data is 15 min. The scatter plot of wind power versus wind speed of the wind farm is shown in Fig. 3.

Fig. 3

Scatter plot of wind power versus wind speed

The characteristics of the wind speed are shown as a frequency histogram in Fig. 4. The distribution can be well fitted by a Weibull distribution.

Fig. 4

Characteristics of wind speed data

The mechanical power extracted from the wind by a wind turbine is a function of the wind speed, blade pitch angle, and shaft speed. The algebraic equation below characterizes the extracted power [23]:

$$ P_m = \frac{1}{2} \rho v_w^3 \pi r^2 C_p\left(\lambda\right) \tag{15} $$

where P<sub>m</sub> is the power extracted from the wind, in W; ρ is the air density, in kg/m³; r is the radius swept by the rotor blades, in m; v<sub>w</sub> is the wind speed, in m/s; C<sub>p</sub> is the performance coefficient; and λ is the tip-speed ratio, i.e., the ratio of the turbine blade tip speed to the wind speed:

$$ \lambda = \frac{\omega_t r}{v_w} \tag{16} $$

where ω<sub>t</sub> is the mechanical rotor speed, in rad/s.
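For concreteness, Eqs. (15)–(16) can be evaluated numerically as follows (the rotor radius, rotor speed, wind speed, and performance coefficient below are assumed values for illustration, not parameters of the studied wind farm):

```python
import math

def tip_speed_ratio(omega_t, r, v_w):
    """lambda = omega_t * r / v_w, Eq. (16)."""
    return omega_t * r / v_w

def mechanical_power(rho, v_w, r, c_p):
    """P_m = 0.5 * rho * v_w^3 * pi * r^2 * C_p, Eq. (15), in watts."""
    return 0.5 * rho * v_w**3 * math.pi * r**2 * c_p

rho = 1.225    # air density, kg/m^3 (standard conditions)
v_w = 10.0     # wind speed, m/s (assumed)
r = 40.0       # rotor radius, m (assumed)
c_p = 0.4      # performance coefficient (assumed)

P_m = mechanical_power(rho, v_w, r, c_p)   # about 1.23 MW
lam = tip_speed_ratio(1.8, r, v_w)         # lambda = 7.2
```

The cubic dependence on v_w in Eq. (15) is why small wind-speed errors translate into large power-forecast errors, motivating the error correction above.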

From Eq. (15) it can be noted that the air density and the wind speed are not controllable quantities. This means a wind turbine may yield different power output even at the same wind speed. A wind farm comprises tens or even hundreds of turbines, which makes the relationship between the farm output and wind speed much weaker than that of a single turbine. Even so, the wind power output clearly depends on the wind speed, as shown in Fig. 3. The objective of the ELM model is to characterize this implicit dependence. However, some anomalous data exist in the original datasets and would degrade the forecasting accuracy. Two kinds of anomalies should be eliminated before building the ELM model:

  1) The wind speed is large (e.g., larger than 5 m/s) but the corresponding wind power is close to zero.

  2) The wind speed is close to zero but the wind farm output is large (e.g., larger than half of the rated capacity of the wind farm).
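The two elimination rules can be sketched as a simple mask (the 5 m/s and half-rated-capacity thresholds follow the text; the "close to zero" tolerances `eps_v` and `eps_p` are assumptions for illustration):

```python
import numpy as np

def filter_anomalies(v, p, rated_mw=50.0, v_high=5.0, eps_v=0.5, eps_p=0.5):
    """Drop the two kinds of anomalous (speed, power) samples described above."""
    v, p = np.asarray(v, dtype=float), np.asarray(p, dtype=float)
    rule1 = (v > v_high) & (np.abs(p) < eps_p)   # high wind, near-zero power
    rule2 = (v < eps_v) & (p > 0.5 * rated_mw)   # near-zero wind, large power
    keep = ~(rule1 | rule2)
    return v[keep], p[keep]

v = np.array([8.0, 0.1, 6.0, 3.0])    # wind speeds, m/s (synthetic)
p = np.array([20.0, 30.0, 0.0, 5.0])  # farm output, MW (synthetic)
v_ok, p_ok = filter_anomalies(v, p)   # drops the (0.1, 30.0) and (6.0, 0.0) points
```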

Moreover, wind speed and power data are normalized using the following formula:

$$ x_{normal} = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{17} $$

where x is the original data, x<sub>normal</sub> the normalized data, and max(x) and min(x) the maximum and minimum of the original dataset, respectively.
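A minimal sketch of this min-max normalization (with the numerator taken as x − min(x), which maps the data to [0, 1]):

```python
import numpy as np

def min_max_normalize(x):
    """Scale a series to [0, 1] using its own minimum and maximum."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

x = np.array([2.0, 4.0, 6.0, 10.0])   # e.g. wind speeds in m/s (synthetic)
x_normal = min_max_normalize(x)       # -> [0.0, 0.25, 0.5, 1.0]
```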

Results and discussion

Model parameterization

The input of the ELM model is the wind speed and the output is the wind power; hence the number of nodes in both the input layer and the output layer is set to one. For the hidden layer, different numbers of hidden nodes are tested. The criterion for quantifying the performance of wind power forecasting is the normalized root mean squared error (NRMSE):

$$ NRMSE = \frac{\sqrt{\sum_{i=1}^{n} \left( p_i - \widehat{p}_i \right)^2}}{Cap \cdot \sqrt{n}} \tag{18} $$

where p<sub>i</sub> is the measured power, \( \widehat{p}_i \) the forecasted power, n the number of samples, and Cap the installed capacity of the wind farm.
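The NRMSE criterion amounts to the RMSE normalized by installed capacity, as in this small sketch (numbers are synthetic):

```python
import numpy as np

def nrmse(p, p_hat, cap):
    """RMSE of the forecast divided by installed capacity Cap."""
    p, p_hat = np.asarray(p, dtype=float), np.asarray(p_hat, dtype=float)
    return np.sqrt(np.mean((p - p_hat) ** 2)) / cap

p = np.array([10.0, 20.0, 30.0])      # measured power, MW (synthetic)
p_hat = np.array([12.0, 18.0, 33.0])  # forecasted power, MW (synthetic)
err = nrmse(p, p_hat, cap=50.0)       # sqrt((4 + 4 + 9) / 3) / 50, about 0.048
```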

The NRMSE for day-ahead (24 h-ahead) forecasting by ELM models with different numbers of hidden nodes is shown in Fig. 5.

Fig. 5

NRMSE for day-ahead forecasting by ELM models with different numbers of hidden nodes

From Fig. 5 it can be seen that the NRMSE decreases dramatically as the number of hidden nodes increases from 1 to 3. With 3 hidden nodes, the ELM achieves its best performance of 21.09 % in terms of NRMSE, and adding more nodes does not lead to better results. Thus, the number of hidden nodes is set to 3.

Results and analysis

The day-ahead (short-term) forecasting results before error correction obtained by the ELM, together with the corresponding measurements from 22 April 2015 to 23 April 2015, are shown in Fig. 6. The 15 min-ahead (ultra-short-term) forecasting results after error correction and the corresponding measurements are shown in Fig. 7.

Fig. 6

Short-term forecasting results of ELM and their corresponding measurements

Fig. 7

Ultra-short-term forecasting results and their corresponding measurements after error correction

In Fig. 6, the ELM does not perform well due to the strongly stochastic nature of wind and the weak relationship between wind farm output and wind speed. In particular, ramp events are not accurately forecasted. Large errors occur around the peaks of the measurement curve, where wind power changes drastically in a short time. The overall NRMSE of the forecasting results over the 66 days is 21.09 %.

The short-term forecasting errors are shown in Fig. 8 and their distribution in Fig. 9. The ultra-short-term forecasting error after correction is depicted in Fig. 10. As can be seen from Fig. 8, the short-term forecasting error fluctuates drastically with large amplitude; the maximal error reaches up to 50 % of the installed capacity of the wind farm. Although most of the errors lie in the interval [−20 MW, 20 MW], there is still a portion of large errors that cannot be neglected. The spread of the error distribution is large and positively biased, which reveals the poor short-term forecasting performance.

Fig. 8

Short-term forecasting error by using ELM

Fig. 9

Short-term forecasting error distribution by using ELM

Fig. 10

Ultra-short-term forecasting error after error correction

In contrast, the ultra-short-term forecasting curve in Fig. 7 follows the measurement curve closely. In addition, Figs. 10 and 11 show that most of the forecasting errors lie in the interval [−10 MW, 10 MW], and the distribution is concentrated and almost unbiased. The overall NRMSE is 5.76 %, indicating a good result of the correction method for ultra-short-term forecasting.

Fig. 11

Ultra-short-term forecasting error distribution after error correction

In terms of computational efficiency, the time consumed for training and testing ELM models with varying numbers of hidden nodes is depicted in Fig. 12.

Fig. 12

Computation time for training and testing ELM models with varying numbers of hidden nodes

The computation time is approximately proportional to the number of hidden nodes. With 3 hidden nodes, the computation time is 1.5377 s, including the time for training the model with 41072 data points and forecasting 6336 data points. This indicates a high computational efficiency for wind power forecasting, which can satisfy practical needs.


Conclusions

In this paper, an extreme learning machine model with error correction is developed to predict the power output of a wind farm on an ultra-short-term time scale with high computational efficiency. The case study shows that the ultra-short-term wind power forecasting accuracy is substantially improved in terms of the normalized root mean squared error (NRMSE).