1 Introduction

In the context of increasing power demand and growing energy and environmental concerns, wind energy has been growing rapidly around the world [14]. However, the intermittent behavior of wind poses a great challenge to increasing wind energy penetration. Wind power forecasting techniques have therefore been studied extensively in order to reduce the impact of wind intermittency on the power system [3, 4].

Generally, wind power forecasting techniques can be classified into three groups: numerical weather prediction (NWP) methods, statistical methods, and hybrid approaches [3, 5]. NWP models are based on complex mathematical models and thus have advantages in prediction accuracy, especially for longer horizons. However, it is difficult to develop an accurate mathematical model without knowledge of aerodynamics and atmospheric physics, and the model calculation is time-consuming and requires supercomputers. Statistical methods such as the autoregressive moving average (ARMA) model and artificial neural networks (ANN) predict wind power using only data, and are therefore of special interest for a large number of engineering applications [6]. However, the prediction capability of statistical methods drops as the forecast horizon grows. In addition, support vector regression-based methods [7, 8] and the generalized locally weighted group method of data handling (GMDH) [9] have also been proposed in recent years. Nevertheless, the existing forecasting techniques still cannot adequately meet engineering requirements.

This paper focuses on mitigating the drop in forecast accuracy as the time horizon grows when a time series model is used for forecasting. The approach has two aspects. First, the horse-racing principle is introduced: a gambler named A, frustrated by persistent horse-racing losses and envious of his friends’ winnings, decides to make some positive changes [10]. An easy and efficient way to help him is to allocate his wager according to the bets of his luckiest friends, based on their previous performance. Boosting theory addresses exactly this problem of how a set of weak forecasts can be combined into a single accurate forecast by a weighted vote. It was first introduced by Michael Kearns and Leslie Valiant [11–13]. The boosting method also has many engineering advantages [14, 15] that have been tested empirically by many researchers; for example, it is fast, simple and easy to program. If the gamblers are replaced by forecasting models, boosting can be used to improve the forecasting accuracy of existing models. Second, the multi-step-ahead (MS) technique has been shown to mitigate the rapid fading of forecasting accuracy, and it is widely applied in economic time series forecasting [16]. Therefore, in the first step, the MS technique is applied to build a base model using an ARMA model. In this study it is called the ARMA-based MS base forecasting model (ARMA-MS); more details on the MS approach and its parameter identification can be found in [16]. In the second step, the base model is used to construct the proposed method with the boosting algorithm, as described in Section 3.

Based on the above, this paper proposes a methodology that applies the boosting method and the MS technique to significantly improve the accuracy of a time series forecasting model for day-ahead wind power forecasts. In the first part, the advantages of the boosting method for wind power forecasting are summarized. Then, a novel method is proposed which combines the boosting algorithm with the ARMA-MS model. The procedure of the proposed modeling method can be summarized as follows: ① combining the multi-step approach and the ARMA model to build a basic forecasting model (weak forecasting capacity); ② using the boosting algorithm to combine these weak forecasting models into an accurate assembled model H by a weighted vote process; ③ applying the model H to forecast the wind power output of forecast date τ.

In the second part, based on the results of [10], this paper theoretically proves the existence of error bounds of the proposed forecast method. For instance, one of the results shows that the forecasting error of the proposed algorithm can be bounded by

$$\left( {\frac{2}{T}} \right)\frac{{\sqrt {\varepsilon _{1} (1 - \varepsilon _{1} )} }}{{1 - q_{1} }} + \frac{{\varepsilon _{1} }}{T}$$

where T is the number of iterations; \(\varepsilon_{1}\) is the forecast error of the first round; \(q_{1}\) is a constant defined in Section 3.

Finally, in order to test the proposed method, one year of hourly wind power output data from three operating wind farms on the east coast of Jiangsu Province, China, is used for analysis. The results indicate that the proposed algorithm improves wind power forecast accuracy relative to the traditional ARMA model and the persistence model (PM). Econometric Views (EViews) is a statistical package for Windows operating systems, used for time-series-oriented econometric analysis, general statistical analysis, and wind power forecasting. In this paper, EViews (Version 8) is used as the analysis tool to obtain the simulation results.

The rest of this paper is organized as follows: Section 2 reviews the main aspects of boosting techniques. Section 3 proposes a novel forecasting method and analyzes its error bounds. Section 4 presents and discusses simulation results that evaluate the performance of the proposed method. Section 5 concludes the paper.

2 Related algorithms

According to [17], the boosting algorithm has its roots in a theoretical framework for studying machine learning called the probably approximately correct (PAC) learning model [18], which regards learning as a phenomenon of knowledge acquisition in the absence of explicit programming. Subsequently, [11] analyzed whether a “weak” learning algorithm, which performs just slightly better than random guessing in the PAC model, can be “boosted” into an arbitrarily accurate “strong” learning algorithm. An efficient boosting algorithm named adaptive boosting (AdaBoost) was proposed to solve many of the practical difficulties of earlier boosting algorithms [10]. The genetic programming boost (GPBoost) algorithm was proposed based on genetic programming (GP) [19]. Building on GPBoost, a boosting algorithm using correlation coefficients (BCC) was then proposed to update the weights and improve prediction accuracy [14, 20].

2.1 Conventional boosting algorithm

Boosting is traditionally considered a general technique for combining rules of thumb, or weak classifiers, to form highly accurate combined classifiers. References [10, 13] state that a class C of concepts is learnable if there exists a class of estimated models \(\varvec{h}\) such that, for all \(n \ge 1\), any actual concept \(c \in C\), all distributions D on \(X\), and all \(0 < \gamma, \delta \le 1\), given the parameters \(\delta, n, \gamma\) and the size s of the target concept c, the boosting algorithm, with running time polynomial in \(1/\delta\), n, \(1/\gamma\) and s, outputs an accurate forecast model \(H\) that with probability at least \(1 - \delta\) is \(\gamma\)-close to c under D. By definition, each model in the class \(\varvec{h}\) is required to have a prediction error slightly better than 1/2 with respect to the distribution D on which it is trained. The boosting procedure is as follows: ① the learner receives M examples \(\left\{ (x_{1}, y_{1}), \ldots, (x_{M}, y_{M}) \right\}\) chosen according to the distribution D on \(X \times Y\), where \(Y\) is the actual data set of the forecasting target associated with the training-example set \(X\); ② on each round t = 1,…,T, the booster devises a distribution \(D_{t}\) over the set of examples, and requests a rule of thumb \(h_{t} \in \varvec{h}\) with low error \(\varepsilon_{t}\) with respect to \(D_{t}\); ③ after T rounds, the booster combines the weak forecast models into a single, strongly accurate forecast model H.

2.2 ARMA model

The ARMA model is one of the most widely used approaches for forecasting. ARMA models can effectively be used to predict the behavior of a time series from past values alone. In [21], the ARMA model significantly improves wind speed forecasts compared with those obtained with persistence models, for forecasts one hour or even ten hours in advance.

The notation ARMA(p, q) refers to the model with p autoregressive terms and q moving-average terms. Because of seasonal variations, the parameters of the ARMA model cannot simply be assumed constant. Generally, applying ARMA models to prediction requires three main steps: identification, estimation and diagnostic checking.

In the model estimation step, it has been indicated that AR(1) and AR(2) are both appropriate models for wind power forecasting [22]. Reference [23] indicated that ARMA(2, 3) is a suitable model. In this paper, the sample autocorrelation function (SACF) and the sample partial autocorrelation function (SPACF) are both used to identify the parameters (p, q) of the ARMA model [6]. Based on the analysis of the one-year wind power data, the ARMA(1, 1) model is preferred over the others according to the Akaike information criterion (AIC); this simple model was also used in [24].
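To make the ARMA(1, 1) notation concrete, the following Python sketch iterates the textbook recursive forecast for the model y_t = c + phi·y_{t-1} + e_t + theta·e_{t-1}. The function name, the coefficient values and the starting state are hypothetical and chosen only for illustration; they are not the parameters estimated in this paper, and the recursion is the standard one rather than the ARMA-MS procedure of [16].

```python
def arma11_forecast(last_value, last_residual, c, phi, theta, steps):
    """Recursive forecasts for y_t = c + phi*y_{t-1} + e_t + theta*e_{t-1}."""
    forecasts = []
    # One step ahead: the last residual is known, the next shock has mean zero.
    prediction = c + phi * last_value + theta * last_residual
    forecasts.append(prediction)
    # Further steps: every future shock has expectation zero,
    # so only the autoregressive part carries the forecast forward.
    for _ in range(steps - 1):
        prediction = c + phi * prediction
        forecasts.append(prediction)
    return forecasts


if __name__ == "__main__":
    # Hypothetical state and coefficients, in MW.
    path = arma11_forecast(last_value=300.0, last_residual=20.0,
                           c=50.0, phi=0.8, theta=0.3, steps=24)
    print(path[0], path[-1])
```

For a stationary model (|phi| < 1) the forecasts decay geometrically toward the long-run mean c / (1 − phi), so forecasts far ahead carry little information about the current state. This is one way to see why accuracy drops as the horizon grows.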

2.3 Persistence model

The persistence model is considered the simplest time series model. However, it surpasses many other models in very short-term prediction. Therefore, the PM is used as a benchmark. The persistence forecast can be written as [21]:

$$\hat{P}\left( {t + k|t} \right) = \frac{1}{{\hat{T}}}\sum\nolimits^{n - 1} _{i = 0} P(t - i\Delta t)$$

where \(\hat{P}\left( {t + k|t} \right)\) is the wind power forecast for time \(t + k\) made at time origin t; k is the prediction horizon; \(\hat{T}\) is the prediction interval length (\(\hat{T} = k\)); \(P(t - i\Delta t)\) is the measured wind power for time t and the previous i time steps within \(\hat{T}\); n is the number of time steps within \(\hat{T}\); \(\Delta t\) is the time step length of the measured time series (\(\hat{T} = n\Delta t\)).
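As a minimal sketch of this benchmark, the Python function below averages the most recent measurements. It assumes a time step of one hour, so that the number of steps n equals the interval length (and hence the horizon k); the function name and the sample values are hypothetical.

```python
def persistence_forecast(measured, horizon_hours):
    """Mean of the most recent `horizon_hours` hourly measurements.

    Assumes a one-hour time step, so n equals the interval length k.
    """
    if horizon_hours < 1 or horizon_hours > len(measured):
        raise ValueError("horizon must be between 1 and the number of measurements")
    window = measured[-horizon_hours:]
    return sum(window) / horizon_hours


if __name__ == "__main__":
    hourly_power = [120.0, 150.0, 180.0, 210.0]  # hypothetical values in MW
    print(persistence_forecast(hourly_power, 1))
    print(persistence_forecast(hourly_power, 3))
```

With a horizon of one hour the forecast is simply the last measured value; longer horizons smooth over a correspondingly longer window.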

3 Boosting based hybrid method and its analysis

3.1 Structure of hybrid method

For more than half a century, ARMA models, a well-known time series technique, have been widely applied in the construction of accurate hybrid models and in engineering forecasts of wind power. Therefore, based on the advantages of the boosting algorithm, a hybrid forecasting method that combines the boosting algorithm with the ARMA-MS model is proposed to improve forecast accuracy.

The final model \(H\), which is the output model for wind power forecasting, is generated via T boosting iterations. The forecast model \(H\) is a weighted vote of T weak models \(h_{t} \in \varvec{h}\), where \(\alpha_{t}\) is the weight assigned to each. Intuitively, \(\alpha_{t}\) measures the importance that is assigned to \(h_{t}\), and \(\alpha_{t}\) gets larger as \(\varepsilon_{t}\) gets smaller, as shown in (6).
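The relationship between the round error and the vote weight in (6) can be checked numerically. The short Python sketch below evaluates the weight for a few error values; the function name and the sample errors are hypothetical.

```python
import math


def vote_weight(error):
    """Weight alpha = 0.5 * ln((1 - error) / error) for 0 < error < 1/2."""
    if not 0.0 < error < 0.5:
        raise ValueError("error must lie strictly between 0 and 1/2")
    return 0.5 * math.log((1.0 - error) / error)


if __name__ == "__main__":
    for error in (0.05, 0.2, 0.4):  # hypothetical round errors
        print(error, round(vote_weight(error), 4))
```

The weight falls toward zero as the error approaches 1/2, so a model that is barely better than chance contributes almost nothing to the vote.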

The proposed method includes four steps as shown in Fig. 1.

Fig. 1 Logic structure of proposed method for wind power forecast

  • Step 1: Define the forecast date τ and let the parameter “training target date” equal \(\tau - 1\) (measured wind power \(Y_{0}\)), considering the existence of wind speed persistence. Define the parameter T, the number of iteration rounds, and the training set for each iterative forecast.

  • Step 2: Apply the ARMA-MS model \(h_{t}\) to forecast the wind power of date \(\tau - 1\) based on the training date \(\tau - (t + 1)\). Calculate the forecasting error \(\varepsilon_{t}\) of model \(h_{t}\) using (4), and the weight \(\alpha_{t}\) of model \(h_{t}\) using (6). Because \(h_{t}\) is a weak-capacity forecasting model, its forecast error \(\varepsilon_{t}\) should be less than 1/2, as defined by Freund and Schapire [10]. In other words, each generated forecasting model h with an error larger than 1/2 is ignored.

  • Step 3: After T rounds of forecasts, the boosting algorithm combines these weak forecasting models into an accurate assembled model H by a weighted vote process as shown in (8).

  • Step 4: Finally, the model \(H(X_{0} )\) is used to forecast the wind power output of the final forecast date τ.

3.2 Forecasting process

The calculation process and pseudo-code of the new method for wind power prediction are shown below. Given \(\left( {X_{ 1} ,Y_{ 0} } \right) ,\cdots ,\left( {X_{T} ,Y_{ 0} } \right)\), where \(Y_{0}\) is the actually measured wind power, \(\left( {X_{t} ,Y_{ 0} } \right) = \left\{ { (x_{ 1}^{t} ,y_{ 1}^{ 0} ) ,\cdots , (x_{i}^{t} ,y_{i}^{ 0} )} \right\}\) where \(t \in \left[ { 1,T} \right]\), \(i \in \left[ { 1,M} \right]\), and M is the number of wind farms.

Initialize: \(\varvec{w}^{ 1} = \left\{ {w_{ 1}^{ 1} , \ldots ,w_{M}^{ 1} } \right\} = \left\{ { 1, \cdots , 1} \right\}.\)

Do for t = 1, 2, ···, T

Set the distribution \(D^{t}\) supplied to the ARMA-MS model as

$$D^{t} = \frac{{w^{t} }}{{\mathop \sum \nolimits_{{i = 1}}^{M} w_{i}^{t} }}$$

Choose the wind power forecasting hypothesis \(h_{t} :\;X \to R\) with the calculated error

$$\varepsilon _{t} = \varvec{D}^{t} {\varvec{\ell }}^{t} = \frac{{\sum _{{i = 1}}^{M} D^{t} (i)\left| {h_{t} \left( {x_{i}^{t} } \right) - y_{i}^{0} } \right|}}{{P_{{{\text{Cap}}}} }}$$

where \(h_{t} (x_{i}^{t} ) \ne y_{i}^{0}\), \(\varepsilon_{t} \in (0, 1/2)\) is guaranteed by forecast-accuracy selection, and \(P_{\text{Cap}}\) is the installed capacity.

$${\text{Set}}\;\beta _{t} = \frac{{\varepsilon _{t} }}{{1 - \varepsilon _{t} }},\quad \beta _{t} \in (0,1)$$
$$\alpha_{t} = \frac{1}{2}\ln \left( {\frac{{1 - \varepsilon_{t} }}{{\varepsilon_{t} }}} \right),\quad \alpha_{t} \in [0, + \infty )$$

Set the new weights vector to be

$$w_{i}^{t + 1} = w_{i}^{t} \beta_{t}^{{1 - \ell_{i}^{t} }} = w_{i}^{t} \beta_{t}^{{1 - \left[ {\frac{{\left| {h_{t} \left( {x_{i}^{t} } \right) - y_{i}^{0} } \right|}}{{P_{\text{Cap}} }}} \right]}}$$

Output the final forecasting algorithm as

$$H\left( {X_{0} } \right) = \alpha_{0} h_{0} \left( {X_{0} } \right) + \frac{{\sum_{t = 1}^{T} \alpha_{t} h_{t} (X_{0} )}}{{\sum_{t = 1}^{T} \alpha_{t} }}$$

In the first step, it is assumed that the sequence of M training examples \(\left\{ { (x_{ 1}^{ 1} ,\;y_{ 1}^{ 0} ) ,\cdots ,(x_{M}^{ 1} , y_{M}^{ 0} )} \right\}\) is drawn from \(X \times Y\) according to distribution D. The value of \(h(x_{i} )\) is the forecasted result on \(x_{i}\). \(Y_{0} = \left\{ {y_{1}^{0} , \ldots ,y_{M}^{0} } \right\}\) is the actually measured wind power on the target date \(\tau - 1\) from M wind farms, which is used for model checking throughout the training process. Then a boosting process is started to find the hypothesis \(H\) which is consistent with most of the forecasting sub-models.

After each iteration t, \(h_{t}\) suffers a forecasting accuracy loss \(\ell_{i}^{t} \in [0,\;1]\), the loss function used in (4), which can be expressed as

$$\ell _{i}^{t} = \frac{{\left| {h_{t} \left( {x_{i}^{t} } \right) - y_{i}^{0} } \right|}}{{P_{{{\text{Cap}}}}}}$$

The loss suffered from D can be written in the form \(\sum_{i = 1}^{M} D_{i} \ell_{i} = \varvec{D}{\varvec{\ell }}\), and the average loss of the forecasting algorithm \(h_{t}\) with respect to \(D^{t}\) is shown below.

$$\varvec{D}^{t} {\varvec{\ell }}^{t}$$

As described in Fig. 1, the objective of the novel method is to find a final forecasting model \(H\) for the next day that is as close as possible to the actual, most accurate model c defined in Section 2, which cannot be known in advance.
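The weight-update loop and the weighted vote described in this section can be sketched in Python as follows. This is an illustrative reading of (3) to (8), not the implementation used in the paper: the function names, the use of a single scalar capacity, the sample numbers, and the choice to skip a round whose error is exactly zero are all assumptions, and the \(\alpha_{0} h_{0}\) term of (8) is omitted because it is not defined in this section.

```python
import math


def boost_weights(round_forecasts, measured, capacity):
    """Run the weight-update loop and return (alpha, round index) pairs.

    round_forecasts[t][i] is the forecast of weak model t for wind farm i on
    the training target date; measured[i] is the measured power of farm i.
    """
    weights = [1.0] * len(measured)                 # initial weight vector
    kept = []
    for t, forecasts in enumerate(round_forecasts):
        total = sum(weights)
        distribution = [w / total for w in weights]
        losses = [abs(f - y) / capacity for f, y in zip(forecasts, measured)]
        error = sum(d * l for d, l in zip(distribution, losses))
        # Models with error of 1/2 or more are ignored; an error of exactly 0
        # is also skipped here to avoid a division by zero (sketch assumption).
        if not 0.0 < error < 0.5:
            continue
        beta = error / (1.0 - error)
        alpha = 0.5 * math.log((1.0 - error) / error)
        weights = [w * beta ** (1.0 - l) for w, l in zip(weights, losses)]
        kept.append((alpha, t))
    return kept


def combine(kept, target_forecasts):
    """Normalised weighted vote over the retained weak models."""
    alpha_sum = sum(alpha for alpha, _ in kept)
    return sum(alpha * target_forecasts[t] for alpha, t in kept) / alpha_sum


if __name__ == "__main__":
    measured = [60.0, 250.0, 120.0]                 # hypothetical, in MW
    rounds = [[70.0, 230.0, 130.0],                 # accurate weak model
              [100.0, 150.0, 160.0],                # less accurate weak model
              [400.0, 0.0, 400.0]]                  # error above 1/2, ignored
    kept = boost_weights(rounds, measured, capacity=400.0)
    print(kept)
    print(combine(kept, target_forecasts=[300.0, 360.0, 0.0]))
```

In this example the third model is discarded, and the combined forecast lies closer to the output of the first model because its smaller error earns it a larger weight.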

3.3 Forecasting error bounds analysis

According to [25], a number of bounds on forecasting errors are theoretically proven in this section. For instance, one of the results shows that the forecasting error of the new algorithm can be bounded by (11).

Theorem 1: Suppose the weak learning algorithm \(h \in \varvec{h}\), when called by boosting algorithm, generates hypothetical forecast models with error \(\varepsilon_{1} , \ldots ,\varepsilon_{T}\). Then the error \(\varepsilon = Pr_{i\sim D} [h\left( {x_{i} } \right) \ne y_{i} ]\) of the output of the final hypothesis H is bounded by

$$\varepsilon \le \left( {\frac{2}{T}} \right)\frac{{\sqrt {\varepsilon _{1} \left( {1 - \varepsilon _{1} } \right)} }}{{1 - q_{1} }} + \frac{{\varepsilon _{1} }}{T}$$

where \(q_{1} = \max\limits_{t \in [1,T]} 2\sqrt {\varepsilon_{t} \left( {1 - \varepsilon_{t} } \right)}, \quad 0 < q_{1} < 1.\)

Proof: Use \(w^{t}\) defined in (7) and, for simplicity, set the initial distribution D to be uniform so that \(D(i) = 1/M\). Using a result on the error \(\varepsilon = Pr_{i\sim D} [h(x_{i}) \ne y_{i}]\) for the binary forecast problem, which has been proved in [10], we get that

$$\sum \nolimits_{{i = 1}}^{M} w_{i}^{{T + 1}} \le \prod\nolimits _{{t = 1}}^{T} [1 - (1 - \varepsilon _{t} )(1 - \beta _{t} )]$$


$$\sum \nolimits_{{i = 1}}^{M} w_{i}^{{T + 1}} \ge \sum \nolimits_{{i:h_{{T + 1}} \left( {x_{i} } \right) \ne y_{i} }} w_{i}^{{T + 1}} \ge \left(\sum \nolimits_{{i:h_{{T + 1}} \left( {x_{i} } \right) \ne y_{i} }} D(i)\right)\left(\prod\nolimits_{{t = 1}}^{T} \beta _{t} \right)^{{\frac{1}{2}}}$$

where \(\beta_{t}\) is defined in (5).

Clearly, in this paper

$$\varepsilon_{T + 1} = \sum\nolimits_{{i:h_{T + 1} \left( {x_{i} } \right) \ne y_{i} }} \frac{{D\left( i \right)\left| {h_{T + 1} \left( {x_{i} } \right) - y_{i} } \right|}}{{P_{\text{Cap}} }} \le \sum_{{i:h_{T + 1} \left( {x_{i} } \right) \ne y_{i} }} D\left( i \right)$$

since \(\frac{{\left| {h_{T + 1} \left( {x_{i} } \right) - y_{i} } \right|}}{{P_{\text{Cap}} }} \in (0,1]\), combining (13) and (14) we get

$$\sum\nolimits _{{i = 1}}^{M} w_{i}^{{T + 1}} \ge \varepsilon _{{T + 1}} \left(\prod\nolimits_{{t = 1}}^{T} \beta _{t} \right)^{{\frac{1}{2}}}$$

where \(\varepsilon_{T + 1}\) is the error of \(h_{T + 1}\).

Combining (12) and (15) we get

$$\varepsilon _{{T + 1}} \le \prod \nolimits_{{t = 1}}^{T} \frac{{1 - (1 - \varepsilon _{t} )(1 - \beta _{t} )}}{{\sqrt {\beta _{t} } }}$$

According to (5), we get

$$\varepsilon _{{T + 1}} \le \prod \nolimits_{{t = 1}}^{T} 2\sqrt {\varepsilon _{t} (1 - \varepsilon _{t} )} = 2^{T} \prod \nolimits_{{t = 1}}^{T} \sqrt {\varepsilon _{t} (1 - \varepsilon _{t} )}$$

The form of error \(\varepsilon\) is written as

$$\varepsilon = \frac{{\sum _{{t = 1}}^{T} \alpha _{t} \varepsilon _{t} }}{{\sum _{{t = 1}}^{T} \alpha _{t} }} \le \frac{1}{T}\sum \nolimits_{{t = 1}}^{T} \varepsilon _{t}$$

where \(\alpha_{t}\) is a convex function on \(\varepsilon_{t}\) as defined in (6).

According to the definition in (8) and combining (17) and (18), the bound on the error of H in (8) can be written as

$$\begin{aligned} \varepsilon &\le \left( \frac{1}{T} \right)\sum_{i = 2}^{T} \left( 2^{i - 1} \prod_{t = 1}^{i - 1} \sqrt{\varepsilon_{t} \left( 1 - \varepsilon_{t} \right)} \right) + \frac{\varepsilon_{1}}{T} \\ &= \left( \frac{1}{T} \right)\sum_{i = 2}^{T} \left( \prod_{t = 1}^{i - 1} 2\sqrt{\varepsilon_{t} \left( 1 - \varepsilon_{t} \right)} \right) + \frac{\varepsilon_{1}}{T} \\ &\le \left( \frac{1}{T} \right)\sum_{i = 2}^{T} \left( 2\sqrt{\varepsilon_{1} \left( 1 - \varepsilon_{1} \right)} \, q_{1}^{i - 2} \right) + \frac{\varepsilon_{1}}{T} \end{aligned}$$

where \(q_{1} = \mathop {\max }\limits_{{t \in [1,T]}} \{ 2\sqrt {\varepsilon _{t} \left( {1 - \varepsilon _{t} } \right)} \}\), \(0 < q_{{{1} }} < 1\).

The bound can also be written as

$$\varepsilon \le \left( {\frac{2}{T}} \right)\frac{{\sqrt {{{\varepsilon }}_{1} \left( {1 - {{\varepsilon }}_{1} } \right)} (1 - q_{1}^{{T - 1}} )}}{{1 - q_{1} }} + \frac{{{{\varepsilon }}_{1} }}{T} \le \left( {\frac{2}{T}} \right)\frac{{\sqrt {{{\varepsilon }}_{1} \left( {1 - {{\varepsilon }}_{1} } \right)} }}{{1 - q_{1} }} + \frac{{{{\varepsilon }}_{1} }}{T}$$

Then, the proof is completed.
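The behavior of the bound can be explored numerically. The Python sketch below evaluates the geometric-sum expression from the last step of the proof and the closed form of Theorem 1; the function names and the sample values of the first-round error and \(q_{1}\) are hypothetical.

```python
import math


def geometric_sum_bound(first_error, q1, rounds):
    """(1/T) * sum_{i=2}^{T} 2*sqrt(e1*(1-e1)) * q1**(i-2) + e1/T."""
    root = math.sqrt(first_error * (1.0 - first_error))
    series = sum(2.0 * root * q1 ** (i - 2) for i in range(2, rounds + 1))
    return series / rounds + first_error / rounds


def theorem1_bound(first_error, q1, rounds):
    """(2/T) * sqrt(e1*(1-e1)) / (1 - q1) + e1/T, valid for 0 < q1 < 1."""
    root = math.sqrt(first_error * (1.0 - first_error))
    return (2.0 / rounds) * root / (1.0 - q1) + first_error / rounds


if __name__ == "__main__":
    for rounds in (4, 30, 60):  # hypothetical numbers of iterations
        print(rounds,
              geometric_sum_bound(0.1, 0.866, rounds),
              theorem1_bound(0.1, 0.866, rounds))
```

For fixed values of the first-round error and \(q_{1}\), the closed form shrinks in proportion to 1/T. When \(q_{1}\) is close to 1, the bound only becomes informative once T is large relative to \(1/(1 - q_{1})\).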

Through iterative selection, the accurate forecast models h, whose forecast errors are less than 1/2, are selected by (4). In other words, the estimated forecast models whose forecast errors are larger than 1/2 are ignored. Only the highly accurate models established in the iterative learning process are used for the final forecast according to (8). According to the above derivation, the final forecast accuracy is theoretically guaranteed by the bound-control capability of the boosting algorithm.

4 Simulation results

4.1 Simulation data and settings

This study focuses on the forecast accuracy of aggregated wind power in the Nantong-Yancheng regional power grid, a region that features well-developed wind power and heavy industrial load. Therefore, this study uses real hourly wind power data from this power grid to test the proposed method. The three wind farms have a total installed capacity of 701.3 MW. They are located on the east coast of Jiangsu Province, China: Dong Yuan (100.5 MW), Long Yuan (400.5 MW), and Da Feng (200.3 MW). The study data cover the period from May 2012 to August 2013. Data were acquired continuously over this period, except for 85 days (from November 9, 2012 to January 20, 2013, and from April 26, 2013 to May 7, 2013) when continuous faults of the data acquisition system made data unavailable. The availability of wind power output data is 80%.

Two important issues are considered in designing and conducting the experiment to test the proposed method: ① how well a model retains accuracy over its time horizons; ② how robust the algorithm is to the choice of test set. To address these issues, two cases with different time horizons are tested: short-term data covering one month (July 2012) are mainly used to illustrate the principle, and long-term data covering fourteen months are used for capability validation and an economic analysis.

Several studies have proposed various indices to evaluate the efficiency of predictive methods [5]. In particular, the mean absolute error (MAE), the normalized average absolute energy production error (NMAE) of (21) and the root mean-square error (RMSE) are widely accepted metrics in the statistical community [14]. In this paper, to compare the prediction accuracy of the ARMA model, the persistence model, and the proposed boosting method, the MAE, NMAE and RMSE indices are all used to study the absolute errors between the measured values and the forecasts.

$$100 \times \frac{{\left| {P_{\text{real}} - P_{\text{forecast}} } \right|}}{{P_{\text{Cap}} }}$$

where \(P_{\text{real}}\) is the measured wind power data; \(P_{\text{forecast}}\) is the forecast result.
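The three indices can be computed as in the Python sketch below. It assumes that NMAE is the mean of the per-sample expression above over all samples, which the name of the index suggests but the text does not state explicitly; the function names and sample values are hypothetical, while the capacity of 701.3 MW is the total reported in this section.

```python
import math


def mae(measured, forecast):
    """Mean absolute error, in the unit of the data."""
    return sum(abs(m - f) for m, f in zip(measured, forecast)) / len(measured)


def nmae(measured, forecast, capacity):
    """Mean absolute error as a percentage of installed capacity."""
    return 100.0 * mae(measured, forecast) / capacity


def rmse(measured, forecast):
    """Root mean-square error, in the unit of the data."""
    squares = sum((m - f) ** 2 for m, f in zip(measured, forecast))
    return math.sqrt(squares / len(measured))


if __name__ == "__main__":
    measured = [100.0, 200.0, 300.0, 400.0]   # hypothetical values in MW
    forecast = [110.0, 190.0, 330.0, 380.0]
    print(mae(measured, forecast))
    print(nmae(measured, forecast, capacity=701.3))
    print(rmse(measured, forecast))
```

RMSE penalizes large individual errors more heavily than MAE, which is why both are reported alongside the capacity-normalized NMAE.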

4.2 Case 1

A preliminary study has been carried out using EViews (Version 8) as the analysis tool. Using the validation set measured in July 2012 and the training set from June, Fig. 2 compares the forecasting results of the ARMA model and the proposed method. Parameter T equals 30 (the length of the June training set), and M equals 3 (the Dong Yuan, Long Yuan, and Da Feng wind farms).

Fig. 2 Comparison of forecasting results

From Fig. 2 it can be clearly observed that the proposed method is advantageous for wind power forecasting owing to the quick convergence of its forecasting errors, as described in Theorem 1 of Section 3. Only in very few cases is the accuracy of the proposed method lower than that of the normal ARMA model. The test results show that the NMAE of the proposed method is 8%, that of the normal ARMA model is 9.94%, and that of the persistence model is 9.09%. For forecasts 24 hours in advance, the accuracy improvement of the proposed method is 19.52% compared with the normal ARMA model and 11.99% compared with the PM. The proposed forecasting method thus improves forecasting accuracy relative to using only ARMA or PM. In particular, owing to the MS approach, the proposed method (blue line) tracks the real data better than the benchmark models (green and yellow lines) in the tail of each forecasting series, as shown in Fig. 2.
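The quoted improvements are consistent with relative NMAE reductions, assuming they are computed as the difference between the benchmark NMAE and the proposed method's NMAE divided by the benchmark NMAE (the formula is not stated explicitly in the text):

$$\frac{9.94 - 8}{9.94} \times 100\% \approx 19.52\% ,\qquad \frac{9.09 - 8}{9.09} \times 100\% \approx 11.99\%$$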

4.3 Case 2

Table 1 shows that, compared with the traditional ARMA and PM models, the proposed method performs much better on the NMAE, MAE and RMSE indices, based on a one-year period of real data collected from three operating wind farms on the east coast of Jiangsu Province, China. The improvement in the indices ranges from 3.21% to 15.73%.

Table 1 Forecast results analysis

5 Conclusions

This paper proposes a novel hybrid method that uses the boosting algorithm to boost the forecasting capability of ARMA models. Traditional time series forecasting models lose accuracy as the forecasting horizon grows; the proposed method mitigates this limitation by combining the boosting model and the MS technique. In particular, the forecasting accuracy of the proposed method is theoretically guaranteed by the error bound derived in this paper. To validate the accuracy of the proposed method, real data collected from operating wind farms and covering fourteen months are used in experiments. Simulation results show that, in terms of the MAE, NMAE and RMSE indices, the proposed hybrid method is more accurate and more efficient than the traditional ARMA model and the persistence model.

Future work will focus on validating the accuracy of the proposed method against more existing approaches, such as the combined wavelet transform and fuzzy ARTMAP network forecast approach, the wavelet-ARIMA forecast, the hybrid Kalman filter forecast method and other hybrid algorithms.