1 Introduction

A power system is a complex dynamic system, and the power grid must maintain a balance between power generation, transmission, and usage. The output power of a wind farm is volatile and intermittent; further, large-scale grid integration of wind power makes it difficult to establish effective generation plans. Accurately predicting the output power of a wind power plant can relieve the peak-regulation and frequency-regulation pressure on the power system and make wind power safer and easier to utilize [1].

Existing wind power prediction methods include the continuous (persistence) [2,3,4,5,6,7], time series [8,9,10], Kalman filter [11, 12], decomposition [13], neural network [14,15,16,17,18,19,20], and combination prediction methods [21, 22]. The continuous method is a relatively basic prediction method in which the most recently measured wind power value is used directly as the prediction for the next time point [2,3,4,5]. This method is simple and suitable for prediction within 3-6 hour time periods, but its accuracy is poor over longer horizons. The time series method is an effective time-domain analysis method based on dynamic data parameters. However, the non-stationarity and nonlinearity of wind power necessitate differencing, which leads to poor accuracy for low-order models, while the parameters of high-order models are not easy to estimate [23, 24].

The Kalman filter method involves constructing features and state equations based on the statistical prediction of wind power noise characteristics, but is difficult to apply because of the inherent difficulty of estimating noise [11, 12]. In the decomposition method, signal processing is applied to obtain regular “subsequences” which are used to mitigate the impact of the randomness and volatility of the original wind power information [25, 26]. The neural network is suitable for short-term wind power series fitting due to its strong nonlinear fitting ability, which can help realize short-term wind power prediction [17, 19]. The simple models described above can also be combined via certain strategies to exploit their advantages simultaneously. This is the so-called combined prediction method. Much research has shown that combining the predictions of multiple models can effectively limit the influence of the large errors that a single model makes at individual points, improving overall prediction accuracy [27].

Decomposition prediction models based on empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD) have grown popular in recent years [26, 28]. However, EMD and EEMD suffer from “endpoint effects” during decomposition, in which divergence occurs at the edges of the data. During iteration, the endpoint effect “pollutes” the data and gradually propagates inwards, so the data sequence may become severely distorted, with modal aliasing and spurious components [29]. The variational mode decomposition (VMD) method, which rests on a completely different theoretical basis from EMD and EEMD, has been successfully applied in related fields such as fault diagnosis [30,31,32] and power quality feature selection [33].

In this study, we established a multi-frequency combined prediction model for wind power based on the idea of decomposition, multi-model prediction, and integration. We used a wind farm in Northern Shaanxi as the research object. We first decomposed its original wind power sequence by VMD into modal components of different frequencies, the intrinsic mode functions (IMFs), then classified the IMFs by their fluctuations into high, intermediate, and low frequency sequences. We used a back propagation neural network (BPNN), an autoregressive moving average (ARMA) model, and a least squares support vector machine (LS-SVM) to predict the high, intermediate, and low frequency modal components, respectively. Finally, we applied the BPNN to integrate the prediction results into an overall wind power prediction value. Compared against real data from Shaanxi Province, the proposed method outperformed both the single models and the traditional decomposition prediction models.

2 Methodologies

2.1 VMD method

2.1.1 VMD principle

The VMD method is a relatively new variable-scale signal processing method [34] that decomposes a complex signal into K amplitude-modulated and frequency-modulated (AM-FM) component signals at preset scales. K is set in advance; when chosen properly, it effectively suppresses mode aliasing. The method applies Wiener filtering for denoising, which performs well. The K estimated center angular frequencies \( \omega_{k} \) are obtained by setting the bandwidth-limiting parameter \( \alpha \) and initializing the center angular frequencies. Each mode function \( u_{k} \) is then obtained according to its center angular frequency \( \omega_{k} \), and each mode function is a single-component AM-FM function [35].

The band-limited intrinsic mode function (BIMF) and Carson’s rule for estimating the BIMF bandwidth are defined as follows:

  1) The BIMF is:

    $$ u_{k} \left( t \right) = A_{k} \left( t \right)\cos \left[ {\varphi_{k} \left( t \right)} \right] $$
    (1)

    where the phase \( \varphi_{k} \left( t \right) \) is a non-decreasing function; the envelope satisfies \( A_{k} \left( t \right) \ge 0 \); and both \( A_{k} \left( t \right) \) and the instantaneous angular frequency \( \omega_{k} \left( t \right) = \varphi_{k}^{\prime} \left( t \right) \) vary much more slowly than \( \varphi_{k} \left( t \right) \).

  2) The BIMF bandwidth estimate under Carson’s rule is:

    $$ BW_{{AM{\text{-}}FM}} = 2\left( {\Delta f + f_{FM} + f_{AM} } \right) $$
    (2)

    where \( \Delta f \) is the maximum deviation of the instantaneous frequency; \( f_{FM} \) is the rate at which the instantaneous frequency deviates; and \( f_{AM} \) is the maximum frequency of the envelope \( A_{k} \left( t \right) \).

In the VMD method, the BIMF components are obtained by solving a constrained variational model: the decomposition is the optimal solution of that model. The center frequency and bandwidth of each BIMF component are updated alternately throughout the iteration. Finally, the frequency band of the signal is partitioned adaptively and K (the preset number of scales) narrow-band BIMF components are obtained. The following framework is used to estimate the frequency bandwidth of each component.

  1) The one-sided spectrum of each modal function \( u_{k} \) is obtained from its analytic signal, which is computed by the Hilbert transform;

  2) The spectrum of each modal function is shifted to baseband by multiplying by a complex exponential tuned to its estimated center frequency;

  3) The bandwidth of each modal function is estimated from the Gaussian smoothness of the demodulated signal, that is, the squared \( L^{2} \) norm of its gradient [35].
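
As a minimal sketch of the three steps above, the following Python code builds the analytic signal of one mode, demodulates it with a complex exponential, and measures the squared norm of the time derivative. The function names, the naive O(N²) DFT, the forward-difference derivative, and the test tone are illustrative assumptions, not the implementation used in this study.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2)); fine for a short illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def analytic_signal(x):
    """Step 1: one-sided spectrum (negative frequencies zeroed, positive doubled)."""
    n = len(x)
    spec = dft(x)
    gain = [0.0] * n
    gain[0] = 1.0                      # keep the DC bin
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin
        for k in range(1, n // 2):
            gain[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    return idft([s * g for s, g in zip(spec, gain)])


def bandwidth_measure(x, omega, dt=1.0):
    """Steps 2 and 3: shift to baseband, then sum the squared time derivative."""
    z = analytic_signal(x)
    baseband = [z[t] * cmath.exp(-1j * omega * t * dt) for t in range(len(z))]
    # Forward differences approximate the time derivative.
    grad = [(baseband[t + 1] - baseband[t]) / dt for t in range(len(baseband) - 1)]
    return sum(abs(g) ** 2 for g in grad) * dt


# Hypothetical test tone: a pure cosine whose frequency falls exactly on DFT bin 8.
n = 64
omega0 = 2 * math.pi * 8 / n
tone = [math.cos(omega0 * t) for t in range(n)]
matched = bandwidth_measure(tone, omega0)         # demodulated at the true frequency
detuned = bandwidth_measure(tone, omega0 + 0.3)   # demodulated at a wrong frequency
```

For the pure tone, the measure is close to zero when the demodulation frequency matches the tone and grows when it does not, which is why minimizing this quantity pulls each center frequency toward the middle of its mode.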

The objective function of the variational constraint problems is:

$$ \min_{\{u_{k}\},\{\omega_{k}\}} \left\{ \sum_{k=1}^{K} \left\| \partial_{t} \left\{ \left[ \left( \delta(t) + \frac{\mathrm{j}}{\pi t} \right) * u_{k}(t) \right] \mathrm{e}^{-\mathrm{j}\omega_{k} t} \right\} \right\|_{2}^{2} \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_{k}(t) = f(t) $$
(3)

where \( \{u_{k}\} = \{u_{1}, u_{2}, \ldots, u_{K}\} \) is the set of modal functions; \( \{\omega_{k}\} = \{\omega_{1}, \omega_{2}, \ldots, \omega_{K}\} \) is the set of center frequencies; \( f(t) \) is the original signal; \( \partial_{t} \) is the partial derivative with respect to time t; \( \delta(t) \) is the unit impulse function; j is the imaginary unit; and \( * \) denotes convolution.

2.1.2 VMD parameter determination

  1) Modal number

    The parameter K (the number of modes) must be determined before VMD is applied. If K is too small, several components of the signal may fall into the same mode, or certain components cannot be estimated. Conversely, if K is too large, the same component appears in multiple modes and the mode center frequencies obtained from the iterations overlap [34]. To avoid both problems, we adopt the method of [36] to determine the number of modes K. The steps are as follows:

    a) Estimate the initial value of the modal number K from the signal spectrogram;

    b) Check whether the center frequencies of the modes overlap when the mode number is K;

    c) If the center frequencies overlap, reduce the number of modes and repeat the VMD until no center frequencies overlap, then output K;

    d) If the center frequencies do not overlap, increase the number of modes and repeat the VMD until center frequencies overlap, then output K−1.

  2) Penalty factor

    The penalty factor converts the constrained variational problem into an unconstrained one. To avoid modal aliasing while ensuring a reasonable rate of convergence, the penalty factor is set to the standard VMD value of 2000, which is highly adaptable [36].
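
Steps a) to d) of the modal number selection can be sketched in Python as follows. This is only an illustration under stated assumptions: `center_freqs_for` is a hypothetical callable that runs a VMD with K modes and returns the converged center frequencies, "overlap" is assumed to mean that two center frequencies lie closer than a tolerance `tol`, and `k_max` is an added safety bound. None of these names come from [36].

```python
def has_overlap(center_freqs, tol):
    """Assumed overlap test: two neighbouring center frequencies closer than tol."""
    ordered = sorted(center_freqs)
    return any(b - a < tol for a, b in zip(ordered, ordered[1:]))


def select_mode_number(center_freqs_for, k_init, tol, k_max=20):
    """Return the largest K near k_init whose center frequencies do not overlap."""
    k = k_init
    if has_overlap(center_freqs_for(k), tol):
        # Step c): reduce K until the overlap disappears, then output K.
        while k > 1 and has_overlap(center_freqs_for(k), tol):
            k -= 1
        return k
    # Step d): increase K until an overlap appears, then output K - 1.
    while k < k_max and not has_overlap(center_freqs_for(k + 1), tol):
        k += 1
    return k


def demo_center_freqs(k):
    """Hypothetical stand-in for a VMD run: six distinct bands, extras pile up."""
    bands = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40]
    if k <= len(bands):
        return bands[:k]
    return bands + [0.4005] * (k - len(bands))


k_from_6 = select_mode_number(demo_center_freqs, k_init=6, tol=0.01)
k_from_8 = select_mode_number(demo_center_freqs, k_init=8, tol=0.01)
k_from_3 = select_mode_number(demo_center_freqs, k_init=3, tol=0.01)
```

With the stand-in decomposer, all three starting values settle on the same K, since a seventh mode can only duplicate an existing band.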

2.2 Prediction methods for each frequency component

The BPNN [37], ARMA model [38] and LS-SVM [39] are mature, well-established methods, and all three are widely used in wind power prediction [10, 15, 40]. Therefore, we use these three methods to predict the high, intermediate and low frequency components, respectively.

3 Proposed prediction model

In the proposed model, VMD is first applied to decompose the input wind power into modal components, which are divided into high frequency, intermediate frequency, and low frequency modal sequences. The BPNN is used to predict the high frequency components, the ARMA model the intermediate frequency components, and the LS-SVM the low frequency components. Finally, a BPNN integrates the predicted components. A flow chart of the proposed model is shown in Fig. 1.

Fig. 1 Multi-frequency combination prediction method
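
The flow of the proposed model can be outlined in a few lines of Python. The sketch below shows only the routing of components to band-specific predictors and the final integration step. The `persistence` predictor and the `sum_components` integrator are deliberately trivial placeholders for the BPNN, ARMA, LS-SVM and integrating BPNN, and every name in it is hypothetical.

```python
def persistence(series):
    """Placeholder predictor: repeats the last observed value."""
    return series[-1]


def sum_components(values):
    """Placeholder integrator: adds the component forecasts."""
    return sum(values)


def combined_forecast(imfs, band_of_imf, predictors, integrate):
    """Route each decomposed component to its band's predictor, then integrate.

    imfs        -- dict: component name -> list of past values
    band_of_imf -- dict: component name -> "high", "intermediate" or "low"
    predictors  -- dict: band -> callable(series) returning a one-step forecast
    integrate   -- callable(list of component forecasts) returning the final value
    """
    names = sorted(imfs)
    component_forecasts = [predictors[band_of_imf[name]](imfs[name]) for name in names]
    return integrate(component_forecasts)


# Toy components standing in for a VMD output (made-up values, not wind farm data).
toy_imfs = {
    "IMF1": [0.1, -0.2],
    "IMF2": [0.5, 0.6],
    "IMF3": [2.0, 2.1],
}
toy_bands = {"IMF1": "high", "IMF2": "intermediate", "IMF3": "low"}
toy_predictors = {
    "high": persistence,          # stands in for the BPNN
    "intermediate": persistence,  # stands in for the ARMA model
    "low": persistence,           # stands in for the LS-SVM
}
forecast = combined_forecast(toy_imfs, toy_bands, toy_predictors, sum_components)
```

Keeping the predictors and the integrator as interchangeable callables mirrors the structure of the model: each band can use a different method without changing the surrounding flow.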

Because wind speed is uncertain, the wind power is zero whenever the wind speed is below the cut-in speed. Relative evaluation indices that have the actual value in the denominator are therefore undefined at such points, so we evaluate the multi-frequency combined prediction with the MAE, MSE, and RMSE instead. With \( Y_{s} \) and \( F_{s} \) denoting the actual and predicted wind power at sample s, and S the number of samples, the three indices are:

$$ MAE = \frac{1}{S}\sum\limits_{s = 1}^{S} {\left| {Y_{s} - F_{s} } \right|} $$
(4)
$$ MSE = \frac{1}{S}\sum\limits_{s = 1}^{S} {\left( {Y_{s} - F_{s} } \right)}^{2} $$
(5)
$$ RMSE = \sqrt {\frac{1}{S}\sum\limits_{s = 1}^{S} {\left( {Y_{s} - F_{s} } \right)^{2} } } $$
(6)
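
Equations (4) to (6) translate directly into Python. The sketch below uses made-up numbers, including a zero actual value, to illustrate that none of the three indices divides by the actual value; the function and variable names are illustrative.

```python
import math


def mae(actual, forecast):
    """Mean absolute error, Eq. (4)."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean squared error, Eq. (5)."""
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, Eq. (6)."""
    return math.sqrt(mse(actual, forecast))


# Made-up values; the first actual value is zero (wind speed below cut-in).
actual = [0.0, 2.0, 4.0]
forecast = [1.0, 2.0, 2.0]
scores = {
    "MAE": mae(actual, forecast),
    "MSE": mse(actual, forecast),
    "RMSE": rmse(actual, forecast),
}
```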

4 Example validation

We conducted a case study on wind power prediction to validate the proposed model. The wind farm records 144 wind power points per day (one every 10 minutes); the data set is large and its variations are complex. We selected 13248 wind power points from June 1, 2009, to August 31, 2009 as research samples, among which 11952 wind power points from June 1, 2009, to August 23, 2009 were used as sample data for the fitting and the selection of parameters. We then used the selected model to predict 1152 wind power points from August 24, 2009, to August 31, 2009.

4.1 Decomposition of raw wind power data

4.1.1 Modality quantity determination

According to the decomposition principle of VMD described in Section 2.1, we first determined the number of modes by studying the sample. Figure 2 shows the spectrum of the sample after a fast Fourier transform (FFT). The full spectrum map is hard to read because of the large amount of data, but it is symmetrical, so we only needed to analyze half of it.

Fig. 2 Sample spectrum diagram
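
The symmetry that allows half of the spectrum to be discarded holds for any real-valued series. The Python sketch below checks it on a synthetic one-day signal; the naive DFT, the function names, and the toy signal are assumptions made for illustration, and a library FFT would be used for a series as long as the sample.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude of the naive DFT (O(N^2)); adequate for a short toy series."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def half_spectrum(x, samples_per_day=144):
    """Bins 0..N/2 and their frequencies in cycles per day.

    For a real-valued signal |X[k]| equals |X[N-k]|, so these bins carry all
    of the magnitude information.
    """
    n = len(x)
    half = dft_magnitude(x)[: n // 2 + 1]
    freqs = [k * samples_per_day / n for k in range(len(half))]
    return freqs, half


# Synthetic day of 144 samples: an offset, a daily cycle and a 6-cycles/day ripple.
n = 144
signal = [1.0 + math.sin(2 * math.pi * t / n) + 0.3 * math.sin(2 * math.pi * 6 * t / n)
          for t in range(n)]
full = dft_magnitude(signal)
freqs, half = half_spectrum(signal)
# Strongest non-DC bin of the half spectrum.
peak_bin = max(range(1, len(half)), key=lambda k: half[k])
```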

Figure 2 shows three major band components. Because the spectrum map is symmetrical, the initial value of the modal number was taken as 6. When the wind power data are decomposed with K = 6 and with K = 7, the iteration curves of the modal center frequencies for the two values of K are as shown in Fig. 3.

Fig. 3 Center frequency iteration curve

The comparison shows that when K = 7, the ends of two of the labeled iteration curves are very close; in other words, center frequency aliasing appears. Therefore, the mode number was finally set to 6.

4.1.2 Decomposition of wind power data

The VMD method adopted in this study mitigates the modal aliasing and spurious components that occur when EMD and EEMD are used for decomposition. “Modal aliasing” means that modal functions of a specific time scale cannot be separated effectively when the original signal is decomposed: different modal components appear in the same decomposition result, or the same modal component is split across multiple decomposition results. Mathematically, modal aliasing is a coupling between modal components that fail to meet orthogonality requirements. The direct result of modal aliasing is the appearance of spurious components, which have no physical meaning and are only artifacts of the calculation. With the number of modes set to K = 6, the original wind power is decomposed; the resulting modes and their frequency spectra are shown in Fig. 4.

Fig. 4 VMD decomposition and spectrum diagram

Figure 4 shows that the spectra of the modal components are not coupled with each other and satisfy orthogonality; that is, there is no modal aliasing, and the spurious-component problem is greatly reduced as a result. The decomposed modes in this figure can be split into three categories: the four short-period components IMF1, IMF2, IMF3, and IMF4 are high frequency data; the longest-period component IMF6 is low frequency data; and the remaining component IMF5 is intermediate frequency data.

4.2 Prediction of VMD components

4.2.1 Prediction of high frequency components

We used the BPNN to predict the high frequency components, training a three-input, one-output model in which the input layer has 3 nodes and the output layer has 1 node. We set the number of iterations to 1000, the learning rate to 0.1, and the expected error to 0.0004. The number of hidden layer nodes for each high frequency component is shown in Table 1.

Table 1 Number of hidden layer nodes of high frequency components
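
To make the three-input, one-output setup concrete, the Python sketch below builds sliding-window samples and trains a small one-hidden-layer network by per-sample back propagation with the settings quoted above (1000 iterations, learning rate 0.1, expected error 0.0004). It is an illustration only: the tanh hidden units, linear output, five hidden nodes, fixed random seed, and the synthetic sine series are all assumptions, not the configuration or data of this study.

```python
import math
import random


def make_windows(series, n_inputs=3):
    """Each sample uses n_inputs consecutive values to predict the next one."""
    count = len(series) - n_inputs
    inputs = [series[i:i + n_inputs] for i in range(count)]
    targets = [series[i + n_inputs] for i in range(count)]
    return inputs, targets


class TinyBPNN:
    """One hidden tanh layer and a linear output, trained by back propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.w2, self._hidden(x))) + self.b2

    def mse(self, inputs, targets):
        return sum((self.predict(x) - t) ** 2 for x, t in zip(inputs, targets)) / len(targets)

    def train(self, inputs, targets, epochs=1000, lr=0.1, goal=0.0004):
        for _ in range(epochs):
            for x, t in zip(inputs, targets):
                h = self._hidden(x)
                err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - t
                for j, hj in enumerate(h):
                    # Back-propagated gradient, using the output weight before its update.
                    grad_h = err * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * err * hj
                    self.b1[j] -= lr * grad_h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * grad_h * xi
                self.b2 -= lr * err
            if self.mse(inputs, targets) <= goal:
                break  # expected error reached
        return self.mse(inputs, targets)


# Synthetic stand-in for a normalized high frequency component.
series = [0.5 + 0.4 * math.sin(0.3 * t) for t in range(40)]
X, y = make_windows(series, n_inputs=3)
net = TinyBPNN(n_in=3, n_hidden=5)
mse_before = net.mse(X, y)
mse_after = net.train(X, y)
```

Because the weights are initialized randomly, a different seed gives a different trained network, which is why several training runs are usually compared when choosing the number of hidden nodes.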

Based on the above analysis, we then predicted IMF1-IMF4 components with the well-trained model. Predictions of IMF1-IMF4 are shown in Fig. 5.

Fig. 5 Prediction of high frequency component

From the above figures, we can see that the error of IMF1 is the largest among the components because IMF1 has the strongest volatility. The volatility and error of IMF2-IMF4 are smaller than those of IMF1, and the error of IMF4 is the smallest among all the high frequency prediction results. Overall, the BPNN’s strong self-learning and adaptive capabilities make it well-suited to predicting high frequency components with strong volatility and short periods.

4.2.2 Prediction of intermediate frequency components

As mentioned above, we used the ARMA model to predict the intermediate frequency components.

We selected the model order that minimizes the Akaike information criterion (AIC). The optimal model for IMF5 is ARMA(5,3). We used this ARMA model to predict the intermediate frequency component, as shown in Fig. 6a.

Fig. 6 Prediction of the intermediate frequency component

In Fig. 6a, ARMA(5,3) is used to predict the intermediate frequency component IMF5, which fluctuates mildly, and the results show relatively small error. This indicates that the ARMA model fits such mildly fluctuating series well and is suitable for the prediction of intermediate frequency components.
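
The AIC-based order selection used in this subsection can be illustrated with a simplified stand-in. The Python sketch below fits pure autoregressive AR(p) models by least squares and compares them with the Gaussian form of the criterion, AIC = n ln(RSS/n) + 2k. It drops the moving-average part of ARMA, which would need an iterative likelihood fit, and it uses a synthetic series, so it shows the selection principle rather than the ARMA(5,3) fit reported above. All names are hypothetical.

```python
import math
import random


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ar_aic(series, p, max_p):
    """AIC of a least-squares AR(p) fit with intercept.

    All orders are fitted on the same targets (t >= max_p) so that the
    AIC values are comparable.
    """
    rows = [[1.0] + [series[t - i] for i in range(1, p + 1)]
            for t in range(max_p, len(series))]
    targets = series[max_p:]
    k = p + 1  # number of fitted coefficients
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    moment = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    coef = solve(normal, moment)
    rss = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, y in zip(rows, targets))
    n = len(targets)
    return n * math.log(rss / n) + 2 * k


# Synthetic AR(2) series (made-up coefficients, not wind farm data).
rng = random.Random(0)
series = [0.0, 0.0]
for _ in range(300):
    series.append(0.6 * series[-1] - 0.5 * series[-2] + rng.gauss(0.0, 1.0))

max_p = 5
aic = {p: ar_aic(series, p, max_p) for p in range(1, max_p + 1)}
best_p = min(aic, key=aic.get)  # order with the smallest AIC
```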

4.2.3 Prediction of low frequency components

As mentioned above, we predicted the low frequency component by LS-SVM. Parameter optimization with the differential evolution algorithm gave a regularization parameter of 98.98 and a kernel parameter of 5.492. The predictions are shown in Fig. 6b.

In Fig. 6b, the LS-SVM is used to predict the low-frequency component IMF6, which fluctuates moderately, and the result shows small error. With its fast learning speed and good generalization, the LS-SVM is thus suitable for the prediction of low frequency components.
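
For readers unfamiliar with LS-SVM regression, the Python sketch below shows its core: training reduces to solving one linear system in the bias and the support values. It assumes an RBF kernel in which the reported kernel parameter 5.492 plays the role of the width sigma (it could equally denote sigma squared), uses the reported regularization parameter 98.98, and fits a made-up slow trend rather than IMF6. The function names are hypothetical, and the differential evolution search is not reproduced.

```python
import math


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel; sigma is assumed to be the kernel width."""
    dist2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-dist2 / (2.0 * sigma ** 2))


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def lssvm_fit(inputs, targets, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [bias; alpha] = [0; y]."""
    n = len(inputs)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            value = rbf(inputs[i], inputs[j], sigma)
            if i == j:
                value += 1.0 / gamma  # regularization on the diagonal
            row.append(value)
        system.append(row)
    solution = solve(system, [0.0] + list(targets))
    return solution[0], solution[1:]


def lssvm_predict(x, inputs, bias, alpha, sigma):
    """Prediction: bias plus the kernel-weighted sum of support values."""
    return bias + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, inputs))


# Made-up slow trend standing in for a low frequency component.
X = [[float(t)] for t in range(10)]
y = [0.5 + 0.05 * t for t in range(10)]
bias, alpha = lssvm_fit(X, y, gamma=98.98, sigma=5.492)
fitted = [lssvm_predict(x, X, bias, alpha, sigma=5.492) for x in X]
max_err = max(abs(f - t) for f, t in zip(fitted, y))
```

Unlike a standard SVM, no quadratic program is needed, which is the source of the fast training mentioned above.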

4.2.4 Combination of the component predictions

We used the BPNN to combine the component prediction values described above. We first trained the model with the sample data of each component as inputs and the actual wind power sample value as the output. We then fed the predicted values of each component into the trained model to obtain the final wind power prediction value. The number of hidden layer nodes of the BPNN is 6. The final wind power prediction is shown in Fig. 7.

Fig. 7 Final wind power prediction

4.3 Multi model prediction and comparison

Table 2 shows the results for the evaluation indices described above, which we used to ensure a comprehensive comparison.

Table 2 Various model prediction results

We found that the error of each single prediction model is larger than that of the multi-frequency combination prediction models. Among the multi-frequency prediction models, the decomposition method markedly affects the error: EEMD is better than EMD, and VMD, which is based on a different principle, is better than both. The proposed VMD-based multi-frequency combination prediction model outperformed the other models.

5 Conclusion

This paper proposed a multi-frequency combination prediction model based on VMD. The proposed model decomposes the original wind power under completely different principles from EMD or EEMD and mitigates the modal aliasing and spurious component problems that would otherwise arise in the modal decomposition process. We used the frequency spectrum of the sample values to make a preliminary estimate of the number of modes before conducting VMD, then decomposed the original samples with values around this number. We checked for overlap in the center frequency iteration curves to select the appropriate number of modes, then grouped the modes into high frequency, intermediate frequency, and low frequency components accordingly. Different methods were then used to predict the components of different frequencies.

We then combined the prediction results of the components with a BPNN: the model was trained with the sample data of each component as inputs and the actual wind power sample value as the output, and the predicted value of each component was then fed into it to obtain the final wind power prediction value. Finally, we adopted the MAE, MSE, and RMSE as evaluation indices to compare the prediction performance of the proposed model, the single prediction models (BPNN, ARMA, LS-SVM), and the multi-frequency prediction models based on EMD and EEMD decomposition. The results show that the proposed model outperformed the others.

During the prediction process, we found that VMD mitigated the modal aliasing and spurious component problems very effectively. However, because the decomposition number K can only be chosen with limited precision, these problems were not completely eliminated. In addition, the thresholds and weights of the BPNN are re-initialized in each training run, so the same model may yield different results, and multiple training runs are necessary to find a good number of hidden layer nodes, which makes for a cumbersome workload. These problems increase the errors in the prediction of each component, which in turn affects the final prediction results.