1 Introduction

Wind power has developed rapidly in China since 2006. On the mainland, the total installed wind power capacity had reached 91424 MW by the end of 2013 [1]. Because many wind farms are concentrated in wind-rich zones, some regional power grids in China already have a relatively high wind power penetration [2].

To improve the operating security and economics of power grids with large-scale wind power integration, Shandong University and Heilongjiang Electric Power Company carried out a project on the collaborative optimization of thermal power, hydro power and wind power in extremely cold areas. Developing a short-term wind power forecast program is the main and fundamental research objective of this project.

The Heilongjiang power grid has 45 integrated wind farms with an installed capacity of 3153 MW, which accounts for 14.8% of the total installed capacity in that region. Because wind power has grown rapidly while the transmission network has expanded relatively slowly, transmission congestion occurs from time to time in the grid. To account for the transmission constraints during scheduling, forecasts are required for each individual wind farm as well as for the whole region. Moreover, the cross-correlation among the outputs of multiple wind farms needs to be estimated to make full use of the adjustable capacity of the power grid.

Although great efforts have been made to improve forecast accuracy, it is still hard to predict wind power generation precisely. As a result, estimating the uncertainty of the forecast result is believed to be crucial for the operation of power systems [3, 4]. To date, several parametric and non-parametric approaches, e.g., the quantile regression approaches [5], the interval estimation approaches [6, 7], and the probability density forecast approaches [8, 9], have been proposed for this purpose. These approaches provide end-users with forecast uncertainty information in various forms.

The temporal-spatial dependence relation among the outputs of wind farms is valuable information for power system operation [10, 11]. In [12], a short-term joint probability density function (JPDF) forecast approach was proposed to include the temporal correlation of forecast errors in the distribution forecast results. The errors were assumed to follow a joint Gaussian distribution, and the correlation matrix was estimated by recursive statistical estimation. Reference [13] introduced an approach that accounts for the temporal interdependence structure in the quantile-regression-based probabilistic forecast. In that approach, the interdependence structure was summarized by a unique covariance matrix after converting the prediction errors to a multivariate Gaussian random vector. The approaches in [12, 13] are instructive; however, both ignore the spatial dependence structure.

In this paper, a novel multi-dimensional scenario forecast approach which can capture the dynamic temporal-spatial interdependence relation among the outputs of multiple wind farms is proposed. The advantages of the proposed approach are as follows.

1) The temporal-spatial dependence relation of the forecast errors is included into the probabilistic forecast result, and the approach can provide more useful information for the system operation.

2) By using the kernel based sparse learning approaches and the error correction strategy, the accuracy of the spot and probabilistic forecast results is guaranteed.

3) The dependence structure of the forecast errors is well represented by the Gaussian copula, and it is not necessary to make any assumption on the distributions of the errors.

4) The multi-dimensional scenario is generated with respect to the error distributions and the copula function.

2 Overview of the proposed approach

The work is carried out on three wind farms located in the same region. The relative location of the wind farms is shown in Fig. 1.

Fig. 1 Relative location of wind farms A, B and C

The wind speed and direction for the forecast target period are provided by a commercial numerical weather prediction (NWP) service, and the wind power generation data are collected from the supervisory control and data acquisition (SCADA) system. The data span from July 1st 2009 to December 31st 2010. In the forecast models, the wind speed and the wind power generation are normalized by the maximum wind speed and the installed capacity, respectively. The wind direction is represented by its sine and cosine values.
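As a minimal sketch of this preprocessing, the following Python function normalizes one sample and encodes the wind direction by its sine and cosine. The function name `encode_sample`, the argument names, the numerical values and the assumption that the direction is given in degrees are illustrative only and are not taken from the forecast program.

```python
import math

def encode_sample(wind_speed, wind_dir_deg, power_mw, max_speed, capacity_mw):
    """Return (normalized speed, sin(direction), cos(direction), normalized power)."""
    theta = math.radians(wind_dir_deg)  # assumption: direction is given in degrees
    return (
        wind_speed / max_speed,   # speed normalized by the maximum wind speed
        math.sin(theta),          # direction encoded by its sine ...
        math.cos(theta),          # ... and cosine
        power_mw / capacity_mw,   # power normalized by the installed capacity
    )

# Hypothetical sample: 12 m/s from 90 degrees, 49.5 MW on a 99 MW wind farm.
features = encode_sample(12.0, 90.0, 49.5, max_speed=24.0, capacity_mw=99.0)
```

Encoding the direction by two components avoids the artificial jump between 359 and 0 degrees that a single angle value would introduce.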

The samples are divided into a training set, a test set and a validation set. The training data set is used to train the support vector machine (SVM) models for the spot forecast. The test data set is used to produce SVM forecast error samples which are applied to train the sparse Bayesian learning (SBL) models and estimate the parameters of the copula-based dynamic conditional correlation matrix regression (DCCMR) model. The validation data set is used to evaluate the performance of the proposed approach.

In the proposed approach, the wind power generation of each wind farm is treated as a random variable. The forecast result is the possible trajectories of the outputs of the wind farms which are referred to as the multi-dimensional scenarios. Assuming L wind farms and T look-ahead hours are considered, the dimension of one multi-dimensional scenario is K = LT. The framework of the proposed approach is shown in Fig. 2.

Fig. 2 Framework of the proposed approach

The approach includes two main parts, i.e., the training part and the forecast part. In the training part, K SVMs are trained using the data of the training set. Then, a virtual spot forecast is performed on the test data set and the forecast error samples are collected. After that, K SBL models are trained using the SVM forecast results and the corresponding NWP data. In parallel, the parameters of the copula-based DCCMR model are estimated. The outputs of the training part are K SVM models for the spot forecast, K SBL models for the error distribution forecast, and one copula-based DCCMR model describing the temporal-spatial interdependence structure of the errors.

The forecast part of the approach has three main modules, i.e., the spot forecast module using the SVMs, the probability density function (PDF) forecast module using the SBLs, and the scenario sampling module. The outputs of this part include the expected wind power generation trajectory of each wind farm, the joint cumulative distribution function (JCDF) of the forecast errors, and the corresponding multi-dimensional scenarios.

3 Spot forecast and error distribution

3.1 Spot forecast based on SVM

SVM is an effective statistical machine learning approach that is suitable for high-order non-linear regression problems [14, 15]. The SVM regression model can be expressed as:

$$ y_{\rm output} = \sum\limits_{i = 1}^{M} w_{i} K\left( \boldsymbol{x}_{\rm input}, \boldsymbol{x}_{i} \right) + w_{0} + \varepsilon $$
(1)

where \( y_{\rm output} \) is the random variable to be predicted; \( \boldsymbol{x}_{\rm input} \) is the input vector; \( \boldsymbol{x}_{i} \) is the input vector corresponding to the i-th training sample; M is the number of training samples; K(·) is the kernel function (a Gaussian kernel is used here); \( w_{i} \) is the i-th weight coefficient; \( w_{0} \) is the bias term; and ε is the residual term.
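For illustration, the prediction step of (1) with a Gaussian kernel can be sketched as follows. The kernel parametrization with a width parameter `gamma`, the weights and the stored training inputs are hypothetical; in practice they result from the training procedure in [15]. The residual ε is not part of the point prediction.

```python
import math

def gaussian_kernel(x, z, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - z||^2); gamma is an assumed width parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svm_predict(x_input, train_inputs, weights, w0, gamma=1.0):
    """Point prediction of (1): weighted sum of kernel values plus the bias w0."""
    return w0 + sum(
        w_i * gaussian_kernel(x_input, x_i, gamma)
        for w_i, x_i in zip(weights, train_inputs)
    )

# Hypothetical model with two stored training inputs.
train_inputs = [[0.0, 0.0], [1.0, 1.0]]
weights = [0.5, -0.2]
y_hat = svm_predict([0.0, 0.0], train_inputs, weights, w0=0.1)
```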

SVM is used to predict the output of each wind farm in the forecast target period. The detailed training and forecast procedures have been explained in [15]. Based on the correlation analysis result, wind speed, wind direction and historical generation data are selected as the input data of the model.

3.2 Statistical analyses of forecast errors

Statistical analysis is essential for choosing a reasonable forecast strategy. The statistical properties of the SVM forecast error will be explained in this subsection.

The auto-correlation function (ACF) is employed to test the auto-correlation of the SVM forecast error. It can be expressed as [16]:

$$ \rho \left( a \right) = \frac{{E\left( {\left( {e_{t} - \mu } \right)\left( {e_{t - a} - \mu } \right)} \right)}}{h} $$
(2)

where ρ(a) is the value of the ACF at lag a; \( e_{t} \) is the t-th sample of the series; μ is the mean value of the series; and h is the variance of the series.
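A sample estimate of (2) can be computed as in the sketch below. Replacing the expectation by a sum divided by the series length is a common estimator and is an assumption here; the estimator used in [16] may normalize differently.

```python
def acf(series, lag):
    """Sample auto-correlation of `series` at a non-negative integer `lag`."""
    n = len(series)
    mu = sum(series) / n                            # mean of the series
    h = sum((e - mu) ** 2 for e in series) / n      # variance of the series
    cov = sum(
        (series[t] - mu) * (series[t - lag] - mu)   # (e_t - mu)(e_{t-a} - mu)
        for t in range(lag, n)
    ) / n
    return cov / h

# Hypothetical error series that alternates in sign.
errors = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
rho_1 = acf(errors, 1)   # strongly negative for an alternating series
```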

Figure 3 shows the ACF values of the 1-h-ahead forecast error series at lag a, \( a= 1, 2,\cdots, 16\). In this figure, the red dotted line represents the upper confidence limit of the ACF values. It is observed from the figure that the SVM forecast error has significant auto-correlation at the first several lags, which suggests that the historical forecast errors can be applied as the explanatory variables when predicting the error distribution.

Fig. 3 ACF values of SVM forecast error series

Cross-correlation function (CCF) [16] is applied to explore the temporal and spatial dependence relation among the forecast errors. CCF between two forecast error series is defined by:

$$ \rho_{i,j} \left( a \right) = \frac{{E\left( {\left( {e_{i,t} - \mu_{i} } \right)\left( {e_{j,t - a} - \mu_{j} } \right)} \right)}}{{\sqrt {h_{i} } \sqrt {h_{j} } }} $$
(3)

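where i and j are the indices of the error series; \( \mu_{i} \) and \( \mu_{j} \) are the mean values of the two series; and \( h_{i} \) and \( h_{j} \) are their variances.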
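The sample counterpart of (3) can be sketched in the same way. The function name `ccf` and the two short series are hypothetical; the second series is constructed as a one-step shift of the first, so the estimate at lag 1 is close to one.

```python
import math

def ccf(e_i, e_j, lag):
    """Sample cross-correlation between e_i[t] and e_j[t - lag]."""
    n = min(len(e_i), len(e_j))
    mu_i = sum(e_i[:n]) / n
    mu_j = sum(e_j[:n]) / n
    h_i = sum((x - mu_i) ** 2 for x in e_i[:n]) / n   # variance of series i
    h_j = sum((x - mu_j) ** 2 for x in e_j[:n]) / n   # variance of series j
    cov = sum(
        (e_i[t] - mu_i) * (e_j[t - lag] - mu_j) for t in range(lag, n)
    ) / n
    return cov / (math.sqrt(h_i) * math.sqrt(h_j))

# Hypothetical series: e_j leads e_i by one step.
e_i = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
e_j = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
rho_lag1 = ccf(e_i, e_j, 1)
```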

Figure 4 describes the CCF values between the 1-h-ahead forecast error series and the forecast error series from 2-h-ahead to 48-h-ahead. In this figure, the red dotted lines represent the upper and lower confidence limits of the CCF values. It can be seen from the figure that the dependence relation is strong between the 1-h-ahead and 2-h-ahead forecast error series, and the CCF value decreases rapidly when the lag increases.

Fig. 4 CCF values between 1-h-ahead forecast error series and error series from 2-h-ahead to 48-h-ahead

The CCF values between the error series of different wind farms are depicted in Fig. 5 to test the temporal-spatial dependence relation of the forecast errors. In the figure, A1-B stands for the CCF values between the 1-h-ahead forecast error series of wind farm A and the error series from 1-h-ahead to 48-h-ahead of wind farm B; A1-C and B1-C are defined analogously. The relatively large CCF values verify the existence of the temporal and spatial dependence relation among the forecast errors.

Fig. 5 CCF values between 1-h-ahead forecast error series and all the forecast error series of another wind farm

The PDF of the spot forecast error is predicted by SBL in this paper. SBL is a parametric forecast approach which assumes that the wind power generation forecast error at each moment follows a Gaussian distribution. Sometimes this assumption is criticized because the usual statistical distribution of the forecast error is non-Gaussian [17]. Taking a recorded 1-h-ahead forecast error series shown in Fig. 6a as an example, the sharp peak of its statistical distribution distinguishes the error variable from a Gaussian random variable, as shown in Fig. 6b.

Fig. 6 Comparison between estimated statistical PDF and forecasted PDF

The criticism seems reasonable. However, the statistical distribution of the whole series should not be identified with the distribution at each moment, considering the non-stationary feature of the error series [8]. To confirm the validity of this argument, Fig. 6c shows the PDFs forecasted by SBL for the error samples described in Fig. 6a. In the figure, the parameters of the forecasted Gaussian distributions are time-varying, which reflects the non-stationary nature of the error series. The mixture distribution [18] of the Gaussian distributions, which represents the realizations of all the Gaussian variables as one random variable, is calculated according to (4) and is shown in Fig. 6d. The mixture distribution is very similar to the statistical distribution, which illustrates that SBL is able to capture the non-Gaussian statistical feature of the SVM forecast error even though it assumes that the error at each moment follows a Gaussian distribution.

$$ \tilde{f}(x) = \sum\limits_{i = 1}^{N} {\frac{1}{N}f_{i} (x)} $$
(4)

where \( \tilde{f}(x) \) is the PDF of the mixture distribution; \( f_{i}(x) \) is the PDF of the i-th Gaussian distribution; and N is the number of the Gaussian PDFs. In Fig. 6, N = 100.
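The equally weighted mixture in (4) can be evaluated as in the following sketch. The three time-varying means and standard deviations are hypothetical values chosen only to show that a mixture of Gaussian PDFs need not be Gaussian itself.

```python
from statistics import NormalDist

def mixture_pdf(x, means, sigmas):
    """Equally weighted mixture of N Gaussian PDFs evaluated at x, as in (4)."""
    n = len(means)
    return sum(NormalDist(m, s).pdf(x) for m, s in zip(means, sigmas)) / n

# Hypothetical forecasted error distributions at three moments.
means = [-0.10, 0.00, 0.20]
sigmas = [0.05, 0.20, 0.10]

# Riemann sum over [-2, 2] to check that the mixture is a valid PDF.
step = 0.001
total = sum(mixture_pdf(-2.0 + k * step, means, sigmas) for k in range(4001)) * step
```

Because the components have different widths, the mixture has a sharper peak and heavier tails than a single Gaussian with the same overall variance, which is consistent with the argument made for Fig. 6.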

Additionally, the cross-correlation between the spot forecast error and the corresponding wind speed is tested. According to the test result, the cross-correlation is significant. Therefore, the wind speed data provided by NWP should be incorporated into the input data of the SBL model.

3.3 PDF forecast based on SBL

The distribution of the spot forecast error is estimated by SBL in this paper. SBL is a kernel-based sparse learning model with strong generalization capability. The parameters of the SBL model are estimated by maximum a posteriori estimation according to Bayesian inference [19, 20]. SBL can provide reliable PDF forecast results, as verified in [8] and [12].

Thorough descriptions of the SBL model and the corresponding forecast procedure have been given in [8]. The historical forecast data, wind speed and wind direction are selected as the input data of the SBL model in this paper. The outputs of the model are composed of the expectation and variance of the spot forecast error.

4 Multi-dimensional scenario forecast

Multi-dimensional scenarios can be generated from the JCDF of the random variables. To avoid making assumptions on the type of the joint distribution, copula-based DCCMR is applied here to estimate the JCDF of the spot forecast errors.

4.1 Copula-based DCCMR for modeling the time-varying temporal-spatial dependence structure

1) Basic concepts of the copula function

A copula function is a bridge connecting the marginal and joint distributions of random variables [21]. A multi-dimensional copula function can be expressed as:

$$ F\left( {e_{1} ,e_{2} , \cdots ,e_{K} } \right) = C\left( {F_{1} \left( {e_{1} } \right),F_{2} \left( {e_{2} } \right), \cdots ,F_{K} \left( {e_{K} } \right)} \right) $$
(5)

where \( e_{k} \) is the k-th random variable; \( F_{k}(\cdot) \) is the cumulative distribution function (CDF) of the k-th random variable; F(·) is the JCDF of the random variables; K is the number of the random variables; and C(·) is the copula function.

According to Sklar’s theorem [22], if all the CDFs are continuous, the copula function C(·) is unique. Therefore, the dependence structure of the random variables can be uniquely represented by the corresponding copula function.

2) Selection of the copula function

Many categories of copulas, e.g., Gaussian copulas, Archimedean copulas and extreme-value copulas can be used to model the dependence structure of random variables according to the statistical properties of the variables. Scatter plot, which is able to reveal the relationship between two random variables, is applied here for the copula selection.

Figure 7 provides the scatter plot analysis of the wind power generation spot forecast errors. Figure 7a shows the scatter plot between the 10-h-ahead and 11-h-ahead spot forecast error series of wind farm A. The CDFs of the two series are shown in Fig. 7b. The scatter plot between the transformed error series, which are obtained by the probability integral transform [23], is shown in Fig. 7c. All the scatter plots have an obvious symmetrical dependence structure, which implies that the dependence relation among the errors can be modeled appropriately by the Gaussian copula [24].
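The probability integral transform used for Fig. 7c maps each error sample to the value of its CDF, so that the transformed series is approximately uniform on (0, 1). The sketch below uses an empirical CDF with the rank divided by n + 1; this estimator and the function name are assumptions, since the CDFs of Fig. 7b may have been obtained differently.

```python
def empirical_pit(samples):
    """Map each sample to rank / (n + 1), an empirical-CDF value in (0, 1)."""
    n = len(samples)
    order = sorted(range(n), key=lambda idx: samples[idx])  # indices by value
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)   # n + 1 keeps the result strictly inside (0, 1)
    return u

# Hypothetical 10-h-ahead error samples.
u_values = empirical_pit([0.3, -0.1, 0.2])
```

Plotting the transformed values of two series against each other removes the influence of the marginal distributions and leaves only the dependence structure visible.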

Fig. 7 Analysis of the dependence structure between the random variables using the scatter plot

Therefore, a K-dimensional Gaussian copula is selected to model the dependence structure of the spot forecast errors. Gaussian copula has an explicit formula and its computational complexity is moderate. The K-dimensional Gaussian copula can be defined by [24]:

$$ \begin{aligned} F\left( e_{1}, e_{2}, \cdots, e_{K}; \boldsymbol{R} \right) &= C\left( F_{1}\left( e_{1} \right), F_{2}\left( e_{2} \right), \cdots, F_{K}\left( e_{K} \right); \boldsymbol{R} \right) \\ &= \Phi_{\boldsymbol{R}}\left( \Phi^{-1}\left( F_{1}\left( e_{1} \right) \right), \Phi^{-1}\left( F_{2}\left( e_{2} \right) \right), \cdots, \Phi^{-1}\left( F_{K}\left( e_{K} \right) \right) \right) \end{aligned} $$
(6)

where \( \Phi^{-1} \) is the inverse of the one-dimensional standard Gaussian CDF; \( \Phi^{-1}(F_{k}(e_{k})) \) is a random variable following the standard Gaussian distribution; and \( \Phi_{\boldsymbol{R}} \) stands for a K-dimensional Gaussian JCDF with zero means, unit marginal variances and the covariance matrix/correlation matrix \( \boldsymbol{R} \).

It can be seen from (6) that the JCDF \( F(e_{1}, e_{2}, \cdots, e_{K}; \boldsymbol{R}) \) of the forecast errors can be obtained by estimating the CDFs of the forecast errors and the corresponding correlation matrix \( \boldsymbol{R} \).

In this paper, the CDFs are forecasted according to the process mentioned in Section 3, and the matrix R is estimated using the following copula-based DCCMR model.
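The arguments of the Gaussian JCDF in (6) are the transformed errors \( \Phi^{-1}(F_{k}(e_{k})) \). A minimal sketch of this transformation is given below for Gaussian marginal CDFs, as forecasted by SBL. The function name and the numerical values are hypothetical.

```python
from statistics import NormalDist

def normal_scores(errors, means, sigmas):
    """Transform each error e_k to Phi^{-1}(F_k(e_k)) with Gaussian marginals F_k."""
    std = NormalDist()                       # standard Gaussian distribution
    scores = []
    for e, m, s in zip(errors, means, sigmas):
        u = NormalDist(m, s).cdf(e)          # probability integral transform F_k(e_k)
        scores.append(std.inv_cdf(u))        # inverse standard Gaussian CDF
    return scores

# Hypothetical errors with their forecasted expectations and standard deviations.
nu = normal_scores([0.05, -0.02], means=[0.01, 0.00], sigmas=[0.04, 0.05])
```

With Gaussian marginals the transformation reduces to the standardization \( (e_{k} - \mu_{k})/\sigma_{k} \); the two-step form is kept here because it also applies to non-Gaussian marginal CDFs.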

3) Copula-based DCCMR model

In (6) the random vector \( [\Phi^{-1}(F_{1}(e_{1})), \Phi^{-1}(F_{2}(e_{2})), \cdots, \Phi^{-1}(F_{K}(e_{K}))] \) follows the K-dimensional Gaussian distribution \( \mathcal{N}(\boldsymbol{0}, \boldsymbol{R}) \), where the correlation matrix \( \boldsymbol{R} \) can be estimated dynamically as follows [25]:

$$ \boldsymbol{R}_{t} = {\text{diag}}\left( \boldsymbol{Q}_{t} \right)^{-\frac{1}{2}} \; \boldsymbol{Q}_{t} \; {\text{diag}}\left( \boldsymbol{Q}_{t} \right)^{-\frac{1}{2}} $$
(7)

where \( \boldsymbol{R}_{t} \) is the time-varying correlation matrix; t is the index indicating the time at which the correlation matrix is estimated; and \( \boldsymbol{Q}_{t} \) can be expressed as

$$ \boldsymbol{Q}_{t} = \left( 1 - \sum\limits_{i = 1}^{I} \alpha_{i} - \sum\limits_{j = 1}^{J} \beta_{j} \right) \bar{\boldsymbol{Q}} + \sum\limits_{i = 1}^{I} \alpha_{i} \boldsymbol{\nu}_{t - i} \boldsymbol{\nu}_{t - i}^{\text{T}} + \sum\limits_{j = 1}^{J} \beta_{j} \boldsymbol{Q}_{t - j} $$
(8)

where \( \alpha_{i} \) and \( \beta_{j} \) are the parameters to be identified; I and J are the orders of the model; \( \boldsymbol{\nu}_{t - i} \) is the realization of the random vector \( [\Phi^{-1}(F_{1}(e_{1})), \Phi^{-1}(F_{2}(e_{2})), \cdots, \Phi^{-1}(F_{K}(e_{K}))] \) at time period t − i; \( \boldsymbol{Q}_{t - j} \) is the covariance matrix estimated at time period t − j; and \( \bar{\boldsymbol{Q}} \) can be expressed as

$$ \bar{\boldsymbol{Q}} = \frac{1}{\Omega}\sum\limits_{k = 1}^{\Omega} \boldsymbol{\nu}_{k} \boldsymbol{\nu}_{k}^{\text{T}} $$
(9)

where Ω is the number of the samples of the random vector \( [\Phi^{-1}(F_{1}(e_{1})), \Phi^{-1}(F_{2}(e_{2})), \cdots, \Phi^{-1}(F_{K}(e_{K}))] \).

The parameters α i and β j in the above equations are estimated using the composite maximum likelihood approach, and the detailed estimation process is explained in [25].
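For the first-order case I = J = 1, the recursion in (7) to (9) can be sketched as follows. The parameter values, the sample vectors and the initialization of the recursion with \( \bar{\boldsymbol{Q}} \) are assumptions made for illustration; the parameter estimation itself is described in [25] and is not reproduced here.

```python
import math

def mean_outer(nus):
    """Q-bar in (9): average outer product of the transformed error vectors."""
    k, n = len(nus[0]), len(nus)
    return [[sum(v[r] * v[c] for v in nus) / n for c in range(k)] for r in range(k)]

def dcc_step(q_prev, nu_prev, q_bar, alpha, beta):
    """One update with I = J = 1: returns (Q_t, R_t) following (8) and (7)."""
    k = len(q_bar)
    q_t = [[(1.0 - alpha - beta) * q_bar[r][c]
            + alpha * nu_prev[r] * nu_prev[c]
            + beta * q_prev[r][c]
            for c in range(k)] for r in range(k)]
    d = [1.0 / math.sqrt(q_t[r][r]) for r in range(k)]   # diag(Q_t)^(-1/2)
    r_t = [[d[r] * q_t[r][c] * d[c] for c in range(k)] for r in range(k)]
    return q_t, r_t

# Hypothetical transformed error samples (K = 2) and parameters.
nus = [[0.5, 0.3], [-1.0, -0.8], [0.2, 0.6], [1.1, 0.7]]
q_bar = mean_outer(nus)
q_t, r_t = q_bar, None               # assumed initialization Q_0 = Q-bar
for nu in nus:
    q_t, r_t = dcc_step(q_t, nu, q_bar, alpha=0.05, beta=0.90)
```

The rescaling in (7) guarantees that the diagonal of the resulting matrix equals one, so that it can be used directly as the correlation matrix of the Gaussian copula.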

4.2 Generate multi-dimensional scenarios

According to the spot forecast result, PDF of the spot forecast error, and the correlation matrix of the errors, V groups of K-dimensional wind power generation scenarios can be generated by taking the following steps.

Step 1: Generate V groups of K-dimensional random samples according to the Gaussian copula described in (6). The i-th group of samples is represented by \( [u_{i,1,1}, u_{i,1,2}, \cdots, u_{i,1,T}, \cdots, u_{i,L,1}, u_{i,L,2}, \cdots, u_{i,L,T}] \), in which L is the number of the wind farms and T is the number of the look-ahead periods.

Step 2: Generate V groups of error samples through the inverse transform process [24]. With respect to the forecasted marginal distribution functions, the i-th error sample vector can be transformed from the samples obtained in Step 1 as:

$$ \begin{aligned} \boldsymbol{e}_{i} = \left[ F_{1,1}^{-1}\left( u_{i,1,1} \right), F_{1,2}^{-1}\left( u_{i,1,2} \right), \cdots, F_{1,T}^{-1}\left( u_{i,1,T} \right), \cdots, \right. \\ \left. F_{L,1}^{-1}\left( u_{i,L,1} \right), F_{L,2}^{-1}\left( u_{i,L,2} \right), \cdots, F_{L,T}^{-1}\left( u_{i,L,T} \right) \right] \end{aligned} $$
(10)

where \( F_{l,t}^{-1} \) is the inverse of the forecasted CDF of the forecast error of wind farm l at look-ahead period t.

Step 3: Generate V groups of multi-dimensional scenarios of wind power generation. The i-th scenario can be generated by:

$$ \boldsymbol{s}_{i} = \boldsymbol{\eta} + \boldsymbol{e}_{i} $$
(11)

where \( \boldsymbol{\eta} = \left[ \hat{p}_{1,1}, \hat{p}_{1,2}, \cdots, \hat{p}_{1,T}, \cdots, \hat{p}_{L,1}, \hat{p}_{L,2}, \cdots, \hat{p}_{L,T} \right] \) is the generation predicted by the spot forecast module.
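Steps 1 to 3 can be sketched as follows for Gaussian marginal error distributions. Sampling the copula through a Cholesky factor of the correlation matrix is a standard technique and is an assumption about the implementation; the function names, the clamping of the uniform samples and all numerical values are hypothetical.

```python
import math
import random
from statistics import NormalDist

def cholesky(a):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for r in range(k):
        for c in range(r + 1):
            s = sum(low[r][m] * low[c][m] for m in range(c))
            if r == c:
                low[r][c] = math.sqrt(a[r][r] - s)
            else:
                low[r][c] = (a[r][c] - s) / low[c][c]
    return low

def generate_scenarios(eta, err_means, err_sigmas, corr, v, seed=0):
    """Return v scenarios of dimension K = len(eta), following Steps 1 to 3."""
    rng = random.Random(seed)
    std = NormalDist()
    low = cholesky(corr)
    k = len(eta)
    scenarios = []
    for _ in range(v):
        g = [rng.gauss(0.0, 1.0) for _ in range(k)]
        z = [sum(low[r][c] * g[c] for c in range(r + 1)) for r in range(k)]
        # Step 1: correlated uniform samples from the Gaussian copula
        # (clamped away from 0 and 1 so that the inverse CDF is defined).
        u = [min(max(std.cdf(zk), 1e-12), 1.0 - 1e-12) for zk in z]
        # Step 2: inverse transform with the forecasted marginal distributions.
        e = [NormalDist(m, s).inv_cdf(uk)
             for uk, m, s in zip(u, err_means, err_sigmas)]
        # Step 3: add the error sample to the spot forecast.
        scenarios.append([p + ek for p, ek in zip(eta, e)])
    return scenarios

# Hypothetical case with K = 2 (e.g., L = 2 wind farms and T = 1 period).
corr = [[1.0, 0.8], [0.8, 1.0]]
scen = generate_scenarios([0.4, 0.5], [0.0, 0.0], [0.1, 0.1], corr, v=50)
```

In this sketch the K entries are ordered as in Step 1, i.e., all look-ahead periods of the first wind farm are followed by those of the second wind farm, and so on.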

5 Test results and discussions

5.1 Performance evaluation indicators

The following indicators are applied to evaluate the performance of the proposed approach.

1) Indicator for the expectation forecast result

Normalized mean absolute error (NMAE) [8] is employed here to evaluate the accuracy of the forecasted expectation of wind power generation. The indicator can be expressed by:

$$ N_{\text{MAE}} = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left| {e_{i} } \right|} $$
(12)

where \( e_{i} \) is the i-th forecast error sample and N is the number of the samples.
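A direct implementation of (12) is shown below as a small sketch. It assumes that the forecasted and measured values are already normalized by the installed capacity, so that the errors are dimensionless; the function name and the values are hypothetical.

```python
def nmae(forecast, actual):
    """Normalized mean absolute error of (12) for normalized power values."""
    n = len(actual)
    return sum(abs(a - f) for f, a in zip(forecast, actual)) / n

# Hypothetical normalized forecasts and measurements.
score = nmae([0.5, 0.4], [0.6, 0.1])   # average of |0.1| and |-0.3|
```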

2) Indicators for the distribution forecast result

Indicators including the distortion rate (DR), marginal calibration, sharpness, and the continuous ranked probability score (CRPS) are applied to evaluate the distribution forecast performance of the proposed approach.

DR [8] of a forecasted PDF is defined by:

$$ D_{\rm {R}} = \frac{1}{2N}\sum\limits_{i = 1}^{H} {\left| {N_{{i,\text{a} }} - N_{{i,\text{f} }} } \right|} \times 100\,\% $$
(13)

where \( N_{i,{\rm a}} \) is the actual number of times that the wind power generation sample falls into the i-th probability interval; \( N_{i,{\rm f}} \) is the expected number of times that the wind power generation sample should fall into the i-th probability interval according to the forecasted PDF; N is the number of the samples; and H is the number of the probability intervals.
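The computation of (13) can be sketched as below. The interval boundaries used here (−2σ, −σ, 0, σ and 2σ around the forecasted expectation, giving H = 6 intervals) are only an assumed example, as are the function names; the intervals actually used in the tests are those specified in Table 2.

```python
from bisect import bisect_right
from statistics import NormalDist

EDGES = [-2.0, -1.0, 0.0, 1.0, 2.0]   # assumed boundaries in units of sigma

def actual_counts(z_scores, edges=EDGES):
    """Count how many standardized samples fall into each interval."""
    counts = [0] * (len(edges) + 1)
    for z in z_scores:
        counts[bisect_right(edges, z)] += 1
    return counts

def expected_counts(n, edges=EDGES):
    """Expected number of samples per interval under the forecasted Gaussian PDF."""
    std = NormalDist()
    cdf = [0.0] + [std.cdf(b) for b in edges] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

def distortion_rate(n_actual, n_expected):
    """DR of (13) in percent."""
    n = sum(n_actual)
    return sum(abs(a - f) for a, f in zip(n_actual, n_expected)) / (2 * n) * 100.0

# Hypothetical standardized samples (p_i - mu_i) / sigma_i.
z = [-2.5, -0.5, 0.5, 0.7, 3.0]
dr = distortion_rate(actual_counts(z), expected_counts(len(z)))
```

The factor 1/(2N) scales the indicator to the range from 0% to 100%, because every misplaced sample is counted once in the interval it left and once in the interval it entered.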

Marginal calibration is an indicator of the agreement between the observed CDF and the forecasted CDF [26]. To calculate the indicator, the observed CDF is represented by the average value of the indicator functions:

$$ \bar{G}_{N} \left( p \right) = \frac{1}{N}\sum\limits_{i = 1}^{N} {1\left\{ {p_{i} \le p} \right\}} $$
(14)

where 1{·} is a {0, 1} indicator function which takes the value 1 when the condition is satisfied; p is the normalized wind power generation; and \( p_{i} \) is the i-th sample of the wind power generation.

The corresponding forecasted CDF is represented by the average forecasted CDF:

$$ \bar{F}_{N} \left( p \right) = \frac{1}{N}\sum\limits_{i = 1}^{N} {F_{i} } \left( p \right) $$
(15)

where \( F_{i}(\cdot) \) is the forecasted CDF corresponding to the i-th wind power generation sample.

The marginal calibration, which is a function of p, measures the difference between the forecasted CDF and the observed CDF:

$$ M_{\text{C} } =\bar{F}_{N} \left( p \right) - \bar{G}_{N} \left( p \right) $$
(16)
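Equations (14) to (16) can be evaluated at a given value of p as in the following sketch, which assumes Gaussian forecasted CDFs described by their means and standard deviations. The function name and the sample values are hypothetical.

```python
from statistics import NormalDist

def marginal_calibration(p, observations, means, sigmas):
    """Difference between the average forecasted CDF and the observed CDF at p."""
    n = len(observations)
    g_bar = sum(1 for obs in observations if obs <= p) / n               # (14)
    f_bar = sum(NormalDist(m, s).cdf(p) for m, s in zip(means, sigmas)) / n  # (15)
    return f_bar - g_bar                                                 # (16)

# Hypothetical case in which each forecast is centred on its observation.
obs = [0.2, 0.4, 0.6, 0.8]
m_c = marginal_calibration(0.5, obs, means=obs, sigmas=[0.1] * 4)
```

Values close to zero over the whole range of p indicate that the forecasted CDFs agree with the observed CDF on average.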

Sharpness is another important performance indicator for evaluating the forecasted PDF. Obviously, the sharper the forecasted distribution is, the better the probabilistic forecast approach will be, since a sharper distribution means less volatility of the forecast result. In this paper, the sharpness of the forecasted PDF is assessed by the coverage of the central probability intervals.

Moreover, in order to measure the overall performance of the probabilistic forecast approach, the CRPS, which can address the calibration and sharpness simultaneously [26], is calculated by:

$$ C_{{\text{RPS}}} = \frac{1}{N}\sum\limits_{i = 1}^{N} {\left[ {\int_{0}^{1} {\left( {F_{i} \left( p \right) - 1\left\{ {p_{i} \le p} \right\}} \right)^{2} \text{d} p} } \right]} $$
(17)
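A numerical evaluation of (17) is sketched below. The midpoint rule with a fixed number of steps and the Gaussian form of the forecasted CDFs are assumptions made for illustration; the function name and the values are hypothetical.

```python
from statistics import NormalDist

def crps(observations, means, sigmas, steps=1000):
    """Average CRPS of (17), integrating over normalized power p in [0, 1]."""
    dp = 1.0 / steps
    total = 0.0
    for obs, m, s in zip(observations, means, sigmas):
        dist = NormalDist(m, s)
        for j in range(steps):
            p = (j + 0.5) * dp                       # midpoint of the j-th cell
            indicator = 1.0 if obs <= p else 0.0     # 1{p_i <= p}
            total += (dist.cdf(p) - indicator) ** 2 * dp
    return total / len(observations)

# Hypothetical comparison of a centred and a biased forecast.
good = crps([0.5], means=[0.5], sigmas=[0.1])
bad = crps([0.5], means=[0.8], sigmas=[0.1])
```

The biased forecast receives the larger score, which reflects that the CRPS is negatively oriented and penalizes both poor calibration and low sharpness.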

3) Quality evaluation of multi-dimensional scenarios

Energy score (ES), which is a multivariate verification tool for the forecasted scenarios, is applied to evaluate the quality of the generated multi-dimensional scenarios [27]. The indicator is a negatively-oriented score. The lower the indicator is, the better the forecast result will be. ES is defined by:

$$ E_{\text{s}} = \frac{1}{V}\sum\limits_{i = 1}^{V} \left\| \boldsymbol{p} - \boldsymbol{s}_{i} \right\|_{2} - \frac{1}{2V^{2}}\sum\limits_{i = 1}^{V} \sum\limits_{j = 1}^{V} \left\| \boldsymbol{s}_{i} - \boldsymbol{s}_{j} \right\|_{2} $$
(18)

where \( \left\| \cdot \right\|_{2} \) is the Euclidean norm; \( \boldsymbol{s}_{i} \) and \( \boldsymbol{s}_{j} \) are the predicted scenarios; V is the number of the scenarios; and \( \boldsymbol{p} \) is the measured wind power generation series.
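The ES in (18) can be computed directly from a set of scenarios, as in the sketch below. The function name and the two-dimensional example are hypothetical.

```python
import math

def energy_score(p, scenarios):
    """Energy score of (18) for a measured series p and a list of scenarios."""
    v = len(scenarios)

    def dist(a, b):
        # Euclidean distance between two K-dimensional vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    term1 = sum(dist(p, s) for s in scenarios) / v
    term2 = sum(dist(si, sj) for si in scenarios for sj in scenarios) / (2 * v * v)
    return term1 - term2

# Hypothetical example with two scenarios placed symmetrically around p.
es = energy_score([0.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]])
```

The double sum requires V squared distance evaluations, which remains manageable for the 2000 scenarios used in the quantitative evaluation.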

5.2 Performance evaluation of the proposed approach

Data collected from the three adjacent wind farms are used to illustrate the effectiveness of the proposed approach. A total of 5000 forecast tests are performed on the wind farms, and each test forecasts the PDFs of wind power generation for the next 48 hours. An example of the PDF forecast result is shown in Fig. 8, where the black solid line with circles and the asterisked red line stand for the forecasted and the actual wind power generation curves, respectively. The central 0.65 and 0.95 confidence intervals are represented in the figure by two different colors.

Fig. 8 An example of PDF forecast result

In Fig. 8, most of the actual wind power generation samples fall into the 0.65 confidence interval and very few samples fall outside the 0.95 confidence interval. The result indicates that the forecasted wind power generation distribution can reflect the real distribution appropriately.

The persistence (PER) model [12], the common SVM model and the linear quantile regression model are selected as the benchmark models to evaluate the expectation forecast accuracy of the proposed approach.

Table 1 shows the average NMAE of 48 look-ahead time periods. It can be seen from the table that the SVM model has a remarkable superiority to the PER model, and the accuracy of the SVM forecast result is improved significantly by using the error correction strategy proposed in this paper. Also, the proposed approach has better performance than the linear quantile regression model.

Table 1 Average NMAE of 48 look-ahead time periods

Figure 9 depicts the NMAE values corresponding to the forecast results of wind farm A for the 48 look-ahead time periods. It can be seen from the figure that the proposed approach is much better than the other three benchmark approaches on the expectation forecast accuracy. Similar conclusions can be found from the test results of the other two wind farms.

Fig. 9 NMAE curves of wind farm A

Empirical distribution estimation is a popular non-parametric distribution estimation approach, which estimates the probability distribution of a random variable by analyzing its historical realizations [28]. The empirical approach and the linear quantile regression approach are applied here as benchmarks to evaluate the probabilistic forecast performance of the proposed approach.

To calculate the DR indicator of the PDF forecast result, six probability intervals have been specified according to the variance of the Gaussian distribution σ. The probabilities corresponding to the intervals are given in Table 2. The theoretical number of falling points for each interval in the 5000 forecast tests is also presented in the table.

Table 2 Information of the probability intervals

Table 3 summarizes the average DR values of 48 look-ahead time periods according to the forecast results. It can be seen in the table that the proposed approach has lower average DR values for all the three wind farms than the other two approaches.

Table 3 Average DR of 48 look-ahead time periods

The marginal calibration curves corresponding to the 6-h-ahead distribution forecast results are shown in Fig. 10. In the figure the proposed approach has relatively lower marginal calibration values, which means the CDF forecasted by the proposed approach is much closer to the real CDF.

Fig. 10 Marginal calibration curves of 6-h-ahead forecast results

The coverage of the 50% and 90% central probability intervals corresponding to the results of the three approaches is described in Fig. 11. It can be seen from the figure that the coverage of the proposed approach is almost always smaller than that of the other two approaches, suggesting that the proposed approach can provide less volatile forecast results.

Fig. 11 Coverage of 50% and 90% central probability intervals

The CRPS values of the three approaches are depicted in Fig. 12. This figure shows that the proposed approach has lower CRPS values in almost all the periods, meaning that the proposed approach has better overall distribution forecast performance.

Fig. 12 CRPS values corresponding to two probabilistic forecast approaches for three wind farms

A set of 50 multi-dimensional scenarios corresponding to the predicted PDF and the estimated Gaussian copula is depicted in Fig. 13. In the figure, it can be seen that the actual wind power generation curve is well covered by the scenarios, which means that the scenarios can reflect the real wind power generation properly.

Fig. 13 Generated multi-dimensional scenarios

At the same time, a set of 2000 scenarios is generated for the quantitative evaluation. Table 4 summarizes the ES values corresponding to the generated 2000 scenarios. According to the table, the ES indicator is lower when the correlation information is included in the forecast result. The test result indicates that the temporal-spatial correlation information has positive effects on improving the quality of the forecasted scenarios.

Table 4 ES values corresponding to the forecasted scenarios

6 Conclusions

A multi-dimensional scenario forecast approach is proposed in this paper. In the proposed approach, SVM is used to perform the spot forecast of wind power generation. The expectation and variance of the spot forecast error are estimated by SBL. Then the SVM forecast result is corrected using the estimated error expectation. The dependence structure of the forecast errors is reflected by the Gaussian copula, which is estimated using the copula-based DCCMR model. Finally, the multi-dimensional scenarios of wind power generation are produced with respect to the spot forecast result, the PDF of the spot forecast error, and the Gaussian copula. The proposed approach is tested on three adjacent wind farms, and the test results illustrate the effectiveness of the approach.