1 Introduction

The huge demand for energy in modern society has triggered a rapid development of wind power. By the end of 2016, the cumulative installed capacity of wind power had soared to 486749 MW around the world, 30% of which is contributed by China. Nevertheless, due to the lack of peak-shaving hydroelectric power and the existence of numerous combined heat and power (CHP) plants, a portion of wind power is curtailed in winter in northern China. According to the statistical data on wind power generation in China, the average proportion of curtailed wind power is about 15%.

To help deal with the issue of curtailed wind power, many measures have been proposed, originating from both the supply side and the demand side. In addition to CHP plants, heat pumps [1,2,3] and electric boilers [4, 5] represent two typical approaches on the supply side to integrate a high level of wind power with large-scale bulk power grids. An economic comparison between heat pumps and electric boilers in Denmark was carried out in [6], which indicated that heat storage technologies can lower the cost of system operation. However, accommodating curtailed wind power on the supply side requires an increase in the initial investment and takes up land resources. This leads to the consideration of demand-side accommodation using, for example, electric vehicles [7,8,9], distributed energy storage [10,11,12], and thermostatically controlled loads [13,14,15,16]. These demand-side measures have gained general approval due to their low cost, large capacity, and immense potential. In contrast to other thermostatically controlled loads, such as air-conditioners and refrigerators, electric water heaters (EWHs) have some particular features for accommodating curtailed wind power: ① EWHs make up a large proportion of the power load in residential appliances and their electricity consumption is closely related to the daily load [17]; ② high thermal inertia and large storage capacity allow EWHs to participate in power regulation continuously, variably and widely; ③ as a purely resistive load, an EWH responds to instructions rapidly and is well suited to serve as a demand response (DR) load. A model of EWHs aggregated for wind generation is presented in [18, 19]. This provides a dual advantage for both the grid and participating customers. The dynamic behavior based on a partial differential equation for EWHs is modeled for DR in [17], and it shows a better performance than the conventional method.
These studies primarily concentrate on aggregated and dynamic modeling for EWHs, while the utilization of EWHs to reduce wind power curtailment still lacks systematic discussion. On the basis of the aggregated model, thermostat setpoint control incorporating price information is shown in [18] to balance wind power.

Given the randomness and fluctuation of wind power, previous studies [20,21,22] have indicated that a prediction model matching the time scale of system dispatch is a fundamental issue for large-scale wind power penetration into the grid, especially for short-term forecasting (24-48 hours in advance). That is the issue with which we are concerned. A novel idea of reducing wind power curtailment by EWHs is proposed based on a dispatching model of economic accommodation, where the accurate prediction of wind power and EWHs’ load power is of great importance. For wind power or EWHs’ load power, the original data reveal multi-source heterogeneous characteristics, such as historical load, meteorological data (temperature, precipitation, air pressure, relative humidity, wind direction, wind speed), and electricity price. This implies that the inherent dispersion, diversity and complexity of the input data have to be taken into account in a big data environment for an accurate load forecast, which leads to limitations in computation speed and scale for conventional prediction algorithms. Hence, parallel processing frameworks are introduced into load prediction [23,24,25]. However, the aim of parallel computation as reported in the literature is principally to accelerate calculation speed and improve prediction accuracy. Such work ignores the multi-source heterogeneous characteristics of input data. Recently, multi-kernel (MK) learning has been developed to solve multi-factor problems [26]. An improved MK support vector machine (SVM) is constructed in [27, 28], which yields a more satisfactory accuracy than the single kernel (SK) SVM. A small section of current MK research is based on a simply weighted combination of a global kernel function and a local kernel function. For SVM, the kernel function must satisfy the Mercer condition and the penalty coefficient is difficult to determine.
In response, the relevance vector machine (RVM) appeared [29, 30]. One of its benefits is that its kernel function does not need to meet the Mercer condition, and the limitation of the penalty coefficient is relaxed through a probability model. As with SVM, the choice of the kernel and its parameters is also a key problem for RVM. Currently, SK-based RVM is the mainstream [31, 32]. There is a small number of studies on weighted MK-RVM relying on previous experience [33] or an incremental learning approach [34]. Following the above analysis, from the perspective of big data with MapReduce-based parallel computation, a new prediction approach integrating MK-RVM and the adaptive fruit fly optimization algorithm (AFOA) is presented, which takes into account the characteristics of sample distribution. AFOA is used to determine the kernel parameters in MK-RVM adaptively; in this way subjectivity and arbitrariness can be avoided. The correctness and feasibility of the proposed algorithm are verified through actual sampling data from an intelligent residential district.

The novelty of this paper is a new concept of reducing wind power curtailment by storage-type EWHs from the aspect of DR. An economic dispatching model is built which considers the penalty of wind power curtailment and the operating cost of EWHs. Accurate predictions of wind power and EWHs’ load power form the basis of implementing the dispatching model. Therefore, a hybrid MK prediction approach integrating AFOA and the multi-kernel relevance vector machine (MKRVM) is presented, in which the sample distribution of multi-source heterogeneous features is determined by the energy entropy method. To tackle the issue of computation speed, a MapReduce-based parallel computation method is introduced into the prediction approach. The structure of the paper is as follows: Sect. 2 describes the hybrid prediction approach. The dispatching model for accommodating curtailed wind power with EWHs is established in Sect. 3, and the correctness and validity of the algorithm are verified in Sect. 4 through a case study of an intelligent residential district. Section 5 gives the conclusions.

2 Representation of AFOA-MKRVM prediction approach

2.1 Sample distribution of multi-source heterogeneous features

Let the number of factors used for prediction be M, and write the set of these factors as \(\left\{ {\varvec{X}_{i} ,i = 1,2, \cdots ,M} \right\}\), where \(\varvec{X}_{i} = \left[ {x_{1} ,x_{2} , \cdots ,x_{N} } \right] \in {\mathbf{R}}^{1 \times N}\). The predicted value is \(\varvec{Y}_{i} = \left[ {y_{1} ,y_{2} , \cdots ,y_{T} } \right] \in {\mathbf{R}}^{1 \times T}\). A weighted arithmetic mean based sample distribution method is designed to handle multi-source heterogeneous features of input data.

For each factor, the weighted arithmetic mean value \(\bar{x}\) is calculated over the sampling points as in (1). Then, the Euclidean distance between each \(x_{i}\) and \(\bar{x}\) is calculated and denoted as \(d_{i}\), and the maximum Euclidean distance is written as \(d_{\hbox{max} }\). Borrowing the concept of energy entropy, let the energy entropy of the i-th sampling point be \(E_{i} = d_{i}^{2}\). The discriminant function c of the distribution characteristic is defined in (2). A threshold c* is set to identify the characteristic of the sampling data. If \(c \le c^{*}\), the sampling data exhibit a global characteristic; in this case a global kernel function, such as a polynomial function, should be selected. Otherwise, if \(c > c^{*}\), the sampling data exhibit a local characteristic, and a kernel describing the local feature should be chosen; the Gaussian radial basis function (RBF) is used in this context. After comprehensive consideration, c* is set to 0.5.

$$\bar{x} = \frac{{\sum\limits_{i = 1}^{N} {x_{i} f_{i} } }}{{\sum\limits_{i = 1}^{N} {f_{i} } }}$$
(1)
$$c = \frac{{\sum\limits_{i = 1}^{H} {E_{i} } }}{{\sum\limits_{i = 1}^{N} {E_{i} } }}$$
(2)

where \(f_{i}\) is the frequency with which \(x_{i}\) appears in \(\varvec{X}_{i}\); H is the number of sampling points whose energy entropy is less than \(d_{\hbox{max} }^{2} /4\).
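As a minimal sketch of how (1) and (2) could be evaluated for a single factor, the following Python function computes the discriminant c and selects a kernel type. The function name, the use of NumPy, and the treatment of a constant series as local are assumptions made for illustration; they are not taken from the original implementation.

```python
import numpy as np

def distribution_discriminant(x, c_star=0.5):
    """Return (c, kernel) for one factor, following (1) and (2)."""
    x = np.asarray(x, dtype=float)
    # f_i: how often the value x_i occurs in the series
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    f = counts[inverse.ravel()].astype(float)
    x_bar = np.sum(x * f) / np.sum(f)      # weighted arithmetic mean, (1)
    d = np.abs(x - x_bar)                  # Euclidean distance in one dimension
    energy = d ** 2                        # E_i = d_i^2
    total = energy.sum()
    if total == 0.0:
        # Assumption: a constant series has no spread and is treated as local.
        return 1.0, "rbf"
    low = energy < d.max() ** 2 / 4.0      # the H points with E_i < d_max^2 / 4
    c = energy[low].sum() / total          # discriminant function, (2)
    return c, ("polynomial" if c <= c_star else "rbf")
```

A series whose energy is concentrated far from the mean gives a small c and is assigned the polynomial (global) kernel; a series concentrated near the mean gives a large c and is assigned the RBF (local) kernel.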

2.2 Brief description of RVM

To overcome the difficulties encountered in SVM, such as the uncertainty of the penalty coefficient and the rigorousness of the Mercer principle of kernel function, an uncertainty analysis and sparse probabilistic model-based RVM prediction approach is proposed in [29] under a Bayesian framework. For a given training sample set \(\varvec{X}_{\varvec{i}}^{\varvec{'}} = \left[ {x_{1}^{'} ,x_{2}^{'} , \cdots ,x_{n}^{'} } \right]\), where n is the number of the sampling points, the output vector is written as below:

$$\varvec{Y}^{'} = f\left( {\varvec{X}^{'} ,\varvec{\omega}} \right) +\varvec{\varepsilon}$$
(3)

where \(\varvec{X}^{'}\) is the input vector; \(\varvec{Y}^{'}\) is the output or target vector; \(\varvec{\omega}\) is the weight vector, \(\varvec{\omega}= \left[ {\omega_{0} ,\omega_{1} , \cdots ,\omega_{n} } \right]\); \(\varvec{\varepsilon}\) is random noise obeying a normal distribution with zero mean and variance \(\sigma^{2}\); f is the map from \(\varvec{X}^{'}\) to \(\varvec{Y}^{'}\). The training samples are assumed to be independently distributed. Then, the prediction function of RVM can be written as:

$$f\left( {\varvec{X}^{'} ,\varvec{\omega}} \right) = \sum\limits_{i = 1}^{n} {\omega_{i} K\left( {\varvec{X}^{\varvec{'}} ,\varvec{x}_{i}^{'} } \right)} + \omega_{0}$$
(4)

where \(K\left( {\varvec{X}^{\varvec{'}} ,\varvec{x}_{i}^{'} } \right)\) is the kernel function. The Gaussian prior probability \(P\left( {\varvec{\omega}|\varvec{\alpha}} \right)\) of \(\varvec{\omega}\), governed by the hyper-parameter \(\varvec{\alpha}\), is defined as:

$$P\left( {\varvec{\omega}|\varvec{\alpha}} \right) = \prod\limits_{i = 0}^{n} {\sqrt {\frac{{\alpha_{i} }}{{2\uppi}}} \,\text{e}^{{ - \frac{{\alpha_{i} \omega_{i}^{2} }}{2}}} }$$
(5)

According to the Bayesian criterion, the posterior distribution of \(\varvec{\omega}\) is normal, and the optimal values \(\varvec{\alpha}_{MP}\) and \(\sigma_{MP}^{2}\) that maximize \(P\left( {\varvec{\alpha},\sigma^{2} |\varvec{Y}^{'} } \right)\) are obtained by iterating the following equations step by step:

$$\alpha_{i,MP,k} = \frac{{1 - \alpha_{i} \psi_{ii} }}{{\mu_{i}^{2} }}$$
(6)
$$\sigma_{MP,k}^{2} = \frac{{\left\| {\varvec{Y}^{'} -\varvec{\varPhi}\left( {\varvec{X}^{'} } \right)\varvec{\mu}} \right\|^{2} }}{{n - \sum\limits_{i = 0}^{n} {\left( {1 - \alpha_{i} \psi_{ii} } \right)} }}$$
(7)

where \(\psi_{ii}\) is the i-th diagonal element of the posterior covariance matrix \(\varvec{\psi}\), \(\varvec{\psi} = \left( {\sigma^{ - 2} \varvec{\varPhi}^{{\rm T}} \left( {\varvec{X}^{'} } \right)\varvec{\varPhi}\left( {\varvec{X}^{'} } \right) + \varvec{A}} \right)^{ - 1}\), \(\varvec{A} = \text{diag}\left( {\alpha_{0} ,\alpha_{1} , \cdots ,\alpha_{n} } \right)\); \(\varvec{\mu}\) is the posterior mean vector with elements \(\mu_{i}\), \(\varvec{\mu} = \sigma^{ - 2} \varvec{\psi}\varvec{\varPhi}^{{\rm T}} \left( {\varvec{X}^{'} } \right)\varvec{Y}^{'}\); \(\varvec{\varPhi}\left( {\varvec{X}^{'} } \right)\) is the kernel matrix, as shown in (8).

$$\varvec{\varPhi}\left( {\varvec{X}^{'} } \right)\text{ = }\left[ {\begin{array}{*{20}c} 1 & {K\left( {x_{1}^{'} ,x_{1}^{'} } \right)} & {K\left( {x_{1}^{'} ,x_{2}^{'} } \right)} & \cdots & {K\left( {x_{1}^{'} ,x_{n}^{'} } \right)} \\ 1 & {K\left( {x_{2}^{'} ,x_{1}^{'} } \right)} & {K\left( {x_{2}^{'} ,x_{2}^{'} } \right)} & \cdots & {K\left( {x_{2}^{'} ,x_{n}^{'} } \right)} \\ \vdots & \vdots & \vdots & {} & \vdots \\ 1 & {K\left( {x_{n}^{'} ,x_{1}^{'} } \right)} & {K\left( {x_{n}^{'} ,x_{2}^{'} } \right)} & \cdots & {K\left( {x_{n}^{'} ,x_{n}^{'} } \right)} \\ \end{array} } \right]$$
(8)

Eventually, the prediction value y* can be obtained by the following equation:

$$\varvec{y}^{*} =\varvec{\mu}^{\text{T}} \varvec{\varphi }\left( {\varvec{x}^{*} } \right)$$
(9)

where \(\varvec{\varphi }\left( {\varvec{x}^{*} } \right) = \left[ {1,K\left( {\varvec{x}^{*} ,x_{1}^{'} } \right),K\left( {\varvec{x}^{*} ,x_{2}^{'} } \right), \cdots ,K\left( {\varvec{x}^{*} ,x_{n}^{'} } \right)} \right]\).
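The re-estimation in (6) and (7) and the prediction in (9) can be illustrated with a short NumPy sketch. It assumes a one-dimensional input and a Gaussian RBF kernel of fixed width; the function names, the initial values, and the numerical guards (clipping of the hyper-parameters and a floor on the noise variance) are assumptions added for stability and are not part of the formulation above.

```python
import numpy as np

def rbf_design(x_train, x_eval, width=1.0):
    """Rows [1, K(x, x'_1), ..., K(x, x'_n)] as in (8), with an RBF kernel."""
    diff = x_eval[:, None] - x_train[None, :]
    k = np.exp(-diff ** 2 / (2.0 * width ** 2))
    return np.hstack([np.ones((len(x_eval), 1)), k])

def posterior(phi, y, alpha, sigma2):
    """Posterior covariance psi and mean mu of the weights."""
    psi = np.linalg.inv(phi.T @ phi / sigma2 + np.diag(alpha))
    mu = psi @ phi.T @ y / sigma2
    return psi, mu

def rvm_fit(phi, y, n_iter=50):
    """Iterate (6) and (7); returns the posterior mean mu and sigma^2."""
    n, m = phi.shape
    alpha = np.ones(m)               # assumed initial hyper-parameters
    sigma2 = 0.1 * np.var(y)         # assumed initial noise variance
    for _ in range(n_iter):
        psi, mu = posterior(phi, y, alpha, sigma2)
        gamma = np.clip(1.0 - alpha * np.diag(psi), 1e-12, 1.0)
        alpha = np.clip(gamma / (mu ** 2 + 1e-12), 1e-6, 1e9)   # (6), clipped
        resid = y - phi @ mu
        # (7), with guards against a vanishing denominator or variance
        sigma2 = max(resid @ resid / max(n - gamma.sum(), 1.0), 1e-4)
    _, mu = posterior(phi, y, alpha, sigma2)
    return mu, sigma2

# Toy usage on synthetic data (not the data of this study)
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 30)
y = np.sin(x) + 0.1 * rng.standard_normal(30)
mu, sigma2 = rvm_fit(rbf_design(x, x), y)
y_hat = rbf_design(x, x) @ mu        # prediction as in (9)
```

Weights whose hyper-parameter reaches the upper clip are effectively switched off, which is how the sparse set of relevance vectors emerges in this sketch.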

2.3 Generation of MK function

MK aggregation through fusing local and global features is an effective solution and can integrate the advantages of each kind of SK function [35]. There are many common SK functions, such as the linear function, the Gaussian RBF, and the polynomial function. The polynomial function is good at extracting the global features of sampling data, but it is poor in learning ability. In contrast, the RBF has a strong learning ability and is suited to handling local information. Through the combination of these two SKs shown in (10), the ability of the MK to describe multi-source heterogeneous features is improved. For each factor, an SK is chosen through the energy entropy method described in Sect. 2.1.

$$\begin{aligned} K_{MKL} \left( {x,y} \right) & = \sum\limits_{i = 1}^{{M_{1} }} {\lambda_{i} K_{Gauss}^{i} \left( {x,x_{i} } \right)} + \sum\limits_{j = 1}^{{M_{2} }} {\theta_{j} K_{Poly}^{j} \left( {x,x_{j} } \right)} \\ & = \sum\limits_{i = 1}^{{M_{1} }} {\lambda_{i} \exp \left( { - \frac{{\left\| {x - x_{i} } \right\|^{2} }}{{2\sigma_{i}^{2} }}} \right) + } \sum\limits_{j = 1}^{{M_{2} }} {\theta_{j} \left( {xx_{j} + 1} \right)^{{p_{j} }} } \\ \end{aligned}$$
(10)

where \(K_{Gauss}^{i} \left( {x,x_{i} } \right)\) and \(K_{Poly}^{j} \left( {x,x_{j} } \right)\) are the Gaussian RBF kernel and polynomial kernel respectively; \(\lambda_{i} ,\theta_{j} \in \left[ {0,1} \right]\) are the weighted coefficients, \(\sum\nolimits_{i = 1}^{{M_{1} }} {\lambda_{i} } + \sum\nolimits_{j = 1}^{{M_{2} }} {\theta_{j} } = 1\); M1 and M2 indicate the numbers of the Gaussian RBF kernel and polynomial kernel respectively, \(M_{1} + M_{2} = M\); \(\sigma_{i}\) and pj are the parameters of the Gaussian RBF kernel and polynomial kernel still to be determined.
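A weighted kernel of the form (10) can be sketched as follows. The sketch assumes that each SK acts on one component (factor) of the input vectors, which is one possible reading of (10); the function name and the tuple layout of the arguments are hypothetical.

```python
import math

def mk_kernel(x, z, rbf_terms, poly_terms):
    """Weighted multi-kernel value in the spirit of (10).

    rbf_terms:  list of (factor index, lambda_i, sigma_i)
    poly_terms: list of (factor index, theta_j, p_j)
    """
    total_w = sum(lam for _, lam, _ in rbf_terms) + sum(th for _, th, _ in poly_terms)
    if not math.isclose(total_w, 1.0):
        raise ValueError("weighting coefficients must sum to 1")
    value = 0.0
    for i, lam, sigma in rbf_terms:      # local (Gaussian RBF) part
        value += lam * math.exp(-(x[i] - z[i]) ** 2 / (2.0 * sigma ** 2))
    for j, theta, p in poly_terms:       # global (polynomial) part
        value += theta * (x[j] * z[j] + 1.0) ** p
    return value
```

For example, with one RBF factor (weight 0.4) and one polynomial factor (weight 0.6, degree 2), `mk_kernel([1.0, 2.0], [1.0, 2.0], [(0, 0.4, 1.0)], [(1, 0.6, 2)])` evaluates 0.4·1 + 0.6·(2·2 + 1)².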

The generation of the MK function is shown in Fig. 1. Because a nonlinear MK combination has difficulty in ensuring that a local optimal solution is obtained, the MK in this context is constructed from multiple SK functions through a weighted linear method. For different application fields, the number of SKs included in the MK should be determined in accordance with actual conditions. For this study, the number of input features is 11. Load power, air quality index, air temperature, air pressure, relative humidity, vapour pressure, precipitation and EWHs’ power exhibit local sample distributions, and RBF kernels are selected to describe them. The others, namely wind speed, wind direction and similar day, exhibit global features and are therefore described by polynomial kernels.

Fig. 1 Production flow of multi-kernel considering sample distribution

2.4 Proposed AFOA-MKRVM prediction method

The fruit fly optimization algorithm (FOA) is a new global optimization approach based on the foraging behavior of the fruit fly. For the constructed multi-kernel, there are 2×M unknown parameters, where the numbers of weighted coefficients and the kernel parameters are both M. To remove the subjectivity in parameter determination, FOA is used to optimize the parameters of MKRVM. Then, a hybrid AFOA-MKRVM prediction method with self-adaptive generation parameters is proposed. Its specific implementation is as follows:

Step 1: initialize the parameters of the fruit fly swarm, including the population size Ne, the dimension De, the maximum number of iterations Ge, the threshold of training error e, and the initial locations X0 and Y0.

Step 2: initialize parameters for MK-RVM, including the weighted coefficients and the kernel parameters.

Step 3: start the optimal calculation for the fruit fly and use FOA to regulate the parameters in MK-RVM.

Step 4: update the location and the iteration step in terms of random direction. The individual fruit fly is iteratively evolved by smell search of random direction and adaptive step. The iteration step is adjusted adaptively through the optimal smell concentration in the former generation and the number of iterations in the current generation, which is shown as follows:

$$\left\{ \begin{aligned} & t_{i} = \frac{v}{{S_{i - 1,opt}^{{}} }}\exp \left( { - \vartheta \left( {\frac{i}{{G_{\hbox{max} } }}} \right)^{\eta } } \right) + t_{\hbox{min} } \\ & X_{i} = X_{i - 1} + \varsigma t_{i} \\ & Y_{i} = Y_{i - 1} + \varsigma t_{i} \\ \end{aligned} \right.$$
(11)

where ti is the adaptive step; v is the regulatory coefficient in (0, 1]; \(S_{i - 1,opt}\) is the optimal smell concentration in the former generation of the fruit fly swarm; \(\vartheta\) is the constraint coefficient in (0, 1); \(G_{\hbox{max} }\) is the maximum number of iterations; \(\eta\) is an integer in [1, 10]; \(t_{\hbox{min} }\) is the minimum iteration step; \(\varsigma\) is the regulatory coefficient distributed randomly in [−1, 1]; \(X_{i}\) and \(Y_{i}\) are the location coordinates of the fruit fly.

Step 5: for the unknown location of food, the distance ri is calculated and then the smell concentration value Si is calculated as shown in (12).

$$\left\{ \begin{aligned} & r_{i} = \sqrt {X_{i}^{2} + Y_{i}^{2} } \\ & S_{i} = \frac{1}{{r_{i} }} \\ \end{aligned} \right.$$
(12)

Step 6: using the smell concentration value Si, the prediction values and their root mean square errors are output. Meanwhile, the fitness function \(F\left( \bullet \right)\) is constructed from the smell concentration value Si and the maximal smell concentration among the swarm is found. This is shown in the following:

$$\left\{ \begin{aligned} & s_{i} = F\left( {S_{i} } \right) \\ & C_{opt} = \hbox{max} \left( {s_{i} } \right) \\ \end{aligned} \right.$$
(13)

Step 7: retain the optimal smell concentration value and its location coordinates; then the fruit fly swarm will fly towards the location by means of vision.

$$\left\{ \begin{aligned} & S_{opt} = C_{opt} \\ & X = X_{opt} \\ & Y = Y_{opt} \\ \end{aligned} \right.$$
(14)

Step 8: repeat the iterations from Step 2 to Step 6 and judge whether the smell concentration is superior to that of the former generation. If the number of iterations exceeds the maximum Ge or the training error reaches the threshold e, the iteration is stopped.
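The steps above can be sketched in a few lines of Python. This is an illustrative one-parameter search, not the implementation used in this study: the loss function stands in for the RMSE of the MK-RVM prediction, minimizing a loss replaces maximizing the fitness \(F\left( \bullet \right)\), and the names `adaptive_step`, `smell` and `afoa_search` as well as the default parameter values are assumptions.

```python
import math
import random

def adaptive_step(i, s_opt_prev, g_max, v=0.6, theta=0.2, eta=5, t_min=0.1):
    """Adaptive step t_i of (11)."""
    return v / s_opt_prev * math.exp(-theta * (i / g_max) ** eta) + t_min

def smell(x, y):
    """Smell concentration S = 1 / r of (12)."""
    return 1.0 / math.hypot(x, y)

def afoa_search(loss, n_flies=20, g_max=100, x0=1.0, y0=1.0, seed=0):
    """Search for the smell concentration S that minimizes loss(S)."""
    rng = random.Random(seed)
    best_x, best_y = x0, y0
    best_s = smell(best_x, best_y)
    best_loss = loss(best_s)
    history = [best_loss]
    for i in range(1, g_max + 1):
        t = adaptive_step(i, best_s, g_max)    # uses the former optimum
        centre_x, centre_y = best_x, best_y    # swarm location this generation
        for _ in range(n_flies):
            # random direction, adaptive step (Step 4)
            x = centre_x + rng.uniform(-1.0, 1.0) * t
            y = centre_y + rng.uniform(-1.0, 1.0) * t
            if x == 0.0 and y == 0.0:
                continue                       # distance zero has no smell value
            s = smell(x, y)                    # Step 5
            cand = loss(s)                     # Step 6 (loss instead of fitness)
            if cand < best_loss:               # Step 7: retain the optimum
                best_x, best_y, best_s, best_loss = x, y, s, cand
        history.append(best_loss)
    return best_s, best_loss, history

# Toy usage: recover a target parameter value of 0.25
s_best, loss_best, hist = afoa_search(lambda s: (s - 0.25) ** 2)
```

Because the best location is only replaced by a better one, the recorded loss never increases from one generation to the next. For the 2×M parameters of the MK, one such coordinate pair per parameter would be needed.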

2.5 Parallel prediction based on MapReduce model

In practice, for the same sample size, the computation in MK learning prediction is significantly slower than in SK learning prediction. To speed up the computation of AFOA-MKRVM, a parallel prediction algorithm based on the MapReduce model is implemented; its flow diagram is shown in Fig. 2.

Fig. 2 Flow diagram of parallel prediction based on MapReduce model

Step 1: construct the training sample set including N1 sampling points and the testing sample set including S sampling points.

Step 2: constitute the MK function in terms of Fig. 1.

Step 3: split the samples from the training set into Q subsets. Then, each subset consists of about N1/Q training sampling points. These subsets are uploaded to the distributed file system of Hadoop.

Step 4: train the Q subsets from Step 3 in parallel as map functions through the hybrid AFOA-MKRVM prediction method, and output the relevance vectors (RVs) of every subset as the value.

Step 5: the RVs of every subset are aggregated in a reduce function to form the entire set of RVs. The hybrid AFOA-MKRVM prediction method is then reused to train on the entire set of RVs and form the final RVs. The parallel prediction model is thus constructed and is used to forecast the testing sample set.
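The split, map and reduce pattern of Steps 3 to 5 can be sketched without Hadoop as follows. The sketch uses a local thread pool in place of the distributed file system and cluster, and `train_fn` is a hypothetical placeholder for the hybrid training routine that returns the RVs of the samples it receives; neither is taken from the original implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_subsets(samples, q):
    """Split the training samples into q nearly equal subsets (Step 3)."""
    size, rem = divmod(len(samples), q)
    subsets, start = [], 0
    for k in range(q):
        end = start + size + (1 if k < rem else 0)
        subsets.append(samples[start:end])
        start = end
    return subsets

def parallel_train(samples, q, train_fn):
    """Map: train each subset in parallel. Reduce: merge RVs and retrain."""
    subsets = split_into_subsets(samples, q)
    with ThreadPoolExecutor(max_workers=q) as pool:
        partial_rvs = list(pool.map(train_fn, subsets))        # Step 4
    merged = [rv for rvs in partial_rvs for rv in rvs]         # Step 5: aggregate
    return train_fn(merged)                                    # Step 5: retrain

# Toy usage with a stand-in trainer that keeps the even-valued samples as "RVs"
keep_even = lambda subset: [v for v in subset if v % 2 == 0]
final_rvs = parallel_train(list(range(10)), q=3, train_fn=keep_even)
```

The retraining in the reduce stage only sees the RVs retained by the subsets, so its cost depends on the sparsity achieved in the map stage rather than on the full sample size N1.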

2.6 Performance evaluation

The mean absolute percentage error (MAPE) is selected as the index of performance evaluation and is shown as the following:

$$E_{MAPE} = \frac{1}{T}\sum\limits_{t = 1}^{T} {\frac{{\left| {Y_{t} - y_{t} } \right|}}{{y_{t} }}} \times 100\%$$
(15)

where Yt is the prediction value; yt is the actual value; T is the number of predicted points. For the load forecast, a smaller value of MAPE indicates a higher prediction accuracy.

Additionally, the general index of speedup ratio is employed to evaluate the parallel efficiency and is described as:

$$R_{SR} = \frac{{T_{sd} }}{{T_{pl} }}$$
(16)

where \(R_{SR}\) is the speedup ratio; Tsd and Tpl are the operation times in standalone and parallel modes, respectively.
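Both indices are simple to compute. The following sketch assumes that the actual values are non-zero and uses their absolute value in the denominator, which coincides with (15) for positive power values; the function names are illustrative.

```python
def mape(predicted, actual):
    """Mean absolute percentage error of (15), in percent."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return total / len(actual) * 100.0

def speedup_ratio(t_standalone, t_parallel):
    """Speedup ratio of (16)."""
    return t_standalone / t_parallel
```

For instance, predictions of 110 and 90 against actual values of 100 and 100 give a MAPE of 10%, and a standalone time of 120 s against a parallel time of 30 s gives a speedup ratio of 4.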

3 Dispatching model for curtailment of wind power with EWHs

A dispatching model of economic accommodation for wind power curtailment through EWHs is established. Economic accommodation of curtailed wind power aims to minimize total coal consumption. If the integration of wind power does not lead to threatening the secure operation of thermal power plants, wind power curtailment will be reasonably executed, where EWHs are connected to maximize the cost saving.

3.1 Objective function

The objective function in economic accommodation of curtailed wind power is constructed as follows:

$$\hbox{min} \sum\limits_{t = 1}^{T} {\left[ {\sum\limits_{i = 1}^{{z_{1} }} {F_{TP} \left( {P_{TP}^{i,t} } \right)} + \sum\limits_{i = 1}^{{z_{2} }} {F_{CHP} \left( {P_{CHP}^{i,t} ,Q_{CHP}^{i,t} } \right)} + \beta P_{CW}^{t} + \chi P_{IEWH}^{t} } \right]}$$
(17)
$$F_{CHP} \left( {P_{CHP}^{i,t} ,Q_{CHP}^{i,t} } \right) = b_{1i} + b_{2i} P_{CHP}^{i,t} + b_{3i} Q_{CHP}^{i,t} + b_{4i} \left( {P_{CHP}^{i,t} } \right)^{2} + b_{5i} P_{CHP}^{i,t} Q_{CHP}^{i,t} + b_{6i} \left( {Q_{CHP}^{i,t} } \right)^{2}$$
(18)

where z1 and z2 are the numbers of thermal power (TP) plants and CHP plants; \(P_{TP}^{i,t}\) and \(P_{CHP}^{i,t}\) are the power generated at the time t for the i-th TP and CHP units individually; \(Q_{CHP}^{i,t}\) is the thermal load at the time t for the i-th CHP unit; \(F_{TP}\) is the coal consumption of the TP unit and can be written as \(F_{TP} \left( {P_{TP}^{i,t} } \right) = a_{1i} \left( {P_{TP}^{i,t} } \right)^{2} + a_{2i} P_{TP}^{i,t} + a_{3i}\), a1i, a2i and a3i are the coefficients depicting its coal consumption; \(F_{CHP}\) is the coal consumption of the CHP unit; b1i, b2i, b3i, b4i, b5i and b6i are the coefficients depicting its coal consumption; \(P_{CW}^{t}\) is the curtailed wind power at the time t; \(\beta\) is the penalty coefficient; \(P_{IEWH}^{t}\) is the dispatched power of EWHs at the time t; \(\chi\) is the cost coefficient.
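The cost terms of (17) and (18) for a single dispatch period can be sketched as below. The sketch assumes that the curtailment penalty and the EWH cost are counted once per period rather than once per CHP unit; the function names and the argument layout are illustrative, and the coefficients used in any call would come from Table 1 and Table 2.

```python
def f_tp(p, a1, a2, a3):
    """Coal consumption of a TP unit: a1*P^2 + a2*P + a3."""
    return a1 * p ** 2 + a2 * p + a3

def f_chp(p, q, b):
    """Coal consumption of a CHP unit, (18); b = (b1, ..., b6)."""
    b1, b2, b3, b4, b5, b6 = b
    return b1 + b2 * p + b3 * q + b4 * p ** 2 + b5 * p * q + b6 * q ** 2

def period_cost(tp_units, chp_units, p_cw, p_iewh, beta, chi):
    """Bracketed term of (17) for one period t.

    tp_units:  list of (P_TP, (a1, a2, a3))
    chp_units: list of (P_CHP, Q_CHP, (b1, ..., b6))
    """
    coal_tp = sum(f_tp(p, *a) for p, a in tp_units)
    coal_chp = sum(f_chp(p, q, b) for p, q, b in chp_units)
    return coal_tp + coal_chp + beta * p_cw + chi * p_iewh
```

Summing `period_cost` over all T periods gives the quantity minimized in (17).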

3.2 Constraint conditions

Power load balance constraint:

$$\sum\limits_{i = 1}^{{z_{1} }} {P_{TP}^{i,t} } + \sum\limits_{i = 1}^{{z_{2} }} {P_{CHP}^{i,t} } + P_{W}^{t} = P_{L}^{t} + P_{IEWH}^{t}$$
(19)

where \(P_{W}^{t}\) is the consumed wind power; \(P_{L}^{t}\) is the power of load at the time t.

Thermal load balance constraint:

$$\sum\limits_{i = 1}^{{z_{2} }} {Q_{CHP}^{i,t} } = Q^{t}$$
(20)

where \(Q^{t}\) is the thermal load at time t.

Technology constraints of TP units:

$$P_{TP,\hbox{min} }^{i} \le P_{TP}^{i,t} \le P_{TP,\hbox{max} }^{i}$$
(21)
$$r_{{TP,\text{down}}}^{i} \Delta t \le P_{TP}^{i,t} - P_{TP}^{i,t - 1} \le r_{{TP,\text{up}}}^{i} \Delta t$$
(22)

where \(P_{TP,\hbox{min} }^{i}\) and \(P_{TP,\hbox{max} }^{i}\) are the minimum and maximum power for the i-th TP unit; \(r_{{TP,\text{down}}}^{i}\) and \(r_{{TP,\text{up}}}^{i}\) are the rates of ramp down and ramp up for the i-th TP unit; \(\Delta t\) is the dispatching interval, \(\Delta t = 1 \, \text{h}\).

Technology constraints of CHP units:

$$P_{CHP,\hbox{min} }^{i} \le P_{CHP}^{i,t} \le P_{CHP,\hbox{max} }^{i}$$
(23)
$$r_{{CHP,\text{down}}}^{i} \Delta t \le P_{CHP}^{i,t} - P_{CHP}^{i,t - 1} \le r_{{CHP,\text{up}}}^{i} \Delta t$$
(24)

where \(P_{CHP,\hbox{min} }^{i}\) and \(P_{CHP,\hbox{max} }^{i}\) are the minimum and maximum power for the i-th CHP unit; \(r_{{CHP,\text{down}}}^{i}\) and \(r_{CHP,\text{up}}^{i}\) are the rates of ramp down and ramp up for the i-th CHP unit.

Power constraint of wind power units:

$$P_{W}^{t} + P_{CW}^{t} = P_{Wp}^{t}$$
(25)

where \(P_{Wp}^{t}\) is the predicted wind power at the time t.

Power constraint of EWHs:

$$0 \le P_{EWH}^{t} + P_{IEWH}^{t} \le P_{EWHp}$$
(26)
$$0 \le E_{EWH}^{t} + E_{IEWH}^{t} \le E_{EWHp}$$
(27)
$$E_{EWH}^{t} = \int {\kappa P_{EWH}^{t} {\text{d}}t}$$
(28)
$$E_{IEWH}^{t} = \int {\kappa P_{IEWH}^{t} {\text{d}}t}$$
(29)

where \(P_{EWH}^{t}\) is the predicted power of EWHs at the time t; \(P_{EWHp}\) is the total power of EWHs in the grid; \(E_{EWH}^{t}\) is the predicted electrical energy of EWHs during the period of time t; \(E_{IEWH}^{t}\) is the electrical energy consumed by dispatched EWHs during the period of time t; \(E_{EWHp}\) is the total electrical energy that can be consumed by EWHs; \(\kappa\) is the efficiency coefficient.
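A feasibility check for the wind and EWH constraints can be sketched as follows. It assumes that the power balance is enforced in each period t and that the integrals in (28) and (29) are discretized as running sums with the dispatching interval \(\Delta t\); the function names are illustrative and the ramp constraints are omitted for brevity.

```python
from itertools import accumulate

def cumulative_energy(power, kappa, dt=1.0):
    """Discretized (28)/(29): running sum of kappa * P * dt."""
    return list(accumulate(kappa * p * dt for p in power))

def ewh_feasible(p_pred, p_disp, p_cap, e_cap, kappa=1.0, dt=1.0):
    """Check (26) and (27) for predicted and dispatched EWH power series."""
    e_pred = cumulative_energy(p_pred, kappa, dt)
    e_disp = cumulative_energy(p_disp, kappa, dt)
    power_ok = all(0.0 <= a + b <= p_cap for a, b in zip(p_pred, p_disp))
    energy_ok = all(0.0 <= a + b <= e_cap for a, b in zip(e_pred, e_disp))
    return power_ok and energy_ok

def consumed_wind(p_wind_pred, p_curtailed):
    """(25) rearranged: P_W = P_Wp - P_CW."""
    return p_wind_pred - p_curtailed

def balance_residual(p_tp, p_chp, p_w, p_load, p_iewh):
    """Generation minus demand in one period; zero when (19) holds."""
    return sum(p_tp) + sum(p_chp) + p_w - (p_load + p_iewh)
```

A candidate dispatch would be accepted only if `ewh_feasible` returns `True` and `balance_residual` is zero (within a tolerance) in every period.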

4 Case study

The experimental data originate from an intelligent residential district powered by a small regional power grid in China. The grid includes three TP units (#1, #2, #3), three CHP units (#4, #5, #6), and a wind farm with 150 MW rated power, which consists of 56 × 1.5 MW doubly-fed induction generators and 33 × 2.0 MW full-converter permanent magnet synchronous generators. The power generated by the wind farm is transmitted to the residential district through a booster substation. The parameters of the TP and CHP units are shown in Table 1 and Table 2. The data on EWHs, with 540 MW rated power and 810 MWh maximum storage capacity in the residential district, are gathered into a control centre. The EWHs can complete energy storage within 1.5 hours. The sampling interval of the data is one hour; the sampling data are utilized for day-ahead dispatch.

Table 1 Parameters of TP units
Table 2 Parameters of CHP units

4.1 Verification of prediction approach

In accordance with the prediction requirement and data scale, a Hadoop cluster with five nodes is established in a virtual machine, where one personal computer (PC) works as the master node and the remaining four PCs are configured as the slave nodes. The master node, as a central server, is responsible for resource allocation and task assignment. The slave nodes execute the storage and computation tasks.

First, the sample distributions of multi-source heterogeneous features are discussed. Nine features are selected for wind power prediction: historical load power, wind speed, wind direction, air temperature, air pressure, relative humidity, precipitation, similar day, and air quality index. Similarly, six features (historical load power, air temperature, air pressure, relative humidity, precipitation, and similar day) are chosen for aggregated power prediction of EWHs. Wind power curtailment in northern China occurs frequently in winter. Hence, one week of data from December 7th to December 13th, 2015 was extracted to analyze the heterogeneous features using (1) and (2). The results are shown in Fig. 3, which shows that the sample distributions of historical load power, air quality index, air temperature, air pressure, relative humidity, vapour pressure, precipitation and EWHs load power reflect local behavior, whereas wind speed, wind direction, and similar day are globally distributed.

Fig. 3 Sample distributions of multi-source heterogeneous features

Secondly, the training data collected from November 9th, 2015 to December 20th, 2015 are used to predict the load from December 21st, 2015 to December 27th, 2015. The parameters of the fruit fly swarm are as follows: the population size Ne = 20, the maximum number of iterations Ge = 1000, the threshold of training error e = 0.1, the adaptive step ti = 0.1, the regulatory coefficient v = 0.6, the constraint coefficient \(\vartheta = 0.2\), \(\eta = 5\), the minimum iteration step \(t_{\hbox{min} } = 0.1\), and the regulatory coefficient \(\varsigma = - 0.2\). To evaluate the proposed AFOA-MKRVM algorithm, it is compared with the Gaussian RBF based single kernel RVM (RBF-SKRVM), the polynomial kernel based single kernel RVM (PK-SKRVM), the least squares support vector machine (LS-SVM), and the weighted least squares support vector machine (WLS-SVM) [37]. The prediction curves for December 22nd, 2015 obtained with the five algorithms are plotted in Fig. 4, and the MAPE results for seven days are listed in Table 3 and Table 4. In general, Table 3 and Table 4 indicate that the MAPE of RBF-SKRVM is lower than that of PK-SKRVM. This is to some extent induced by the local behavior of the majority of sample distributions. However, the prediction accuracy of the proposed AFOA-MKRVM is the highest, especially because it captures the highest and lowest points illustrated in Fig. 4. By comparison, the prediction accuracy of LS-SVM is the lowest because it neglects the distribution features in the data. The satisfactory prediction of both wind power and EWHs load power demonstrates the capability of the hybrid MK algorithm in resolving multi-source heterogeneous features through a combination of various approaches.

Fig. 4 Prediction results of wind power and EWHs load power with five algorithms

Table 3 Prediction results of MAPE with five algorithms for wind power
Table 4 Prediction results of MAPE with five algorithms for EWHs load power

To compare the operation times of standalone and parallel computation, the sampling data are divided into six categories, C1 to C6, in increments of one week of data: C1 includes the first week of data from November 9th, 2015 to November 15th, 2015, C2 includes two weeks of data from November 9th, 2015 to November 22nd, 2015, and so on up to C6, which includes six weeks of data from November 9th, 2015 to December 20th, 2015. For every category, the presented algorithm is run in standalone and in parallel mode, and Fig. 5 gives the comparative result. It can be seen that the speedup ratio of parallel to standalone increases almost linearly with the number of cluster nodes and the data scale. For the same number of cluster nodes, a greater data scale processed in parallel brings a relatively higher speedup ratio, while the growth of the speedup ratio with the number of cluster nodes gradually slows.

Fig. 5 Comparison of operation time for the proposed algorithm running in standalone or parallel

4.2 Validation of dispatching model

The total load power, thermal load, wind power, and EWHs load power of a typical winter day are selected and plotted in Fig. 6, which demonstrates that the curtailment of wind power frequently occurs at night from 11 p.m. to 7 a.m. It is caused by the minimum power outputs (MPOs) of the TP and CHP units. For a TP unit, the MPO is a constant, which is half of the generating power in this case. However, the MPO of a CHP unit varies with its thermal load: a large thermal load for heating supply in the nighttime forces a large power output. Combined with a large output of the wind farm and a small load power, wind power has to be curtailed to avoid the forced outage of TP and CHP plants, as shown in Fig. 6.

Fig. 6 Mechanism analysis of wind power curtailment

To reveal the relationship between thermal load and the accommodation rate of wind power, the dispatching model shown in (17) is solved by a genetic algorithm (GA) with the load power held constant, and the result is plotted in Fig. 7. It implies that the accommodation ratio of wind power declines as the thermal load increases, because of the electricity-to-heat constraint in CHP units.

Fig. 7 Ratio of curtailed wind power with variation of thermal load power

Hence, when the supply side is limited, supplementing the regulation of load power through EWHs on the demand side is a valid way to enlarge the system’s peaking capability indirectly. Figure 8 compares the load power and wind power accommodation with and without dispatched EWHs, and indicates that activating EWHs during the period of wind power curtailment does increase the power consumed by the load. The ratio of wind power connected to the grid improves from 84.62% to 86.27%. Accordingly, the coal consumption is reduced from 9964.5 t to 9962.7 t, as shown in Table 5. In this case, 540 MW/810 MWh of EWHs are available for day-ahead dispatch. According to the prediction data, about 753.6 MWh, or 93.04% of the EWHs, has already been heated outside the period of curtailed wind power and therefore cannot be employed. Consequently, a surplus of less than 10% of the EWHs can be dispatched to accommodate wind power, which plays only a weak role in reducing wind power curtailment. In other words, disorderly use of EWHs works against the accommodation of wind power.
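
The shares quoted above follow directly from the stated energies. The short check below uses only figures given in the text; the variable names are illustrative, and it assumes that the 753.6 MWh refers to EWH energy consumed outside the curtailment period.

```python
ewh_energy_mwh = 810.0       # EWH energy available for day-ahead dispatch (from the text)
already_heated_mwh = 753.6   # energy heated outside the curtailment period (from the text)

heated_share = already_heated_mwh / ewh_energy_mwh
surplus_mwh = ewh_energy_mwh - already_heated_mwh
surplus_share = surplus_mwh / ewh_energy_mwh

# Changes reported with and without dispatched EWHs (from the text and Table 5).
accommodation_gain_points = 86.27 - 84.62
coal_saving_t = 9964.5 - 9962.7

print(f"heated share : {heated_share:.2%}")
print(f"surplus      : {surplus_mwh:.1f} MWh ({surplus_share:.2%})")
print(f"accommodation: +{accommodation_gain_points:.2f} percentage points")
print(f"coal saving  : {coal_saving_t:.1f} t")
```

The check reproduces the 93.04% share and gives a dispatchable surplus of about 56.4 MWh (6.96%), consistent with the statement that less than 10% of the EWHs remain available.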

Fig. 8 Comparison of load power and wind power accommodation with and without dispatched EWHs

Table 5 Results comparison with and without dispatched EWHs

The relation among dispatched EWHs load power, curtailed wind power, and coal consumption is obtained by solving the constructed dispatch model with different levels of orderly EWH activation, and the result is given in Fig. 9. It shows that if about 351 MW of EWHs are dispatched, the curtailed wind power is totally accommodated in this case. At the same time, the cost of coal consumption reaches its lowest value. In addition, it is worth mentioning that heating EWHs in an orderly manner can shave the peak load in the daytime and shift this portion of power into the load valley at night. Conversely, the curtailed wind power and coal consumption increase nearly linearly as the dispatched EWHs’ power is reduced.

Fig. 9 Coal consumption and dispatched EWHs power with different wind power accommodations

To demonstrate the influence of EWHs on the overall cost, the ratio of the EWHs cost to the overall cost under different levels of curtailed wind power is shown in Fig. 10. It indicates that this ratio decreases as wind power curtailment increases. The maximum cost ratio of dispatching EWHs is about 0.2%, which illustrates that dispatching EWHs is a relatively economical and feasible scheme for wind power accommodation.

Fig. 10 Cost ratio of EWHs in overall cost with different wind power accommodations

5 Conclusion

This paper addresses the issue of wind power curtailment with an innovative way of dispatching storage-type EWHs. The conclusions are as follows:

  1) Compared with SK prediction methods such as RBF-SKRVM and PK-SKRVM, the proposed MK prediction algorithm combining AFOA and MKRVM achieves higher prediction precision, especially for sample distributions with multi-source heterogeneous features.

  2) Parallel computation with a Hadoop cluster accelerates the prediction approach, with a speedup that grows almost linearly with the number of clusters.

  3) Wind power curtailment frequently occurs at night because of a low power load and a high thermal load; the accommodation ratio of wind power decreases as the thermal load increases.

  4) Dispatching EWHs reduces the curtailment of wind power and decreases coal consumption; moreover, orderly dispatching of EWHs can smooth the load power curve and demonstrates better economic feasibility.

Our research ignores the transmission capacity limits of the grid. However, some studies have shown [36] that the structure of the grid network affects power transmission and the accommodation of wind power. Future work can build a more detailed dispatching model. In addition, the kernel functions used in this paper are limited to the RBF kernel and the polynomial kernel, which is a relatively restrictive choice. More work will be carried out on the selection of kernel functions for the MK prediction approach.