1 Introduction

The utilization of wind energy has been increasing around the world at an accelerating pace due to its non-exhausted property, environmental and social benefits, together with public support and government incentives. However, generating capacity from wind farms behaves quite differently than that from conventional generating sources because of the fluctuating and intermittent nature of wind. To handle with these features, wind speed is assumed to be a random variable which follows various types of distributions, such as Weibull, Rayleigh, Gauss and etc. [18]. Therefore, wind farms and generating capacity assessment of power system incorporating wind energy [916] should be conducted with previously published wind speed distribution analysis.

The estimation of wind speed distribution is an essential requirement in the reliability analysis of power system with wind power integrated since the wind energy available for a particular location is mainly determined by the probability distribution of wind speed. Therefore, a variety of PDFs have been proposed in literature to describe wind speed distributions. Currently, the widely adopted PDFs are unimodal types [18] including Weibull function, Rayleigh function, Gauss function, Gamma function, lognormal function, etc. Besides, some mixture functions of simple unimodal distributions [1719], such as the two-component mixture Weibull function (Weibull–Weibull) and singly truncated normal Weibull mixture function (Normal–Weibull), are also applied to wind energy analysis recently, and they have been proved to be more effective than unimodal types for wind regimes with bimodal distribution. A number of detailed reviews on modeling the probability distributions of wind speed in wind energy analysis can be found in references [2028].

The wind speed models above have a characteristic in common, that is to assume wind speed can be described by a known probability density function (PDF), and then estimate the function parameters using the historical wind speed data. These assumptions aim to simplify wind speed model, and make analysis and calculation easy. However, the errors in the results could be significant and lead to wrong conclusions when the supposed wind distributions do not match the real ones.

Nonparametric density estimation (NPDE) methods provide a new idea and solution to these problems. The methods have on need of prior knowledge about wind speed distribution or any assumption on probability distribution thus is suitable to analyze feature space of any structure. KDE is the most widely used NPDE techniques in data analysis and has applications in geography [29], ecology [30] and other fields.

This paper presents a technique for modeling long-term wind speed distribution using the KDE and applies this method to reliability assessment of power systems containing wind energy. The effects of adding wind capacity to a conventional generating system are illustrated using IEEE-RTS which consists of 32 traditional generating units with a total capacity of 3405 MW and a peak load of 2850 MW. The generating unit ratings and reliability parameters are shown in Ref. [31]. The loss of load expectation (LOLE) and loss of energy expectation (LOEE) indices are used to assess the risk in this study.

2 Histogram estimation on wind speed probability density

Histogram is a kind of simple and initial NPDE method. Its basic principle is to divide sample spaces into several subspaces, and estimate density based on the sampling amount in every subspace. Let V = (v 1,…,v i ,…,v n ) represent wind speed sample. Divide the sample space into m subspaces and the j th subspace, such that \(B_{j} = [x_{0} + (j - 1)h, \, x_{0} + jh] \quad \left( {j = 1,{ 2}, \ldots ,m} \right)\), where x 0 is the starting point and h is the subspace width. The PDF using the histogram estimation at point v can be expressed as.

$$\hat{f}(v) = \frac{1}{nh}\sum\limits_{i = 1}^{n} {I(v_{i} )I(V)}$$

where I(V) is the indicator function, which is 1 provided that vB j and is 0 otherwise.

The calculation of histogram estimation process is simple, but it has several demerits: 1) Accurate estimation on center point for subspace B j , but weak on edge estimation; 2) Strong dependence on the starting point x 0 and smooth parameter h; 3) High requirement of data for high dimensional feature space. These disadvantages make histogram estimation only suitable to PDE with low dimensions.

To overcome above problems, Rosenblatt [32] and Parzen [33] make improvements on histogram estimation methods. Firstly, replace indicator function in histogram by smooth kernel function, and then set estimation interval center as sample observation value. These improvements lead to method commonly referred to as KDE.

3 KDE on wind speed probability density

3.1 Basic principle of KDE

Let (v 1, v 2,…, v n ) represents a sample of the wind speed series. Its underlying PDF can be estimated by the following kernel density function (KDF):

$$\hat{f}(v) = \frac{1}{nh}\sum\limits_{i = 1}^{n} {K\left( {\frac{{v - v_{i} }}{h}} \right)}$$

where n is the sample size, h is the bandwidth and K(·) is a kernel function, which satisfies the constraints:

$$\left\{ {\begin{array}{*{20}l} {K(u){\text{d}}u = 1} \\ {uK(u){\text{d}}u = 0} \\ {u^{2} K(u){\text{d}}u < \infty } \\ \end{array} } \right.$$

When sample size is sufficiently large, \(\hat{f}\left( v \right)\) is converged to f(v) with probability one.

It can be observed from Eq. (2) that the performance of KDE depends on the kernel function and bandwidth. There are many different kinds of kernel functions. Prakasa Rao [34] indicates that KDE is insensitive to the selection of kernel function when n is sufficiently large. Therefore, a kernel function is generally chosen for a certain type. Table 1 shows the commonly used kernel functions.

Table 1 Commonly used kernel functions

3.2 Selection of bandwidth

Bandwidth has relatively great impacts on the accuracy of KDE. The selection of bandwidth is, therefore, key to accurate estimation of wind speed distribution. The bandwidth can be chosen from measuring the estimation error between the underlying function f(v) and its estimate \(\hat{f}(v)\). One commonly used error measure is the mean integrated squared error (MISE), which is expressed as

$$\begin{array}{*{20}l} {e_{\text{MISE}} (\hat{f}(v)) = \int {E(\hat{f}(v) - f(v)} )^{2} {\text{d}}v} \\ { = \frac{{\mu_{2} (K)^{2} }}{4}R(f^{\prime\prime}(v))h^{4} + (nh)^{ - 1} R(K) + o((nh)^{ - 1} + h^{4} )} \\ \end{array}$$


$$\mu_{2} (K) = \int {u^{2} K(u){\text{d}}u} ;\quad R(K) = \int {K(u)^{2} } {\text{d}}u.$$

Omitting higher order infinitesimal in Eq. (4), an asymptotic MISE (AMISE) can be obtained.

$$e_{AMISE} (\hat{f}(v)) = \frac{{\mu_{2} (K)^{2} }}{4}R(f^{\prime \prime }(v))h^{4} + (nh)^{ - 1} R(K)$$

The optimal bandwidth can be obtained by differentiating AMISE with respect to the h.

$$h_{\text{AMISE}} = \left[ {\frac{R(K)}{{\mu_{2} (K)^{2} R(f^{\prime\prime}(v))n}}} \right]^{{{1 \mathord{\left/ {\vphantom {1 5}} \right. \kern-0pt} 5}}}$$

It can be seen from Eq. (6) that the expression involves second derivative f″(v) of unknown function f(v), so the estimation of f″(v) must be carried out before calculating of smooth parameter. Silverman [35] proposes an empirical method on selecting smooth parameter which takes normal density function as the reference distribution of the unknown PDF f(v). However, it can lead to over-smoothing when the underlying distribution is asymmetric or multimodal. In these cases, more sophisticated selection methods such as direct-plug-in (DPI) and cross-validation (CV) should be adopted. A conceptually simple technique, with theoretical justification and good empirical performance, is the DPI technique. The key step of DPI method relies on finding an estimate of the density functional R(f″(v)) in Eq. (6). The detailed information of this method can refer to Ref. [36].

4 Wind speed sampling

An appropriate random simulation of wind speed is required for the assessment the reliability of power system containing wind energy. The KDF can be used to simulate the wind speed. However, the KDF is extremely complicated according to its expression, which means that there will be significant difficulties in applying direct sampling. This paper presents an acceptance-rejection sampling technique to simulate wind speed [37]. The basic idea of the proposed technique is to generate wind speed data from a proposal PDF g(v;α) instead of sampling directly from the target distribution f(v). To make sure that the technique can be performed, f(v) should satisfy \(f(v) \leqslant Cg(v;a)\), where C is a bias factor with \(C \geqslant 1\).

The sampling process carries out as following steps:

  • Step 1 Simulate a random u from U(0,1) uniform distribution;

  • Step 2 Simulate a random v from the proposal PDF g(v;α);

  • Step 3 Check whether or not u < f(v)/(C × g(v;α)).

    • If this holds, accept v as a sample for f(v);

    • If not, reject v, and repeat the sampling steps.

The sampling efficiency of the proposed technique is inversely proportional to bias factor C, thus C should be as small as possible. An unconstrained nonlinear optimization method is used to select C and α and the objective function is modeled as:

$$\hbox{min} F = \hbox{max} \left( {\frac{f(v)}{g(v;a)}} \right)$$

The simulated wind speed data generated through the above procedures can meet the model (2).

5 Case studies for verification of KDE in modeling wind speed distribution

5.1 Information of wind sites and wind data

The hourly wind speed data at six wind sites from the United States (designated as A, B, C) and New Zealand (designated as D, E, F) for five years (from Jan. 1, 2007 to Dec. 31, 2011) were used to illustrate the accuracy and flexibility of the proposed model. The wind speeds at six sites were upscale to a hub height of 80 m. The geographical information and basic statistics of the six sites are shown in the Table 2.

Table 2 Geographical information of the six sites and their wind speeds

5.2 Accuracy judgment criteria

To show how a theoretical probability function matches with the observation data, a statistic \(R_{a}^{2}\) [17] is used as the judgment criteria. The higher \(R_{a}^{2}\) is, the greater the fit is. The statistic \(R_{a}^{2}\) is given by

$$R_{a}^{2} = 1 - (1 - R^{2} )\left( {\frac{N - 1}{N - s}} \right)$$


$$R^{2} = 1 - \frac{{\sum\limits_{i = 1}^{N} {(p_{i} - \hat{p}_{i} )^{2} } }}{{\sum\limits_{i = 1}^{N} {(p_{i} - \bar{p})^{2} } }}$$

N is the total number of intervals; s is the number of parameters in the model; p i and \(\hat{p}_{i}\) are the probability obtained with the sample data and probability model at the i th interval; \(\bar{p}\) is the mean of p i .

5.3 Wind speed modeling

Gauss kernel is applied as the weighting function in the study and the bandwidths are calculated using DPI technique. Figure 1 illustrates the PDF plot of wind speed for the six sites using the histogram estimation, KDE, Weibull model, Normal model and Rayleigh model.

Fig. 1
figure 1

PDF plots and histogram of wind speed for six sites

These figures demonstrate that the wind speed at different sites can have widely varying distribution modes, and the KDE method can always agrees well with the probability distribution characteristic for each site, whereas the Weibull model, Normal model and Rayleigh model show a high goodness of fit only at certain sites, which indicates the flexibility of KDE method in modeling the wind speed distributions.

The results of statistic \(R_{a}^{2}\) obtained by KDE, Weibull model, Normal model and Rayleigh model are shown in Table 3. It is observed that the KDE presents the best fit to the sample data at the six sites, which verifies its flexibility and accuracy again. Among the three parametric estimation models, Weibull model provide a higher fit degree for sites A, B, C, D and E while Normal model for sites F. This indicates that it is unreasonable to use a fixed parametric function to model the wind speed distribution.

Table 3 Results of \(R_{a}^{2}\) with different models for six sites

5.4 KDF investigation

The influence of kernel function on KDE is examined in this part. Four different kernel functions introduced above are chosen as the weight function in KDE for each site, respectively. The results of statistic \(R_{a}^{2}\) obtained by KDE with different kernel functions at the six sites are shown in Table 4.

Table 4 Results of \(R_{a}^{2}\) of KDE with different kernel functions for six sites

It can be seen from Table 4 that the statistic \(R_{a}^{2}\) obtained by KDE with different kernel functions are similar. It indicates that the KDF has minor impact on KDE. Among the four kernel functions, Gauss kernel always outperforms the others. Thus, Gauss kernel can be used as the weight function in KDE for wind speed.

6 Application of KDE in generating system reliability evaluation

6.1 Wind energy conversion model

Wind energy is converted into power by WTGs (wind turbine generator). The power output characteristics of a WTG are quite different from those of conventional generating units. There is a nonlinear relationship between the power output and the wind speed. The relation can be described using the operational parameters of the WTG. The commonly used parameters are the cut-in wind speed, rated wind speed, and cut-out wind speed. The power output [38] can be obtained from the simulated wind speed v using Eq. (10)

$$P_{r} = \left\{ {\begin{array}{*{20}l} {P_{r} (A + B \times v + C \times v^{2} )} & {(v_{ci} \le v \le v_{r} )} \\ {P_{r} } & {(v_{r} \le v_{t} \le v_{co} )} \\ 0 & {(v > v_{co} )\;{\text{or}}\;(v < v_{ci} )} \\ \end{array} } \right.$$

where P r , v ci , v r , and v co are the rated power output, the cut-in wind speed, rated wind speed, and cut-out wind speed of the WTG, respectively. The constants A, B, and C are determined by v ci , v r , and v co as expressed in Ref. [23].

6.2 Reliability assessment of power systems incorporating wind power

A wind farm with a total capacity of 400 MW is assumed to be located at the six sites and added to the IEEE- RTS. The v ci , v r , and v co of each WTG in the wind farms are 3.0, 14.5 and 20.0 m/s, respectively. Studies in Ref. [9] show that the changes in the FOR of the WTG do not have a significant impact on the calculated system reliability indices. It is, therefore, assumed in this paper that the wind farm consists of identical WTG with zero forced outage rates.

The reliability analysis on the system is conducted using state-sampling simulation method combing with the measured wind data, KDE, Weibull model, Normal model and Rayleigh model, respectively. Weibull PDF is selected as the proposal distribution for KDF in the acceptance-rejection sampling technique. Tables 5 and 6 show the reliability evaluation results of different models.

Table 5 LOLE of the IEEE-RTS using different wind speed models
Table 6 LOEE of the IEEE-RTS using different wind speed models

It can be seen from Tables 5 and 6 that the reliability indices calculated by the proposed model are more close to those calculated by the measured data. The average absolute errors using the proposed model for the two indices are 0.22% and 0.39%, respectively, whereas those calculated by the Weibull, Normal and Rayleigh are 3.24%, 6.25% and 5.06% for LOLE, respectively, and 3.50%, 7.19% and 5.30% for LOEE, respectively. The investigation indicates that the proposed model has higher accuracy compared with the parametric models, and can be directly applied in reliability evaluation of power system containing wind energy.

7 Conclusion

This paper presents a wind speed distribution model based on the KDE. The model is a kind of data-driven approach, without any assumptions for distribution mode of wind speed and flexible to any wind regime. Five years’ actual wind speed data from six sites were used to examined the the proposed model, Weibull model, Normal model and Rayleigh model. A statistic \(R_{a}^{2}\) is used as an index to measure the fit degree of the models to actual wind speed data.

The accuracy and applicability of the proposed method is verified using the wind data at six sites. The results indicate that the KDE can describe the wind speed distributions with a high accuracy and excellent robustness. The fitting statistics of \(R_{a}^{2}\) for six sites are more than 0.99. The average absolute errors of loss of load expectation (LOLE) and loss of energy expectation (LOEE) are 0.22% and 0.39%, respectively. The influence of kernel function on KDE is also investigated. The result shows that the Gauss kernel always performs better than the others for wind speed data.

Although KDE is a good NPDE method for estimation of wind speed distribution, it also has several limitations. For example, its computational complexity makes the computation quite tedious and results in that the traditional KDE method can only deal with small-scale and low dimension data set. Therefore, our next research direction is how to solve or mitigate these problems, and build a multidimensional wind speed model based on KDE.