Background

A growing global population and rapidly depleting fossil fuel reserves are driving researchers to seek clean, pollution-free sources of energy such as solar, wind, and bioenergy. Wind energy is an inexhaustible natural resource that has shown great potential for combating climate change while providing clean and efficient energy. Further, rapid advances in wind turbine technology have led to significant growth in wind power generation across the world. However, wind energy is more sensitive than solar energy to variations in topography and wind patterns. Wind energy can be harvested economically if the turbines are installed in a windy area and a suitable turbine is selected. Wind speed forecasting is a critical factor in assessing wind energy potential and the performance of wind energy conversion systems. Several probability density functions (PDFs) have been used in the literature to describe wind speed characteristics, including the Weibull, Rayleigh, bimodal Weibull, lognormal, and gamma distributions.

Islam et al. [1] used the two-parameter Weibull distribution function for wind speed forecasting and assessed the wind energy potential at Kudat and Labuan, Malaysia. Celik [2] used Weibull-representative wind data instead of measured time-series data to estimate wind energy and showed that the estimated wind energy is highly accurate. Celik [3] carried out a statistical analysis of wind data in the southern region of Turkey and concluded that the Weibull model fitted the measured data distributions better than the Rayleigh model. Akdag et al. [4] discussed the suitability of the two-parameter Weibull wind speed distribution and the two-component mixture Weibull distribution (WW-PDF) for estimating wind speed characteristics. Carta et al. [5] used the WW-PDF because it is able to represent heterogeneous wind regimes in which there is evidence of bimodality or bitangentiality or, simply, unimodality. Maximum likelihood and least-squares methods were used to estimate the WW-PDF parameters. In [6], wind speed distributions were shown to be satisfactorily described by a lognormal function, and in [7], Weibull and lognormal distribution functions were used to fit wind speed distributions. Kiss and Imre [8] used Rayleigh, Weibull, and gamma distributions to model wind speeds over both land and sea. They found that the generalized gamma distribution, which has independent shape parameters for both tails, provides an adequate and unified description almost everywhere. The generalized extreme value (GEV) distribution, which combines the Gumbel, Frechet, and Weibull extreme value distributions, was used to model extreme wind speeds [9-12]. In the recent past, mixture distributions, which describe wind speed characteristics quite accurately, have been used to estimate wind energy potential. Jaramillo and Borja [13] used the mixture Weibull distribution (WW) to model a bimodal wind speed frequency distribution. Akpinar et al. [14] used the mixture normal and Weibull distribution (NW), which is a mixture of a truncated normal distribution and a conventional Weibull distribution, to model wind speeds. Tian Pau [15] employed the mixture gamma and Weibull distribution (GW), which is a combination of gamma and Weibull distributions, and also the mixture normal distribution (NN), which is a mixture of two truncated normal components, for wind speed modeling.

The objective of this study is to propose three new mixture distributions, viz., Weibull-lognormal (WL), GEV-lognormal (GEVL), and Weibull-GEV (WGEV), for wind speed forecasting. The proposed mixture distributions are compared with existing distribution functions to demonstrate their suitability in describing wind speed characteristics.

The rest of this paper is organized as follows: the wind distribution models and goodness-of-fit tests used in this paper are presented in the ‘Methods’ section. The data used for the analysis and the results derived from this study are discussed in the ‘Results and discussion’ section. Conclusions are presented in the ‘Conclusions’ section.

Methods

Significance

The wind turbine model best suited to a wind farm is selected through careful evaluation of the wind energy resource. Accurate evaluation requires the best-fit distribution model. An inappropriate distribution model results in inaccurate estimates of the wind turbine capacity factor and annual energy production, which in turn leads to improper estimation of the levelized production cost [13]. Hence, it is important to choose an accurate distribution model that closely mimics the wind speed distribution at a particular site.

Wind distribution models

Wind distribution modeling requires analysis of wind data over a number of years. To reduce the expense and time required to process long-term wind speed data, it is desirable to use statistical distribution functions to describe the wind speed variations. The primary tools for describing wind speed characteristics are probability density functions. The parameters of the probability density functions that describe the wind speed frequency distribution are estimated from a few years of statistical data. Many PDFs have been proposed in the recent past; in the present study, the Weibull, lognormal, gamma, GEV, WW-PDF, mixture gamma and Weibull, mixture normal, and mixture normal and Weibull distributions, together with three new mixture distributions, viz., Weibull-lognormal, GEV-lognormal, and Weibull-GEV, are used to describe wind speed characteristics. The parameters defining each distribution function are calculated using the maximum likelihood method.

Weibull distribution

The Weibull function is commonly used for fitting measured wind speed probability distribution. Weibull distribution with two parameters is given by [1]:

Weibull PDF:

$$f(v; k, c) = \frac{k}{c}\left(\frac{v}{c}\right)^{k-1}\exp\left[-\left(\frac{v}{c}\right)^{k}\right] \tag{1}$$

Weibull cumulative distribution function (CDF):

$$F(v; k, c) = 1 - \exp\left[-\left(\frac{v}{c}\right)^{k}\right] \tag{2}$$

Weibull shape and scale parameters are calculated using the maximum likelihood method [16] which is given by:

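$$k = \left[\frac{\sum_{i=1}^{n} v_i^{k}\ln(v_i)}{\sum_{i=1}^{n} v_i^{k}} - \frac{\sum_{i=1}^{n}\ln(v_i)}{n}\right]^{-1} \tag{3}$$

where $v_i$ is the wind speed in time step $i$ and $n$ is the number of data points. Because $k$ appears on both sides of (3), an iterative technique is used to evaluate it. The scale parameter is then obtained from

$$c = \left(\frac{1}{n}\sum_{i=1}^{n} v_i^{k}\right)^{1/k} \tag{4}$$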
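The iteration behind Eqs. (3) and (4) can be sketched in Python as follows. This is a minimal illustration, not the MATLAB procedure used in the study: the damped fixed-point update, the starting guess k = 2, the tolerance, and the names `weibull_mle` and `synthetic_speeds` are all assumptions made for the example. Zero speeds are dropped because ln(v) is undefined for them, and the sample is assumed to contain at least two distinct positive speeds.

```python
import math
import random


def weibull_mle(speeds, tol=1e-10, max_iter=500):
    """Estimate Weibull shape k from Eq. (3) by damped fixed-point iteration,
    then scale c from Eq. (4)."""
    v = [x for x in speeds if x > 0.0]  # ln(v) needs strictly positive speeds
    n = len(v)
    logs = [math.log(x) for x in v]
    mean_log = sum(logs) / n
    k = 2.0  # assumed starting guess
    for _ in range(max_iter):
        powers = [x ** k for x in v]
        weighted = sum(p * l for p, l in zip(powers, logs)) / sum(powers)
        # Right-hand side of Eq. (3), averaged with the previous k for stability
        k_new = 0.5 * (k + 1.0 / (weighted - mean_log))
        converged = abs(k_new - k) < tol
        k = k_new
        if converged:
            break
    c = (sum(x ** k for x in v) / n) ** (1.0 / k)  # Eq. (4)
    return k, c


def synthetic_speeds(n, k, c, seed=0):
    """Draw n Weibull-distributed speeds (hypothetical test data, not buoy data)."""
    rng = random.Random(seed)
    # random.weibullvariate takes (scale, shape) in that order
    return [rng.weibullvariate(c, k) for _ in range(n)]
```

Calling `weibull_mle(synthetic_speeds(5000, 2.0, 8.0))` should return estimates close to the generating values k = 2 and c = 8 m/s.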

Generalized extreme value distribution

GEV distribution is a flexible model that combines the Gumbel, Frechet and Weibull maximum extreme value distributions [11, 17]. GEV PDF is given by

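$$e(v; \zeta, \delta, l) = \frac{1}{\delta}\left[1 + \zeta\left(\frac{v - l}{\delta}\right)\right]^{-\frac{1}{\zeta} - 1}\exp\left\{-\left[1 + \zeta\left(\frac{v - l}{\delta}\right)\right]^{-\frac{1}{\zeta}}\right\}, \quad \text{if } \zeta \neq 0 \tag{5}$$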

GEV CDF [17] is given by

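$$E(v; \zeta, \delta, l) = \exp\left\{-\left[1 + \zeta\left(\frac{v - l}{\delta}\right)\right]^{-\frac{1}{\zeta}}\right\}, \quad \text{if } \zeta \neq 0 \tag{6}$$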

The GEV parameters are calculated using the maximum likelihood method, which maximizes the logarithm of the likelihood function given by

$$LL = \ln\prod_{i=1}^{n} e(v_i; \zeta, \delta, l) = \sum_{i=1}^{n}\ln e(v_i; \zeta, \delta, l) \tag{7}$$
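A minimal Python sketch of Eqs. (5) to (7) is given below, assuming ζ ≠ 0 and treating points outside the support (where 1 + ζ(v − l)/δ ≤ 0) as having zero density. The function names are illustrative and are not taken from the original study. Maximizing the log-likelihood over (ζ, δ, l) would be delegated to a numerical optimizer, which is not shown here.

```python
import math


def gev_pdf(v, zeta, delta, loc):
    """GEV density of Eq. (5) for zeta != 0; zero outside the support."""
    t = 1.0 + zeta * (v - loc) / delta
    if t <= 0.0:
        return 0.0
    return t ** (-1.0 / zeta - 1.0) * math.exp(-(t ** (-1.0 / zeta))) / delta


def gev_cdf(v, zeta, delta, loc):
    """GEV cumulative distribution of Eq. (6) for zeta != 0."""
    t = 1.0 + zeta * (v - loc) / delta
    if t <= 0.0:
        # Below the lower bound when zeta > 0, above the upper bound when zeta < 0
        return 0.0 if zeta > 0.0 else 1.0
    return math.exp(-(t ** (-1.0 / zeta)))


def gev_log_likelihood(speeds, zeta, delta, loc):
    """Log-likelihood of Eq. (7); -inf if any speed falls outside the support."""
    total = 0.0
    for v in speeds:
        p = gev_pdf(v, zeta, delta, loc)
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total
```

At v = l the bracketed term equals one, so the CDF evaluates to exp(−1) regardless of ζ and δ, which gives a quick sanity check on an implementation.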

Lognormal distribution

The lognormal distribution is the probability distribution of a random variable whose logarithm is normally distributed.

The lognormal PDF is given by [18, 19]

$$l(v; \lambda, \phi) = \frac{1}{v\,\phi\sqrt{2\pi}}\exp\left[-\frac{(\ln v - \lambda)^2}{2\phi^2}\right] \tag{8}$$

Lognormal CDF is written as [18]

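$$L(v; \lambda, \phi) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\left[\frac{\ln v - \lambda}{\phi\sqrt{2}}\right] \tag{9}$$

where

$$\mathrm{erf}(v) = \frac{2}{\sqrt{\pi}}\int_{0}^{v}\exp\left(-t^2\right)dt$$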

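The maximum likelihood estimates of the lognormal parameters λ and φ have a closed form and do not need an iterative procedure. With N the number of data points, they are given by [20]

$$\lambda = \frac{1}{N}\sum_{i=1}^{N}\ln v_i; \qquad \phi^2 = \frac{1}{N}\sum_{i=1}^{N}\left(\ln v_i - \lambda\right)^2 \tag{10}$$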
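Because Eq. (10) is in closed form, it translates directly into a few lines of code. The sketch below is illustrative only; the name `lognormal_mle` and the choice to skip non-positive speeds (for which ln v is undefined) are assumptions of this example.

```python
import math


def lognormal_mle(speeds):
    """Closed-form maximum likelihood estimates (lambda, phi) from Eq. (10)."""
    logs = [math.log(v) for v in speeds if v > 0.0]  # skip calm (zero) readings
    n = len(logs)
    lam = sum(logs) / n
    phi = math.sqrt(sum((x - lam) ** 2 for x in logs) / n)
    return lam, phi
```

For the two speeds e and e³, whose logarithms are 1 and 3, the estimates are λ = 2 and φ = 1.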

Gamma distribution

The probability density function of the gamma distribution is given by [15]

$$g(v; \alpha, \beta) = \frac{v^{\alpha - 1}}{\beta^{\alpha}\,\Gamma(\alpha)}\exp\left(-\frac{v}{\beta}\right) \tag{11}$$

The cumulative Gamma distribution function is given by [20]

$$G(v; \alpha, \beta) = \int_{0}^{v}\frac{t^{\alpha - 1}}{\beta^{\alpha}\,\Gamma(\alpha)}\exp\left(-\frac{t}{\beta}\right)dt \tag{12}$$

Gamma distribution parameters are estimated using maximum likelihood method that maximizes the logarithm of likelihood function which is given by:

$$LL = \ln\prod_{i=1}^{n} g(v_i; \alpha, \beta) = \sum_{i=1}^{n}\ln g(v_i; \alpha, \beta) \tag{13}$$
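As an illustration, the log-likelihood of Eq. (13) can be evaluated without computing Γ(α) directly by using the log-gamma function. The sketch below assumes strictly positive speeds; the function name is hypothetical, and the search over (α, β) that maximizes this quantity is left to a numerical optimizer.

```python
import math


def gamma_log_likelihood(speeds, alpha, beta):
    """Eq. (13): sum of ln g(v_i; alpha, beta) over strictly positive speeds."""
    n = len(speeds)
    sum_log = sum(math.log(v) for v in speeds)
    sum_v = sum(speeds)
    # ln g = (alpha - 1) ln v - v / beta - alpha ln(beta) - ln Gamma(alpha)
    return ((alpha - 1.0) * sum_log - sum_v / beta
            - n * (alpha * math.log(beta) + math.lgamma(alpha)))
```

For a fixed α, setting the derivative with respect to β to zero gives β equal to the sample mean divided by α, which reduces the search to a single parameter.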

Two-component mixture Weibull distribution

The probability density function, which depends on five parameters (k1, c1, k2, c2, w), is given by [5]

$$ff(v; k_1, c_1, k_2, c_2, w) = w\,f(v; k_1, c_1) + (1 - w)\,f(v; k_2, c_2) \tag{14}$$

The cumulative distribution function is given by [5]

$$FF(v; k_1, c_1, k_2, c_2, w) = w\,F(v; k_1, c_1) + (1 - w)\,F(v; k_2, c_2) \tag{15}$$

The relevant log-likelihood function is

$$LL = \sum_{i=1}^{n}\ln\left\{w\,f(v_i; k_1, c_1) + (1 - w)\,f(v_i; k_2, c_2)\right\} \tag{16}$$

Mixture gamma and Weibull distribution

The probability density function and cumulative distribution function of the mixture gamma and Weibull distribution are given by [15]

$$h(v; \alpha, \beta, k, c, w) = w\,g(v; \alpha, \beta) + (1 - w)\,f(v; k, c) \tag{17}$$

$$H(v; \alpha, \beta, k, c, w) = w\,G(v; \alpha, \beta) + (1 - w)\,F(v; k, c) \tag{18}$$

The relevant log-likelihood function is

$$LL = \sum_{i=1}^{n}\ln\left\{w\,g(v_i; \alpha, \beta) + (1 - w)\,f(v_i; k, c)\right\} \tag{19}$$

Mixture normal distribution

The probability density function of singly truncated normal distribution is given by [15]

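$$q(v; \mu, \sigma) = \frac{1}{I(\mu, \sigma)\,\sigma\sqrt{2\pi}}\exp\left[-\frac{(v - \mu)^2}{2\sigma^2}\right] \quad \text{for } v \geq 0, \tag{20}$$

where I(μ, σ) is the normalization factor that makes the truncated distribution integrate to one. It is expressed as

$$I(\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\int_{0}^{\infty}\exp\left[-\frac{(v - \mu)^2}{2\sigma^2}\right]dv. \tag{21}$$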

The cumulative truncated normal distribution is given by

$$Q(v; \mu, \sigma) = \int_{0}^{v}\frac{1}{I(\mu, \sigma)\,\sigma\sqrt{2\pi}}\exp\left[-\frac{(t - \mu)^2}{2\sigma^2}\right]dt \tag{22}$$

The mixture of two truncated normal components can then be written as [15]

$$r(v; \mu_1, \sigma_1, \mu_2, \sigma_2, w) = w\,q(v; \mu_1, \sigma_1) + (1 - w)\,q(v; \mu_2, \sigma_2) \tag{23}$$

The cumulative distribution function is given by

$$R(v; \mu_1, \sigma_1, \mu_2, \sigma_2, w) = w\,Q(v; \mu_1, \sigma_1) + (1 - w)\,Q(v; \mu_2, \sigma_2) \tag{24}$$

The relevant log-likelihood function used to estimate the five parameters is

$$LL = \sum_{i=1}^{n}\ln\left\{w\,q(v_i; \mu_1, \sigma_1) + (1 - w)\,q(v_i; \mu_2, \sigma_2)\right\} \tag{25}$$
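The integrals in Eqs. (21) and (22) do not have to be evaluated numerically. Using the standard relation between the normal CDF and the error function (an identity that is not stated in the original text), the normalization factor reduces to I(μ, σ) = ½[1 + erf(μ / (σ√2))]. The Python sketch below uses this closed form; the function names are illustrative assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)


def norm_factor(mu, sigma):
    """I(mu, sigma) of Eq. (21), evaluated in closed form with erf."""
    return 0.5 * (1.0 + math.erf(mu / (sigma * SQRT2)))


def trunc_normal_pdf(v, mu, sigma):
    """Singly truncated normal density of Eq. (20); zero for v < 0."""
    if v < 0.0:
        return 0.0
    z = (v - mu) / sigma
    return math.exp(-0.5 * z * z) / (norm_factor(mu, sigma) * sigma * math.sqrt(2.0 * math.pi))


def trunc_normal_cdf(v, mu, sigma):
    """Cumulative truncated normal distribution of Eq. (22)."""
    if v <= 0.0:
        return 0.0
    upper = math.erf((v - mu) / (sigma * SQRT2))
    lower = math.erf(-mu / (sigma * SQRT2))
    # Normal probability mass on [0, v], rescaled by the mass on [0, infinity)
    return 0.5 * (upper - lower) / norm_factor(mu, sigma)
```

When μ = 0, exactly half of the untruncated normal lies above zero, so `norm_factor(0.0, sigma)` returns 0.5 for any σ.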

Mixture normal and Weibull distribution

The probability density function of the mixture distribution comprising the truncated normal and conventional Weibull distributions is written as [15]

$$s(v; \mu, \sigma, k, c, w) = w\,q(v; \mu, \sigma) + (1 - w)\,f(v; k, c) \tag{26}$$

Its cumulative distribution function is given as

$$S(v; \mu, \sigma, k, c, w) = w\,Q(v; \mu, \sigma) + (1 - w)\,F(v; k, c) \tag{27}$$

The relevant log-likelihood function used to estimate the five parameters is

$$LL = \sum_{i=1}^{n}\ln\left\{w\,q(v_i; \mu, \sigma) + (1 - w)\,f(v_i; k, c)\right\} \tag{28}$$

Mixture Weibull and GEV distribution

The probability density function of the mixture distribution comprising the Weibull and GEV functions, which is applied here for the first time to model wind speed distributions, is written as

$$t(v; k, c, \zeta, \delta, l, w) = w\,f(v; k, c) + (1 - w)\,e(v; \zeta, \delta, l) \tag{29}$$

Its cumulative distribution function is given as

$$T(v; k, c, \zeta, \delta, l, w) = w\,F(v; k, c) + (1 - w)\,E(v; \zeta, \delta, l) \tag{30}$$

The relevant log-likelihood function used to estimate the six parameters is

$$LL = \sum_{i=1}^{n}\ln\left\{w\,f(v_i; k, c) + (1 - w)\,e(v_i; \zeta, \delta, l)\right\} \tag{31}$$

Mixture Weibull and lognormal distribution

The probability density function of the mixture distribution comprising the Weibull and lognormal functions, which is applied here for the first time to model wind speed distributions, is written as

$$u(v; k, c, \lambda, \phi, w) = w\,f(v; k, c) + (1 - w)\,l(v; \lambda, \phi) \tag{32}$$

Its cumulative distribution function is given as

$$U(v; k, c, \lambda, \phi, w) = w\,F(v; k, c) + (1 - w)\,L(v; \lambda, \phi) \tag{33}$$

The relevant log-likelihood function used to estimate the five parameters is built from the component PDFs:

$$LL = \sum_{i=1}^{n}\ln\left\{w\,f(v_i; k, c) + (1 - w)\,l(v_i; \lambda, \phi)\right\} \tag{34}$$

Mixture GEV and lognormal distribution

The probability density function of the mixture distribution comprising the GEV and lognormal functions, which is applied here for the first time to model wind speed distributions, is written as

$$v(v; \zeta, \delta, l, \lambda, \phi, w) = w\,e(v; \zeta, \delta, l) + (1 - w)\,l(v; \lambda, \phi) \tag{35}$$

Its cumulative distribution function is given as

$$V(v; \zeta, \delta, l, \lambda, \phi, w) = w\,E(v; \zeta, \delta, l) + (1 - w)\,L(v; \lambda, \phi) \tag{36}$$

The relevant log-likelihood function used to estimate the six parameters (ζ, δ, l, λ, φ, w) is

$$LL = \sum_{i=1}^{n}\ln\left\{w\,e(v_i; \zeta, \delta, l) + (1 - w)\,l(v_i; \lambda, \phi)\right\} \tag{37}$$
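All of the mixture models in this section share the same structure: a weight w applied to one component PDF and (1 − w) applied to the other. The Python sketch below shows one way this could be expressed once and reused for the three proposed mixtures. It is an illustration under stated assumptions: the helper names (`mixture`, `log_likelihood`, `area`) are hypothetical, the parameter values are arbitrary rather than fitted, and the GEV component assumes ζ ≠ 0.

```python
import math


def weibull_pdf(v, k, c):
    """Weibull density, Eq. (1); returns 0 for v <= 0 (a simplification at v = 0)."""
    if v <= 0.0:
        return 0.0
    return (k / c) * (v / c) ** (k - 1.0) * math.exp(-((v / c) ** k))


def lognormal_pdf(v, lam, phi):
    """Lognormal density, Eq. (8)."""
    if v <= 0.0:
        return 0.0
    z = (math.log(v) - lam) / phi
    return math.exp(-0.5 * z * z) / (v * phi * math.sqrt(2.0 * math.pi))


def gev_pdf(v, zeta, delta, loc):
    """GEV density, Eq. (5), for zeta != 0; zero outside the support."""
    t = 1.0 + zeta * (v - loc) / delta
    if t <= 0.0:
        return 0.0
    return t ** (-1.0 / zeta - 1.0) * math.exp(-(t ** (-1.0 / zeta))) / delta


def mixture(first, second, w):
    """Return the two-component mixture density w*first + (1 - w)*second."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return lambda v: w * first(v) + (1.0 - w) * second(v)


def log_likelihood(pdf, speeds):
    """Sum of ln pdf(v_i), as in Eqs. (31), (34), and (37)."""
    return sum(math.log(pdf(v)) for v in speeds)


def area(pdf, lo, hi, steps=20000):
    """Trapezoidal integral of pdf over [lo, hi], used as a normalization check."""
    h = (hi - lo) / steps
    total = 0.5 * (pdf(lo) + pdf(hi))
    total += sum(pdf(lo + i * h) for i in range(1, steps))
    return total * h


# Arbitrary illustrative parameters (not the fitted values of Table 1)
weib = lambda v: weibull_pdf(v, 2.0, 7.0)
logn = lambda v: lognormal_pdf(v, 1.5, 0.4)
gev = lambda v: gev_pdf(v, 0.1, 2.5, 6.0)
wgev = mixture(weib, gev, 0.6)   # Eq. (29)
wl = mixture(weib, logn, 0.6)    # Eq. (32)
gevl = mixture(gev, logn, 0.6)   # Eq. (35)
```

Because each component integrates to one, any mixture with 0 ≤ w ≤ 1 is itself a valid PDF; `area(wl, 0.0, 80.0)` should therefore be close to one for the parameters above.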

Goodness-of-fit tests

Goodness-of-fit tests are used to measure the deviation between the predicted data using theoretical probability function and the observed data. In this paper five statistical errors are considered as judgment criteria to evaluate the fitness of PDFs.

Kolmogorov-Smirnov test

The first one is the Kolmogorov-Smirnov test (K-S), which is defined as the maximum error in cumulative distribution functions [21].

$$\text{K-S} = \max\left|C(v) - O(v)\right| \tag{38}$$

where C(v) and O(v) are the cumulative distribution functions for wind speed not exceeding v, calculated from the distribution function and from the observed wind speed data, respectively. A smaller K-S value indicates a better fit of the PDF.
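A minimal sketch of Eq. (38) in Python follows. It evaluates the statistic on the raw, sorted observations, which is an assumption of this example; the study may instead have compared cumulative frequencies on binned data. The function name and the `model_cdf` callable are hypothetical.

```python
def ks_statistic(speeds, model_cdf):
    """Eq. (38): largest gap between the model CDF and the empirical CDF."""
    data = sorted(speeds)
    n = len(data)
    worst = 0.0
    for i, v in enumerate(data):
        c = model_cdf(v)
        # The empirical CDF steps from i/n up to (i + 1)/n at v, so check both sides
        worst = max(worst, abs(c - i / n), abs(c - (i + 1) / n))
    return worst
```

For example, with the two observations 0.25 and 0.75 and a uniform model CDF C(v) = v on [0, 1], the largest gap is 0.25.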

R2 test

The R² test is widely used for goodness-of-fit comparisons and hypothesis testing because it quantifies the correlation between the observed and the predicted cumulative probabilities of a wind speed distribution. A larger value of R² indicates a better fit of the model cumulative probabilities $\hat{F}$ to the observed cumulative probabilities $F$. R² is defined as [22]:

$$R^2 = \frac{\sum_{i=1}^{n}\left(\hat{F}_i - \bar{F}\right)^2}{\sum_{i=1}^{n}\left(\hat{F}_i - \bar{F}\right)^2 + \sum_{i=1}^{n}\left(F_i - \hat{F}_i\right)^2} \tag{39}$$

where $\bar{F} = \frac{1}{n}\sum_{i=1}^{n}\hat{F}_i$. The estimated cumulative probabilities $\hat{F}$ are obtained from the cumulative distribution functions (CDFs).

Chi-square error

Chi-square error is used to assess whether the observed probability differs from the predicted probability. Chi-square error is given by

$$\chi^2 = \sum_{i=1}^{n}\frac{\left(F_i - \hat{F}_i\right)^2}{\hat{F}_i} \tag{40}$$

Root mean squared error

Root mean squared error (RMSE) provides a term-by-term comparison of the actual deviation between the observed probabilities and the predicted probabilities. A lower value of RMSE indicates a better distribution function model.

$$\text{RMSE} = \left[\frac{1}{n}\sum_{i=1}^{n}\left(F_i - \hat{F}_i\right)^2\right]^{1/2} \tag{41}$$
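The three errors of Eqs. (39) to (41) operate on the same paired probabilities, so they can be computed together. The sketch below is illustrative: the function name is hypothetical, and it assumes that `observed` and `predicted` hold F and F̂ evaluated at the same wind speed points, with every predicted value non-zero so that Eq. (40) is defined.

```python
import math


def fit_errors(observed, predicted):
    """Return (R2, chi-square, RMSE) from Eqs. (39)-(41) for paired probabilities."""
    n = len(observed)
    mean_pred = sum(predicted) / n
    explained = sum((p - mean_pred) ** 2 for p in predicted)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    r2 = explained / (explained + residual)
    chi2 = sum((o - p) ** 2 / p for o, p in zip(observed, predicted))
    rmse = math.sqrt(residual / n)
    return r2, chi2, rmse
```

As a worked case, observed values (0.1, 0.5) against predicted values (0.2, 0.4) give R² = 0.5, χ² = 0.075, and RMSE = 0.1, while a perfect prediction gives R² = 1 with both errors equal to zero.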

Power density error (PDE)

The relative error between the wind power density calculated from the actual time-series data and that calculated from the theoretical probability function is expressed as [23]

$$\text{PDE} = \frac{PD_{tp} - PD_{ts}}{PD_{ts}} \times 100 \tag{42}$$

where $PD_{ts}$ is the wind power density calculated from the actual time-series data, given by

$$PD_{ts} = \frac{1}{2}\rho\,\overline{v^3} \tag{43}$$

with ρ the air density and the overbar denoting the mean over the time series, and $PD_{tp}$ is the wind power density based on a theoretical probability density function fn(v), given by

$$PD_{tp} = \frac{1}{2}\rho\int_{0}^{\infty} v^3\,fn(v)\,dv \tag{44}$$
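Eqs. (42) to (44) can be sketched in Python as shown below. Two assumptions are made that do not come from the original text: the air density defaults to 1.225 kg/m³ (a commonly used sea-level value), and the theoretical density is taken to be a single Weibull PDF, for which the integral in Eq. (44) has the standard closed form c³Γ(1 + 3/k). For the mixture models, the integral would instead be evaluated numerically or component by component. The function names are illustrative.

```python
import math


def power_density_time_series(speeds, rho=1.225):
    """Eq. (43): half the air density times the mean cubed wind speed (W/m^2)."""
    return 0.5 * rho * sum(v ** 3 for v in speeds) / len(speeds)


def power_density_weibull(k, c, rho=1.225):
    """Eq. (44) for a Weibull PDF, using E[v^3] = c^3 * Gamma(1 + 3/k)."""
    return 0.5 * rho * c ** 3 * math.gamma(1.0 + 3.0 / k)


def power_density_error(pd_tp, pd_ts):
    """Eq. (42): relative error of the theoretical power density, in percent."""
    return (pd_tp - pd_ts) / pd_ts * 100.0
```

Because the power density depends on the cube of the wind speed, a PDF that misrepresents the high-speed tail produces a large PDE even when its fit elsewhere is good.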

Results and discussion

Wind speed data from four wind stations were used to evaluate the suitability of the different PDFs. The data, provided by the National Data Buoy Center [24], come from stations 42056 (Yucatan Basin), 46012 (Half Moon Bay, 24 NM South Southwest of San Francisco, CA), 46014 (PT Arena, 19 NM North of Point Arena, CA), and 46054 (Santa Barbara W, 38 NM West of Santa Barbara, CA). Ten-minute mean wind speeds recorded at 5 m above sea level are used in the present study.

For station 42056, wind data over the period 2008 to 2010 are used.

For station 46012, wind data over a period of 10 years (2001 to 2010) are used for statistical analysis.

For station 46014, wind data over a period of three years (2008 to 2010) are analyzed for wind distribution modeling.

For station 46054, wind data over the period 1999 to 2000 are used for statistical analysis.

In the present study, the suitability of the PDFs is assessed using goodness-of-fit tests. All computational procedures are carried out in the MATLAB software package. The computed parameter values of the different PDFs for all four stations are presented in Table 1.

Table 1 Computed parameter values of different probability density functions

The mean and standard deviation of the observed wind speed for Station 42056 are 6.91888 m/s and 2.56768 m/s, respectively. The wind frequency histogram resembles the familiar bell-shaped curve; hence, the Weibull PDF fits the observed distribution well. The statistical parameters used to evaluate the fit of the PDFs analyzed here are presented in Table 2. All the PDFs except lognormal, GEV, and gamma describe the wind speed characteristics well, as is evident from their small power density errors in Table 2. In terms of K-S error, χ² error, RMSE, and PDE, the lognormal, GEV, and gamma distribution functions have large errors, indicating their inadequacy in modeling wind speeds. The results in Table 2 show clearly that the proposed mixture GEVL PDF provides the best fit to the observed wind speed distribution. From Figure 1, it is evident that the GEVL distribution provides a close fit throughout the entire wind speed spectrum when compared to the other distributions. The higher value of R² and the lower values of K-S error, RMSE, and chi-square error indicate that the proposed GEVL distribution is more accurate than the other PDFs in modeling wind speeds at Station 42056.

Table 2 Statistical errors for different distribution functions of Station 42056
Figure 1 Predicted and observed wind frequencies of Station 42056.

Station 46012 has a mean and standard deviation of 5.59185 m/s and 3.13391 m/s, respectively, for the observed wind speed. The statistical errors K-S, R², χ², and RMSE given in Table 3 indicate that the proposed mixture WGEV distribution provides the best fit to the observed wind frequency distribution, closely followed by the GW, WL, GEVL, and WW mixture distributions. Conventional PDFs such as lognormal and gamma overpredicted wind speeds in the ranges of 2 to 5 m/s and 13 to 24 m/s, respectively, and underpredicted speeds between 5 and 11 m/s. Apart from WGEV and WL, the other mixture PDFs and the conventional PDFs overpredicted wind speeds in the range of 3 to 5 m/s, as reflected by the overestimated predicted probabilities in Figure 2. The results indicate that mixture PDFs perform better than conventional single PDFs.

Table 3 Statistical errors for different distribution functions of Station 46012
Figure 2 Predicted and observed wind frequencies of Station 46012.

The mean and standard deviation of wind speed for Station 46014 are 6.19907 m/s and 3.71888 m/s, respectively. From Figure 3, it is seen that the WGEV, GW, WW, and NW distributions model the wind speed characteristics better than the other PDFs. Apart from these four, all other distributions either over- or underpredicted wind speeds. The results presented in Table 4 clearly show that, in terms of K-S error, χ², and RMSE, the GW PDF has the smallest error, followed by WGEV, WW, and NW. In terms of R², GW has a value very close to 1.0, confirming its superior performance, followed by the WGEV, WW, and NW distributions.

Figure 3 Predicted and observed wind frequencies of Station 46014.

Table 4 Statistical errors for different distribution functions of Station 46014

The wind regime of Station 46054 has a bimodal distribution with mean and standard deviation of 8.1261 m/s and 4.2390 m/s, respectively. Compared to conventional single PDFs, mixture PDFs performed well in modeling the wind speeds. Lognormal, gamma, Weibull, and GEV fared poorly in describing the wind characteristics compared to the mixture PDFs. As seen from Figure 4 and the statistical parameters in Table 5, the two-component mixture Weibull distribution (WW) provided the best fit to the observed wind data, closely followed by the proposed mixture function WGEV.

Figure 4 Predicted and observed wind frequencies of Station 46054.

Table 5 Statistical errors for different distribution functions of Station 46054

Figures 1 to 4 show that mixture PDFs fit much better than the conventional Weibull, lognormal, and gamma distributions. The proposed mixture distributions GEVL for Station 42056 and WGEV for Station 46012 outperformed the existing mixture and conventional single distributions. For Station 46054, the WW distribution provided a better fit than the others, with WGEV a close second.

Conclusions

In the present article, distribution models have been compared for describing the wind regimes of four wind stations. Common conventional PDFs and mixture PDFs, along with three proposed new mixture PDFs, viz., WGEV, GEVL, and WL, are used for wind speed modeling. It is shown that conventional PDFs, such as Weibull, lognormal, and gamma, are inadequate; hence, mixture functions are used to model the observed wind speed distributions better. Though the superiority of the proposed mixture functions at Station 46014 is not very significant, the proposed mixture distributions GEVL for Station 42056 and WGEV for Station 46012 provide a better fit to the empirical data than the existing mixture distributions. For Station 46054, both WW and WGEV are found to be more suitable for describing the wind speed distribution than the other distributions, and the performance difference between WW and WGEV is not significant for this station. The results show clearly that the proposed mixture PDFs, WGEV and GEVL, provide a viable alternative to other mixture PDFs in describing wind regimes. Mixture PDFs that include GEV are able to provide a close fit, particularly in the high speed ranges of the wind spectrum. This is critical for wind power applications because wind power is proportional to the cube of wind speed. Hence, mixture combinations of GEV with other conventional distributions merit further analysis.

Authors' information

RK is an assistant professor in the Electrical Engineering Department, Jawaharlal Nehru Technological University, Kakinada, India. His areas of interest include distribution system planning and distributed generation. SRR is an associate professor of the same university. His areas of interest include electric power distribution systems and power systems operation and control. SVLN is a professor of Computer Science and Engineering in the School of IT, Jawaharlal Nehru Technological University, Hyderabad, India. His areas of interests include real time power system operation and control, IT applications in power utility companies, web technologies, and e-governance. KMP is a postgraduate student of the Department of Electrical and Electronics Engineering at JNTU Kakinada, India. His areas of interest include probabilistic DG planning and energy systems.