Evaluating wind speed probability distribution models with a novel goodness of fit metric: a Trinidad and Tobago case study

Wind energy has been explored as a viable alternative to fossil fuels in many small island developing states such as those in the Caribbean for a long time. Central to evaluating the feasibility of any wind energy project is choosing the most appropriate wind speed model. This is a function of the metric used to assess the goodness of fit of the statistical models tested. This paper compares a number of common metrics then proposes an alternative to the application-blind statistical tools commonly used.Wind speeds at two locations are considered: Crown Point, Tobago; and Piarco, Trinidad. Hourly wind speeds over a 15-year period have been analyzed for both sites. The available data is modelled using the Birnbaum–Saunders, Exponential, Gamma, Generalized Extreme Value, Generalized Pareto, Nakagami, Normal, Rayleigh and Weibull probability distributions. The distributions were compared graphically and their parameters were estimated using maximum likelihood estimation. Goodness of fit was assessed using the normalised mean square error testing, Chi-squared testing, Kolmogorov–Smirnov, R-squared, Akaike information criteria and Bayesian information criteria tests and the distributions ranked. The distribution ranking varied widely depending on the test used highlighting the need for a more contextualized goodness of fit metric. With this in mind, the concept of application-specific information criteria (ASIC) for testing goodness of fit is introduced. This allows distributions to be ranked by secondary features which are a function of both the primary data and the application space.


Introduction
Electricity costs in most Caribbean Small Island Developing States (SIDS) are amongst the highest in the world [1] with majority of electrical energy being produced from imported fossil fuels [2]. As a result, wind energy is increasingly being explored as an alternative source of energy [3] with feasibility studies showing great potential for various islands [4][5][6][7]. Conversely, the Caribbean countries are amongst the most wind storm-prone regions in the world suffering from 26 storm impacts in the last 4 years alone [8][9][10]. These storms have an acute impact on the economies of these small states [11]. Additionally, wind speeds can even impact upon the region's flora and fauna [12].
Given the importance of wind to Caribbean SIDS, it is necessary for the characteristics of the wind be studied closely. Energy studies, storm risk studies and aviation considerations, among others, require the wind to be modelled as accurately and comprehensively as possible. Many studies have aimed at characterizing or comparing wind speed distributions at different locations [13][14][15][16][17][18][19][20][21]. Several emphasize seasonal variations while diurnal variations have also been examined [5]. However, these have either been located outside the Caribbean or have been limited in their exploration of candidate distributions. Furthermore, there is no consensus on the goodness of fit criterion which is most suitable for evaluating the appropriateness of the distribution to a particular application [22].
The paper examines the applicability of probability distributions commonly used to model wind speeds to data available from two locations in Trinidad and Tobago. The relative performance of these distributions is compared using goodness of fit tests. Additionally, the concept of application-specific information criteria (ASIC), is introduced as an improved method for distribution ranking in the case of wind energy studies. Section 2, "Description of Data,"describes the data used for this study, investigates the basic statistical properties and outlines the pre-processing required before utilisation of the data. Section 3, "Methodology" describes the candidate distributions, goodness-of-fit criteria used and the method for parameter estimation. Section 4, "Results and Discussion" displays the fit of the candidate distributions to the data graphically. Several goodness-of-fit tests are used to rank the performance of the distributions. In addition, expected wind energy output from a turbine is estimated and compared to the energy output calculated using the actual wind data. Finally, concluding remarks are given in Sect. 5.

General
The locations given in Fig. 1, provide a useful opportunity for comparison as they are both greeted by the same north-easterly trade wind system [23], but are located at sites with differing geography. Crown Point is on a sheltered coast while Piarco is located in-land in an open plain. Piarco also receives some degree of sheltering by mountains to the North.
The dataset consists of the mean hourly wind speeds at Crown Point, Tobago and Piarco, Trinidad (locations indicated in Table 1 for the years 2000-2015, provided by the meteorological offices at airports at both locations and does not include wind direction or peak gust speed. The speeds were recorded in knots, rounded to the nearest knot, and are given in intervals of 1 h for each hour of the 24-h day for every day of the month. It should be noted that there are data points missing from both datasets. For Piarco, Trinidad, approximately 19 days of data from October 21 to November 9, 2009 are missing. There are also missing data points for a number of hours on other days. The total number of measurements is 133,083 out of a maximum possible number of 133,656 (0.4% missing data). For Crown Point, the data for the months of July to September, 2001, and August, 2011, are missing. Additionally, some days are missing data for an hour or a few hours. The total number of measurements is 123,429 out of a maximum possible number of 133,656 (7.7% missing data).

Basic statistics
Before any pre-processing, the data were described by the statistics given in Tables 2 and 3. Data pre-processing included data inspection for possibly erroneous values: -As seen in Table 2 Saffir-Simpson scale [25]. However, review of archived daily newspapers, for the next day make no mention of the event [28]. Again, no corroborating records were found to confirm these wind speeds [14] and as such, they were deemed erroneous.  All noted erroneous measurements were replaced with null values. Tables 4 and 5 reflects the wind statistics after pre-processing.

Distribution fitting
Typically, wind data is modelled as a Weibull distribution, especially when the aim is to characterise the annual resource [5,15,[29][30][31][32], however, a number of other candidate distributions have been catalogued [33]. For other applications such as statistical analysis of extreme wind speeds, the Weibull (or reverse Weibull) has also been recommended [34] while other distributions such as the generalised extreme value distribution [17] and the generalised pareto [19] have been explored. Agustin [20] encouraged using mixed distributions while confirming the applicability of Weibull. Sarkar et al. [35] identified the weakness of the Weibull distribution as its failure to describe the upper tail.
The Rayleigh distribution has also been used as a probability model for wind speed [31], although some applications have found Weibull to be more accurate [32,36]. Recent studies found autoregressive models [37] and maximum entropy distributions [38] to be better suited to wind speed applications than Weibull or Rayleigh. Alavi et al. [39] found that the Nakagami distribution performed well when compared to other distributions frequently used to model wind speed. Additionally, the Normal and Exponential Distributions were identified as potential candidates via visual inspection of the histogram shape. The Birnbaum-Saunders and Gamma distributions performed well when goodness of fit was assessed using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) criteria, and were thus included in the comparative analysis (Figs. 2, 3).

Review of probability distribution functions
The equations defining the probability density functions (PDFs) for various candidate distributions of interest are given below. While by no means exhaustive, the distributions represent those commonly used in the literature.

Birnbaum-Saunders
where is the location parameter, > 0 is the shape parameter, > 0 is the scale parameter and

Generalized extreme value
where where ∈ is the location parameter, > 0 is the scale parameter and ∈ is the shape parameter [44].

Generalized Pareto
where ∈ is the location parameter, > 0 is the scale parameter and ∈ is the shape parameter [45].

Nakagami
where is the shape parameter and is the spread parameter, for x > 0 [46].

Normal
where is the mean and is the standard deviation [45].

Rayleigh
where is the scale parameter [47].

Weibull
where k > 0 is the shape parameter and > 0 is the scale parameter [48].

Parameter estimation
Several techniques exist for parameter estimation (e.g., [22]). In this work, the parameters for these various distributions were estimated using the maximum likelihood method, which selects as its estimate the parameter value that maximizes the probability of the observed data [49]. This method is popularly used since the resulting estimators are generally asymptotically unbiased and consistent. They also offer the advantage of simplicity in implementation. While this method can be limited through the need to determine closedform estimator solutions, for the distributions of interest in this work, these can be readily obtained [22].

Goodness of fit
After plotting the distributions using the estimated parameters, the goodness of fit of the distributions to the data profile were assessed using the following metrics: -Normalised mean square error (NMSE).

Fig. 5 Wind distribution at Piarco
The normalised mean square error (NMSE) The NMSE was calculated using the previous method with the following equation: Where y n are the modelled values and f n are the reference data.

The Chi-squared statistic
For testing the goodness of fit, the Chi-squared was used. The Chi-square statistic ( 2 ) is calculated as follows: where O i are the observed counts and E i are the expected counts [49]. O i was the estimated sample datasets calculated using the estimated pdf of each distribution. E i was derived via the frequency histogram based on the measured data. N was determined by the number of bins used in the frequency histogram. A smaller Chi-squared statistic indicates a better fit.

The two-sample Kolmogorov-Smirnov test
The two-sample Kolmogorov-Smirnov test statistic was calculated as follows:  where F 1 (x) is the proportion of x1 values less than or equal to x, and F 2 (x) is the proportion of x2 values less than or equal to x. The smaller the test statistic the better the fit [52].

Co-efficient of determination, R 2
The R 2 statistic was calculated as follows: where,

and,
where y i represents the dataset and f i represents the modelled values. R 2 varies between − Inf (bad fit) to 1 (perfect fit) [53].

Akaike information criterion
The AIC statistic was calculated as follows: aic = −2logL( ) + 2k, Birn-Saun Birn-Saun Gamma Birn-Saun Rayleigh where logL( ) denotes the value of the maximized loglikelihood objective function for a model with k parameters. A smaller AIC statistic value indicates a better fit [54].

Bayesian information criterion
The BIC statistic was calculated as follows: where logL( ) denotes the value of the maximized loglikelihood objective function for a model with k parameters fit to N data points. A smaller BIC statistic value indicates a better fit [54] (Figs. 4, 5, 6, 7).

Results and discussion
The estimated parameters for each distribution are shown in the Appendix. The performance of these distributions were compared using the goodness of fit metrics described in Sect. 3.4 (Tables 6,7,8,9).
As evident, rankings varied depending on the goodness of fit metric used. Although in some other studies goodness of fit metrics corroborated each other [32,38,39], similar variability was observed in [55]. Figures 8,9,10,11,12,13,14,15,16 and 17 show the details for each goodness of fit metric. This is particularly evident in Figs. 8 and 9 which show rankings by NMSE and Chi-squared metrics, where the Birnbaum-Sanders distribution was particularly ill fitted as compared to Figs. 11 and 12 in which it is comparable when evaluated using the R 2 , and AIC and BIC criteria, respectively.
The variability in rankings raises the question of suitability of any given metric to the application. Consequently, some method of determining which goodness-of-fit criterion is best suited to the application has to be found or a new application-specific information criterion (ASIC) has to be formulated.

Application-specific information criterion
Wind models are used to calculate the expected energy generated by wind turbines. In this case, expected energy output over a particular time would be an important consideration in design and investment decisions. The ability of the distribution to accurately estimate this value is crucial.
Consider a wind turbine modeled as a 3MW unit using a piecewise linear model with a cut-in speed (cis) of 3.5 ms −1 , rated speed (rs) of 14 ms −1 and cut-out speed ( cos ) of 25 ms −1 as shown in Fig. 18.
The expected energy output of the turbine over a given period of time is calculated according to Eq. 19.
where P(v) is the turbine power vs speed characteristic (Fig. 18) and f(v) is the distribution function used to model the data.
For this work, the proposed ASIC is defined as a normalized weighted error function (in this case normalized error in expected energy is used), with the weightings defined by the turbine characteristic.
is the estimated distribution function. Using this approach, sections of the distribution which contribute more to the application are more heavily weighted than those that do not. In this case, the fit of the distributions below wind speeds of 3.5 ms −1 or above 25 ms −1 are not as important since the wind turbine does not output any power for those conditions. Using the chosen ASIC, the fit of the data over the range of power producing speeds of the turbine is assessed. This marks a departure from the philosophy behind other goodness-of-fit tests which equally weight all sections of a distribution or weight them based on probability and do not consider any external information in the determination of goodness of fit.
The actual energy output for Piarco was calculated as approximately 112 GWh, while the value for Crown Point was 155 GWh. Tables 10 and 11 show the percentage difference in energy predicted by the models as compared to the energy derived directly from the wind data. As evident, the results did not match any ranking derived from the conventional goodness-of-fit metrics.
Among the traditional goodness-of-fit tests, the Chisquared and Kolmogorov-Smirnov tests produced similar results to the ASIC in that they placed similar candidate distributions within the top four ranked distributions, albeit with a different order. This indicates that they may be better suited as goodness-of-fit tests for the purpose of wind energy studies than the other traditional goodness of fit metrics utilised in this paper. Given that the application space is known, however, using an ASIC would still be preferrable since rankings are made according to a parameter (energy in this case) which is meaningful to users of the data.
Finally, it is also noteworthy that the Weibull distribution, which is traditionally used in wind modelling in the Caribbean, performed poorly for both datasets using all the metrics investigated. This is likely due to the large amount of low to zero wind speed measurements. Castellanos [37] has also noted that the Weibull distribution performs poorly when the data contains a large proportion of low wind speeds (Figs. 19, 20; Table 12).

Conclusions
The Weibull distribution was found to perform relatively poorly as a wind probability model for both sites. The Rayleigh distribution performed consistently better than the Weibull but was still ill suited as a model for the data.
The inconsistency in results for the goodness of fit led to the conceptualization of application-specific information criteria (ASIC) as a more meaningful approach for assessment of goodness of fit in cases where the secondary, applicationspecific features must be calculated from the primary data.
For the application in question, the normalized error in expected energy is used as a goodness-of-fit metric to rank candidate distributions. The advantage of this technique   is that the distributions can be examined in terms of overestimation or under-estimation of expected energy as well as the magnitude of deviation while using a metric that is meaningful in the context of the intended application space.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.