1 Introduction

Estimates of extreme values at high return levels depend strongly on the shape parameter of the statistical model (Ragulina and Reitan 2017). With only a few tens of years of observed data, large uncertainty exists in parameter estimates based on data from an individual observational station. For instance, with a short record of extreme wind gust data, say around 20 years, an unreliably estimated shape parameter can in some cases lead to prediction errors in 1000-year wind speeds of up to a few hundred per cent (Simiu and Filliben 1975). A common approach adopted in wind engineering to ameliorate this shortcoming is the ‘super-station’ (or station-year) approach (Dorman 1983; Buishand 1991; Peterka 1992; Wang, Wang, and Khoo 2013), which takes advantage of having multiple weather stations with valid records in climatologically homogeneous regions. This approach commingles the valid records of independent events into a single record whose length is the sum of the years of all the original records, thereby extending the record length and reducing uncertainty at high return levels. However, the problem of potentially large bias and variance in prediction remains for return levels beyond the accumulated record length of the super-station.

With regard to natural hazards, if multiple regions of different climatic conditions are considered, the hazard in each region is analysed based on the data collected in that region. As a result, each region would have its own hazard model parameters, and most likely different regions would have different shape parameters. In practice, for convenience in engineering applications or by consensus of expert judgement, a single shape parameter value may be adopted across all the regions. An example is the Australian design standard for wind actions (Australian/New Zealand Standard 2021), in which a shape parameter of 0.1 is specified for all four wind regions. The ASCE 7 Standards Committee on Loads has assumed a shape parameter equal to 0 (effectively the Type I extreme value distribution) for extreme non-hurricane speeds (Lombardo et al. 2009; Simiu and Yeo 2019) in order to avoid underestimating the extreme wind speeds. In such cases, the specific values were chosen based either on consensus of judgement or on heuristic averaging of the set of shape parameters derived from analysing the wind gust data (Buishand 1991; Holmes 2002). Either way, there is a lack of objective criteria or theoretical basis for deriving the best shape parameter value. A wide range of goodness-of-fit test methods, such as the Anderson–Darling statistic, the Kolmogorov–Smirnov test, and graph-based tests, exist (Palutikof et al. 1999), but they are applicable only for testing the within-data fitness of a dataset from one individual station (or one ‘super-station’). In addition, the insufficient record length of observational data poses a serious challenge to deriving a shape parameter value with high confidence for extrapolation hundreds of years beyond the record length.

Van Den Brink and Können (2011) introduced a concept in terms of return period for modelling the occurrence probability of the maximum value of each record in an ensemble of independently collected records and found that the logarithmically transformed return period of a maximum value follows approximately the standard Gumbel distribution. With an approximation, they demonstrated that this concept can be employed to check the appropriateness of the probability model and its parameter values used for modelling the observed extremal phenomena (Van Den Brink and Können 2008, 2011). Once appropriate model parameters are found, they can lead to reduction in bias and variance of extrapolated extremal values.

Instead of return period, this paper makes use of the distinction between return period and average recurrence interval (ARI), derives a concept similar to that proposed by Van Den Brink and Können (2011) but in terms of ARI, and shows that the log-transformed ARI follows exactly the standard Gumbel distribution. Because the conceptual distinction between return period and ARI is at the heart of the theory developed in Sect. 2, it is useful to provide distinct definitions of the two terms:

  • Return period, commonly used in engineering to express the recurrence time of an extreme random event, is defined based on event occurrence in discretised time intervals (e.g. yearly intervals), which can be modelled as a Bernoulli sequence. The number of time intervals taken in a Bernoulli sequence until an occurrence of the event is governed by the geometric distribution; hence, return period is equal to the reciprocal of the occurrence probability of the event within one time interval (Ang and Tang 2007).

  • ARI is defined as the recurrence time of an extreme random event that follows a Poisson process. The time taken in a Poisson process until an occurrence of the event is governed by the exponential distribution; hence, ARI is equal to the reciprocal of the occurrence rate of the event which may occur at any time instant (Wang and Holmes 2020).

Return period (denoted by \(R\)) and ARI (denoted by \(A\)) can be shown to be related by,

$$1/R = 1 - e^{ - 1/A}$$
(1)

Note that the domain of \(R\) is \(\left( {1,\infty } \right)\) and that of \(A\) is \(\left( {0,\infty } \right)\). This difference is consequential as \(R\) is incapable of accounting for the occurrence of sub-interval events, and the difference between \(R\) and \(A\) is significant, say, when \(A < 5\). In addition, by Eq. (1), \({\text{lim}}_{A \to \infty } \left[ {R\left( A \right) - A} \right] = 1/2\), which, depending on the problem at hand, may be negligible when \(A > 10\) (Wang and Holmes 2020).
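
As a worked illustration of Eq. (1), the short Python sketch below converts a few ARI values to return periods; the function name is illustrative only.

```python
import math

def return_period_from_ari(a: float) -> float:
    """Convert an average recurrence interval A to a return period R via Eq. (1)."""
    return 1.0 / (1.0 - math.exp(-1.0 / a))

# The gap R - A is large for small A and tends to 1/2 as A grows.
for a in (0.5, 1, 2, 5, 10, 100, 1000):
    r = return_period_from_ari(a)
    print(f"A = {a:7.1f}   R = {r:9.3f}   R - A = {r - a:.3f}")
```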

Extreme natural hazard data are typically extracted through the block maxima (BM) method (usually annual extremes) or the peaks-over-threshold (POT) method, the two most widely used methods for processing extremal data. For large-scale synoptic wind-induced gusts, for example, extracting annual extremes is straightforward with high-quality data, whereas for less frequent, convective wind events such as thunderstorms, tornadoes, downbursts, and tropical cyclones, which may not occur every year, the wind gusts over a sufficiently high threshold from independent events may be taken for analysis. In this context, return period is conceptually associated with the BM method, whereas ARI conforms to the POT method. However, after a collected dataset is processed by the BM or POT method, either the return periods or the ARI’s associated with the data points can be estimated, and the other can be obtained by Eq. (1). This means that the ARI can always be estimated; hence, the method described in Sect. 2 can be applied.

The two most commonly used extremal distributions are arguably the generalised extreme value distribution (GEV) and the generalised Pareto distribution (GPD). Conceptually, the GEV is associated with the BM data collection method, whereas the GPD conforms to the POT method. Despite the conceptual distinction, the GEV and GPD have been shown to possess a duality relationship that allows the parameters of one distribution to be converted to those of the other (Wang and Holmes 2020). As a result, either of the two distributions can be employed for analysis, whether the dataset is processed via BM or POT, so long as the ARI’s (or exceedance rates) are used rather than the return periods (or exceedance probabilities), allowing the choice between the distributions to be based on the preference of the analyst.

In the following, the theory showing that the log-transformed ARI’s of the maximum values from an ensemble of records of different hazard-generating mechanisms constitute a sample drawn from the standard Gumbel distribution is first derived, followed by an illustration of applying the theory to determine the best shape parameter using the wind gust records in South Australia. Since the GEV requires one model parameter fewer than the GPD (i.e. the rate of threshold exceedance), it is employed for gust hazard analysis. This study demonstrates a theoretical basis for deriving the best shape parameter value for one or multiple regions of heterogeneous climatological conditions, by which the derived hazard model is safeguarded to produce extrapolated wind gust speeds at high ARI’s with reduced bias and variance. The illustrated method could be a useful tool in cases where one shape parameter value is applied across regions of various hazards, as in the case of the Australian/New Zealand Standard (2021).

2 Method

The theory described herein is applicable to an ensemble of extremal data records that may be produced by either the same or different mechanisms from homogeneous or heterogeneous meteorological regions, as it involves only the maximum value of each record. It will be shown that the probability distribution of the ARI associated with the maximum value of a record is independent of the parent distribution. In addition, it applies whether the data are processed by the BM or the POT method.

Assume that the occurrence of an extremal random variable \(Y\) follows a Poisson process and that the unit time interval is one year; then the probability distribution of \(Y\) in one year is \(F_{Y} \left( y \right) = P\left\{ {Y \le y} \right\}\). By Eq. (1), the probability that \(Y\) does not exceed \(y_{a}\), the value of \(Y\) corresponding to an \(a\)-year ARI, is

$$F_{Y} \left( {y_{a} } \right) = P\left\{ {Y \le y_{a} } \right\} = e^{ - 1/a}$$
(2)

which is an inverted exponential distribution (Lin et al. 1989). Let \(Y_{n}\) be the extreme value of \(Y\) in an \(n\)-year time interval, then

$$P\left\{ {Y_{n} \le y_{a} } \right\} = \left[ {P\left\{ {Y \le y_{a} } \right\}} \right]^{n} = e^{ - n/a}$$
(3)

Since for every quantile value \(y\) of \(Y\) there is a unique value \(a\) of \(A\), the bijective function \(g:Y \to A\), where \(g\left( y \right) = - 1/{\text{ln}}F_{Y} \left( y \right)\), maps \(Y\) to \(A\); therefore, \(A\) is also a random variable. In effect, \(g\) first maps \(y\) to the uniform random variable \(U = F_{Y} \left( y \right)\) on [0, 1], and then maps \(U\) to \(A\) by

$$A = - 1/{\text{ln}}U$$
(4)

If \(F_{A} \left( a \right)\) is the distribution function of \(A\), we have

$$F_{A} \left( a \right) = P\left\{ {A \le a} \right\} = P\left\{ {U \le e^{ - 1/a} } \right\} = e^{ - 1/a}$$
(5)

the same as the right-hand side of Eq. (2). This is not surprising as both \(Y\) and \(A\) are mapped bijectively to \(U\) through \(F_{Y} \left( y \right)\) and \(F_{A} \left( a \right)\), respectively.

Let \(A_{n}\) be the corresponding ARI of \(Y_{n}\) and write

$$P\left\{ {A_{n} \le a} \right\} = \left[ {P\left\{ {A \le a} \right\}} \right]^{n} = e^{ - n/a} = e^{{ - e^{{ - \left( {{\text{ln}}a - {\text{ln}}n} \right)}} }}$$
(6)

If we further define

$$\Delta L_{A} = {\text{ln}}A - {\text{ln}}n$$
(7)

then \(\Delta L_{A}\) is also a random variable with the probability distribution,

$$P\left\{ {\Delta L_{A} \le \Delta \hat{L}_{A} } \right\} = e^{{ - e^{{ - \Delta \hat{L}_{A} }} }}$$
(8)

which is the standard Gumbel distribution. It is seen that the distribution of \(\Delta L_{A}\) depends on neither the underlying distribution \(F_{Y} \left( y \right)\) nor its distribution parameters.
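
The distribution-free property of \(\Delta L_{A}\) can be checked numerically. The Python sketch below (with illustrative Gumbel and Weibull parents, which are assumptions for the demonstration only) generates synthetic \(n\)-year records, computes \(\Delta L_{A}\) of each record maximum via Eqs. (4) and (7), and recovers the standard Gumbel mean (Euler's constant, about 0.577) and variance (\(\pi^{2}/6\), about 1.645) for both parents.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 25            # record length in years (assumed for illustration)
records = 50_000  # number of synthetic records per parent distribution

def delta_L_A(annual_values, cdf):
    """ln(A) - ln(n) for the maximum of each synthetic record, via Eqs. (4) and (7)."""
    y_max = annual_values.max(axis=1)     # maximum value of each record
    a = -1.0 / np.log(cdf(y_max))         # ARI of the record maximum, Eq. (4)
    return np.log(a) - np.log(annual_values.shape[1])

# Two deliberately different annual parents: a Gumbel and a (shifted, scaled) Weibull.
gumbel_annual = rng.gumbel(loc=30.0, scale=4.0, size=(records, n))
weibull_annual = 20.0 + 10.0 * rng.weibull(2.0, size=(records, n))

dL_gumbel = delta_L_A(gumbel_annual, lambda y: np.exp(-np.exp(-(y - 30.0) / 4.0)))
dL_weibull = delta_L_A(weibull_annual, lambda y: 1.0 - np.exp(-((y - 20.0) / 10.0) ** 2))

for name, d in (("Gumbel parent", dL_gumbel), ("Weibull parent", dL_weibull)):
    print(f"{name}: mean = {d.mean():.3f}, variance = {d.var():.3f}")
```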

Comparing the derivation above to that in Van Den Brink and Können (2011), in which \(R\) is used instead of \(A\), Eq. (2) would become \(F_{Y} \left( y \right) = P\left\{ {Y \le y_{a} } \right\} = 1 - 1/R\), which is no longer an inverted exponential distribution. An approximation, \(R \approx - 1/{\text{ln}}\left( {1 - 1/R} \right)\), would then be needed, after which \(R\) would be treated as if it were \(A\), so that \({\text{ln}}R - {\text{ln}}n\) would be approximately Gumbel distributed.

In theory, if \(F_{Y} \left( y \right)\) is perfectly known, any given \(y_{a}\) and the corresponding \(a\) will be known. In this case, if we choose, e.g. \(a = n\), \(\Delta \hat{L}_{A}\) will be just a constant of zero. In practical situations, however, \(F_{Y} \left( y \right)\) is not known, and an observed quantity \(y_{a}\) corresponds to an unknown \(a\), which effectively constitutes a random sampling problem in which the interest lies in estimating \(a\), or equivalently \(\Delta \hat{L}_{A}\).

Suppose a dataset is collected from \(m\) stations located in areas of homogeneous meteorology and the events recorded are independent within and between stations. Then, a \(\Delta \hat{L}_{{A_{i} }}\) sample of size \(m\), \(i = 1,...,m\), can be obtained from the \(m\) stations, where \(\Delta \hat{L}_{{A_{i} }}\) is computed from the maximum \(A\) value at station \(i\). These \(\Delta \hat{L}_{{A_{i} }}\) values can be regarded as a sample drawn from the standard Gumbel distribution.

Because the distribution of \(\Delta L_{A}\) is independent of the underlying distribution, the implication is that the \(m\) stations considered for analysis need not be in areas of homogeneous meteorology, nor need their records arise from the same type of extreme events, as illustrated in Sect. 4. Consequently, records from heterogeneous meteorological areas or even different climatic event types may be combined to gauge the conformance of the \(\Delta \hat{L}_{{A_{i} }}\)’s to the standard Gumbel distribution, as each \(\Delta L_{{A_{i} }}\) value is a trial from the standard Gumbel distribution. In other words, this property allows the extremes from different hazard environments or different hazard variables to ‘learn’ from the experience of others by pooling the \(\Delta \hat{L}_{{A_{i} }}\)’s, since they simply constitute a sample drawn from the standard Gumbel variate.

Even though \(\Delta L_{{A_{i} }}\) is independent of the underlying \(F_{{Y_{i} }} \left( {y_{i} } \right)\)’s, estimation of the \(a_{i}\) corresponding to \(y_{i}\) requires a distribution function of \(Y_{i}\). In practical situations, the \(F_{{Y_{i} }} \left( {y_{i} } \right)\)’s are usually unavailable and need to be substituted by an empirically determined distribution \(\hat{F}_{{Y_{i} }} \left( {y_{i} } \right)\) with the distribution parameters estimated from observational data, which are invariably plagued by sampling errors. Assuming that there are \(N_{i}\) extreme values recorded at station \(i\) and the \(y_{i}\)’s are arranged in ascending order, \(y_{{\left( {1_{i} } \right)}} \le y_{{\left( {2_{i} } \right)}} \le ... \le y_{{\left( {N_{i} } \right)}}\), then \(\Delta \hat{L}_{{A_{i} }}\) is obtained based on \(\hat{F}_{{Y_{i} }} \left( {y_{{\left( {N_{i} } \right)}} } \right)\). Since \(\Delta \hat{L}_{{A_{i} }}\) is determined by the largest value from station \(i\), special attention should be paid to ensuring that all the \(m\) largest values are contributed by independent extreme events. If one event contributes to multiple \(\Delta \hat{L}_{{A_{i} }}\)’s, the \(\Delta \hat{L}_{{A_{i} }}\) value which represents the highest ARI among them is kept in the analysis, but all others triggered by the same event should be discarded.

The duality of the GEV and GPD ensures that they exhibit the same tail behaviour and have the same shape parameter (Wang and Holmes 2020) when applied to the same set of data. The GEV is used herein as it does not depend on the rate of exceedance, even though the wind gust data (described in Sect. 3) were extracted by the POT method. Because of the duality, the outcomes and conclusions drawn for the GEV are equally applicable to the GPD.

The GEV may be expressed as

$$P\left\{ {Y \le y_{a} } \right\} = e^{{ - \left[ {1 - k\left( {\frac{{y_{a} - \eta }}{\sigma }} \right)} \right]^{1/k} }}$$
(9)

where \(\eta\), \(\sigma\), and \(k\) are the location, scale, and shape parameters, respectively, of the distribution. The shape parameter \(k\) determines the type of extreme value distribution: \(k = 0\) corresponds to the Type I (Gumbel) distribution (\(- \infty < y_a < \infty\)); \(k < 0\) corresponds to the Type II (Fréchet) distribution (\(\eta + \sigma /k \le y_{a} < \infty\)); and \(k > 0\) corresponds to the Type III (Weibull) distribution (\(0 < y_{a} \le \eta + \sigma /k\)). \(y_{a}\) is related to its corresponding \(a\) as follows:

$$y_{a} = \left\{ {\begin{array}{*{20}l} {\eta + \frac{\sigma }{k}\left( {1 - a^{ - k} } \right),} \hfill & {{\text{if }}k \ne 0;} \hfill \\ {\eta + \sigma {\text{ln}}a,} \hfill & {{\text{otherwise}}.} \hfill \\ \end{array} } \right.$$
(10)
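
As a minimal illustration of Eq. (10), the helper below returns the gust speed at a given ARI for assumed GEV parameters (the parameter values in the example are illustrative, not fitted values).

```python
import numpy as np

def gev_speed_at_ari(a, eta, sigma, k):
    """Gust speed y_a at ARI a (years) for GEV parameters eta, sigma and k, per Eq. (10)."""
    a = np.asarray(a, dtype=float)
    if k == 0.0:
        return eta + sigma * np.log(a)
    return eta + (sigma / k) * (1.0 - a ** (-k))

# With k > 0 (Type III), the speed approaches the upper bound eta + sigma / k as the ARI grows.
print(gev_speed_at_ari([10, 100, 1000, 1e6], eta=30.0, sigma=4.0, k=0.165))
```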

For extreme hazard analysis, analysts may exercise their own judgement on the type (i.e. the sign and/or magnitude of \(k\)) of extreme value distribution. The distribution parameters could be estimated by a model fitting method such as least-squares regression (Press et al. 2007), the method of moments (Ang and Tang 2007), probability weighted moments, the maximum likelihood method, the principle of maximum entropy, the elemental quantile method, or Bayesian approaches (de Zea Bermudez and Kotz 2010). Except for the method of moments and the maximum likelihood method, virtually all the other methods require an estimate of the empirical cumulative distribution function (ECDF) for parameter estimation. For this purpose, the rate of exceedance (or ARI) is used to obtain the ECDF.

For a sample of \(N\) extremal values exceeding a specified threshold in \(n\) years, where the occurrence of exceedance obeys a Poisson process, an unbiased estimate of the rate of exceedance, \(\lambda_{j} ,j = 1,...,N\), with respect to the \(j\)-th smallest value is given by (Ang and Tang 2007)

$$\lambda_{j} = \frac{N - j + 1}{n}$$
(11)

Because \(a_{j} = 1/\lambda_{j}\), Eq. (11) may be used to obtain the ECDF for hazard model fitting.

For simplicity and without loss of generality, in the following the least-squares linear (for cases with a fixed shape parameter) and nonlinear (for cases with a free shape parameter) regression of wind gust speed on ARI was used for model parameter estimation.
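
A minimal sketch of this fixed-shape fitting step is given below, assuming the POT record is available as an array of gust speeds together with its length in years: the ARI of each ranked gust follows from Eq. (11), and with \(k\) held fixed, Eq. (10) is linear in \(\eta\) and \(\sigma\), so ordinary least squares suffices (the data and the chosen \(k\) are illustrative).

```python
import numpy as np

def fit_gev_fixed_k(speeds, n_years, k):
    """Least-squares estimates of eta and sigma with the shape parameter k held fixed."""
    y = np.sort(np.asarray(speeds, dtype=float))    # gusts above the threshold, ascending
    N = y.size
    j = np.arange(1, N + 1)
    lam = (N - j + 1) / n_years                     # exceedance rate of the j-th smallest value, Eq. (11)
    a = 1.0 / lam                                   # corresponding ARI
    # With k fixed, Eq. (10) reads y = eta + sigma * x, where x = (1 - a**(-k)) / k (or ln a if k = 0).
    x = np.log(a) if k == 0.0 else (1.0 - a ** (-k)) / k
    sigma, eta = np.polyfit(x, y, 1)                # slope = sigma, intercept = eta
    return eta, sigma

# Illustrative use: made-up gusts (m/s) above a 25 m/s threshold from a 20-year record, k fixed at 0.1.
gusts = [26.1, 25.3, 27.4, 30.2, 28.8, 26.9, 33.5, 29.7, 27.0, 31.1]
print(fit_gev_fixed_k(gusts, n_years=20, k=0.1))
```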

Extreme hazards in a region, such as wind gusts, may be generated by multiple independent mechanisms, and the hazards posed collectively by those mechanisms need to be estimated. One way of estimating the combined hazard is to group all observed gust speeds from all mechanisms into a single dataset for wind hazard analysis. However, hazard models derived from such grouped datasets tend to underestimate wind hazard at higher ARI’s (Gomes and Vickery 1978; Lombardo et al. 2009). Instead, a logical way to combine hazards generated by independent mechanisms is by means of probability theory, taking advantage of statistical independence. Suppose two independent mechanisms produce gust wind events with speeds denoted by \(V_{n}\) and \(V_{s}\). Then the probability of the combined wind gust, \(V_{c}\), not exceeding \(v\) is as follows:

$$P\left\{ {V_{c} \le v} \right\} = P\left\{ {V_{n} \le v} \right\}P\left\{ {V_{s} \le v} \right\}$$
(12)

If \(A_{c}\), \(A_{n}\), and \(A_{s}\) are the ARI’s corresponding to \(V_{c}\), \(V_{n}\), and \(V_{s}\), respectively, then for a given wind speed v, Eq. (12) is equivalent to

$$e^{{ - 1/A_{c} }} = e^{{ - 1/A_{n} }} e^{{ - 1/A_{s} }} = e^{{ - \left( {1/A_{n} + 1/A_{s} } \right)}}$$
(13)

which leads to

$$A_{c}^{ - 1} = A_{n}^{ - 1} + A_{s}^{ - 1}$$
(14)

and the well-known fact that

$$A_{c} = \frac{{A_{n} A_{s} }}{{A_{n} + A_{s} }} < {\text{min}}\left( {A_{n} ,A_{s} } \right)$$
(15)

Equation (14) shows that the combination of hazards can be treated as the addition of the rates of exceedance of independent hazards. The results are readily extended to more than two statistically independent hazard-generating mechanisms.
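
The rate-addition rule of Eq. (14) translates into a one-line combination, sketched below with illustrative per-mechanism ARI values.

```python
def combined_ari(*aris):
    """ARI of the combined hazard from independent mechanisms at a given gust speed, Eq. (14)."""
    return 1.0 / sum(1.0 / a for a in aris)

# If, at a given gust speed, the synoptic and non-synoptic models give ARIs of 800 and 500 years,
# the combined ARI is about 308 years, smaller than either, consistent with Eq. (15).
print(combined_ari(800.0, 500.0))
```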

Note, however, that no closed-form probability distribution is readily available for the probabilistic combination of hazards with parent GEV distributions, and no simple expression exists for the shape parameter of the combined hazard model. An alternative is to generate gust speeds by Monte Carlo simulation for \(V_{n}\) and \(V_{s}\), choose \(V_{c} = {\text{max}}\left( {V_{n} ,V_{s} } \right)\), and fit a probability distribution to the sample of \(V_{c}\). While the probabilistic combination determines the exact ARI of the gust speed of the combined hazard, the simulation approach allows derivation of a probability distribution through model fitting. These two approaches normally agree well for engineering applications, as illustrated in Fig. 1 for the combined hazard at Adelaide Airport due to synoptic and non-synoptic gust hazards. The combined hazard modelled by the GEV was fitted to a sample of 2,000 values of \(V_{c}\) and can be seen to closely follow the hazard determined by probabilistic combination.
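
A sketch of the simulation route just described is given below: annual extremes are drawn from each mechanism's GEV model by sampling the ARI via Eq. (4) and converting it to a speed via Eq. (10), and the yearly maximum of the two mechanisms is retained. The parameter values are illustrative assumptions, not the fitted Adelaide Airport models.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_annual_extremes(years, eta, sigma, k):
    """Annual extreme gusts from a GEV hazard model: U -> ARI via Eq. (4), ARI -> speed via Eq. (10)."""
    a = -1.0 / np.log(rng.uniform(size=years))   # ARI of each simulated annual extreme
    return eta + (sigma / k) * (1.0 - a ** (-k))

# Illustrative synoptic and non-synoptic models; the combined annual extreme is the larger of the two.
v_syn = simulate_annual_extremes(2000, eta=29.0, sigma=2.5, k=0.185)
v_non = simulate_annual_extremes(2000, eta=27.0, sigma=3.5, k=0.25)
v_comb = np.maximum(v_syn, v_non)
print(v_comb.max())   # largest simulated combined gust over the 2,000 years
```

A GEV model can then be fitted to the sample of combined annual extremes, for instance with the fixed-shape least-squares sketch given earlier in this section.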

Fig. 1
figure 1

Example wind gust hazard determined by probabilistic combination and by distribution model fitted to simulated gust speeds at Adelaide Airport

3 Data for illustrative example: wind gusts in South Australia

The first Dines pressure-tube/float anemometer in South Australia, managed by the Bureau of Meteorology, Australia, became operational around 1956. Three datasets of 3-s wind gust speeds, recorded at 10 m height, were acquired: half-hourly data (up to January 2015) from 64 stations, daily data (up to May 2017) from 76 stations, and one-minute data (up to May 2017) from 69 stations. The three datasets served to cross-check the veracity of the recorded wind speeds and to classify the wind event types. After data quality checking, tidying and screening, some of the stations were eliminated because of a high percentage of missing data, suspect recordings, or complicated topographical surroundings that made it highly doubtful that a gust speed could be corrected to terrain category 2 (i.e. open terrain) exposure as specified by Australian/New Zealand Standard (2021). To reduce the inadvertent inclusion of multiple peak gust speeds from the same wind event, minimum separation intervals of 4 days for synoptic and 12 h for non-synoptic wind gusts (Lombardo et al. 2009) were specified. After removing multiple recordings of the same event, only records with a data length of at least 10 years and at least 10 events were kept for analysis; among them, Adelaide Airport had the longest record of 29 years.

The gust speeds in the dataset for analysis underwent further correction as follows:

  • The anemometers were changed from Dines anemometers to three-cup anemometers around 1991. The wind gust speeds recorded by the two anemometer types are somewhat incompatible and hence required correction as suggested by Holmes and Ginger (2012).

  • The recorded 3-s gust speeds were corrected for the effects of terrain, topography and of shielding by nearby plantation and construction in the cardinal and inter-cardinal directions around each station in accordance with Australian/New Zealand Standard (2021).

  • The 3-s gust speeds were then converted to 0.2-s gust speeds (Holmes and Ginger 2012).

Ideally, the wind gust data should be separated according to the wind event types. Separation of events by storm type is a challenge as the Bureau of Meteorology, Australia, does not normally record the necessary information along with the recorded wind speeds. Some anemograph records may be available and could be used for identification purposes, although this is a time-consuming and painstaking task. The following storm-type separation algorithm was used to examine the one- or two-minute wind gust records and classify the events (Holmes 2019), with a minimal code sketch given after the list:

  • Identify the occurrence time of peak gust, \(V_{M}\), of the event.

  • Compute the average \(V_{B}\) of recorded wind gusts in the two-hour period before the occurrence of peak gust.

  • Compute the average \(V_{A}\) of recorded wind gusts in the two-hour period after the occurrence of peak gust.

  • Compute the wind gust ratios, \(R_{B} = V_{M} /V_{B}\) and \(R_{A} = V_{M} /V_{A}\).

  • If \(R_{B} \ge 2.0\) and \(R_{A} \ge 2.0\), the event is classified as a downburst.

  • If \(R_{B} < 2.0\) and \(R_{A} < 2.0\), the event is classified as synoptic.

  • If \(R_{B} < 2.0\) and \(R_{A} \ge 2.0\), the event is classified as thunderstorm/frontal.

  • If none of the three preceding cases holds, the event is classified as undetermined.
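
A minimal sketch of these classification rules is given below; the 2.0 ratio thresholds and two-hour windows are as stated above, while the function signature and the time-series input format are assumptions made for illustration.

```python
import numpy as np

def classify_event(gusts, times_h, peak_index, window_h=2.0, ratio=2.0):
    """Classify a wind event from its gust time series around the peak, following the rules above.

    gusts      : recorded gust speeds (m/s)
    times_h    : observation times in hours, same length as gusts
    peak_index : index of the event's peak gust V_M
    """
    gusts, times_h = np.asarray(gusts, float), np.asarray(times_h, float)
    v_m, t_m = gusts[peak_index], times_h[peak_index]
    v_b = gusts[(times_h >= t_m - window_h) & (times_h < t_m)].mean()   # average gust before the peak
    v_a = gusts[(times_h > t_m) & (times_h <= t_m + window_h)].mean()   # average gust after the peak
    r_b, r_a = v_m / v_b, v_m / v_a
    if r_b >= ratio and r_a >= ratio:
        return "downburst"
    if r_b < ratio and r_a < ratio:
        return "synoptic"
    if r_b < ratio and r_a >= ratio:
        return "thunderstorm/frontal"
    return "undetermined"
```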

Because of the short data lengths available and the scarcity of non-synoptic events, separate analyses of downburst and thunderstorm/frontal events became impractical. Therefore, along with the events classified as ‘undetermined’, they were aggregated as ‘non-synoptic’. In addition, datasets with a maximum value greater than the upper bound of the fitted GEV model were excluded as, in such cases, the maximum value has an ARI of infinity. This left datasets of records from 39 stations for analysis, among which 33 stations had synoptic, 31 had non-synoptic, and 25 had both synoptic and non-synoptic wind records. The locations of the 39 anemometer stations are shown in Fig. 2.

Fig. 2
figure 2

Locations of anemometer stations

4 Illustrative example: wind gust hazard in South Australia

Based on the tidied and corrected datasets, extreme wind gust modelling of the hazards due to (a) synoptic, (b) non-synoptic, and (c) the combination of synoptic and non-synoptic events was conducted separately. Only the gust speeds exceeding a threshold of 25 m/s were retained for analysis, as shown in Fig. 1 of Online Resource.

The ARI’s of gust speeds of a wind type at a specific station were estimated by Eq. (11). Because of the duality between the GEV and GPD (Wang and Holmes 2020), either of the two models could be employed for analysis as they should produce identical results. Since the GEV does not require estimation of the exceedance rate, it was used herein for hazard modelling.

4.1 Shape parameter estimation

Different shape parameter values of a GEV distribution fitted to an extremal dataset lead to different estimates \(\Delta \hat{L}_{A}\) of \(\Delta L_{A}\). This section illustrates the computation of \(\Delta \hat{L}_{A}\) by analyses of wind hazards in South Australia and the determination of optimal shape parameters that give the best fit of \(\Delta \hat{L}_{A}\) to the standard Gumbel distribution.

For the hazard of a wind type, suppose there are \(m\) data series collected from \(m\) stations and \(n_{i}\) years were recorded at station \(i\). After the data series at station \(i\) is fitted to the GEV, \(\Delta \hat{L}_{{A_{i} }}\) of its maximum can be computed by Eq. (7). As a result, a sample of size \(m\) of \(\Delta \hat{L}_{A}\) is obtained.

If the GEV models for the \(m\) stations fit well, the sample of \(\Delta \hat{L}_{A}\) should mimic a sample drawn from the standard Gumbel distribution. Techniques such as quantile–quantile (Q-Q) plots for the Gumbel distribution may be employed to check the conformity. The abscissa value corresponding to the \(i\)-th ascending-ranked \(\Delta \hat{L}_{{A_{i} }}\) can be computed by inverting the standard Gumbel distribution and using a plotting position formula to estimate the ECDF as follows:

$$\Delta G_{i} = - {\text{ln}}\left( { - {\text{ln}}\left( {\frac{i - c}{{m + 1 - 2c}}} \right)} \right)$$
(16)

where \(c\) depends on the plotting position used (Cunnane 1978). \(c = 0\) and \(c = 0.5\) lead to the formulae known as the Weibull and the Hazen plotting positions, respectively. Among the most commonly used plotting positions (\(c \le 0.5\)), Weibull and Hazen give rise, respectively, to the most and least conservative hazard modelling results (Folland and Anderson 2002). For a small number of stations, \(c\) may need to be chosen carefully, since different choices may produce notably different results. In this illustration, the Weibull plotting position was employed for model parameter estimation as it produces comparatively conservative results, which is consistent in concept with design for structural safety under extreme hazards.

To obtain the optimal shape parameter, a range of shape parameter \(k\) values was used to fit the 33 synoptic, 31 non-synoptic, and 25 combined wind hazards. The optimal \(k\) value was chosen based on minimisation of root-mean-squared error (RMSE) between \(\Delta \hat{L}_{A}\) and its idealised counterpart as determined by Eq. (16). Figure 3 shows the RMSE values versus \(k\) values, in which \(k =\) 0.185 for synoptic, \(k =\) 0.25 for non-synoptic, and \(k =\) 0.165 for combined hazard (shown as star-shaped points) were determined to be optimal. The hazard curves derived using the optimal \(k\) values and the observed wind gusts are plotted in Fig. 2 of Online Resource.
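
The search for the optimal \(k\) described above can be sketched as follows, reusing the illustrative fit_gev_fixed_k helper from Sect. 2 and assuming the station records are available as (speeds, record-length) pairs; for each trial \(k\), \(\Delta \hat{L}_{A}\) of each station maximum is obtained by inverting Eq. (9), compared against the Gumbel quantiles of Eq. (16) with the Weibull plotting position (\(c = 0\)), and the \(k\) minimising the RMSE is retained.

```python
import numpy as np

def delta_L_A_hat(speeds, n_years, k):
    """ln(A) - ln(n) of the station maximum under a GEV fitted with fixed k, Eq. (7)."""
    eta, sigma = fit_gev_fixed_k(speeds, n_years, k)   # least-squares helper sketched in Sect. 2
    # ARI of the station maximum, obtained by inverting Eq. (9)/(10); records whose maximum
    # exceeds the fitted upper bound are excluded beforehand (see Sect. 3).
    a_max = (1.0 - k * (max(speeds) - eta) / sigma) ** (-1.0 / k)
    return np.log(a_max) - np.log(n_years)

def rmse_against_gumbel(stations, k, c=0.0):
    """RMSE between the ranked delta-L_A sample and the Gumbel quantiles of Eq. (16)."""
    dL = np.sort([delta_L_A_hat(speeds, n_years, k) for speeds, n_years in stations])
    m = dL.size
    i = np.arange(1, m + 1)
    dG = -np.log(-np.log((i - c) / (m + 1 - 2 * c)))   # Eq. (16); c = 0 is the Weibull position
    return np.sqrt(np.mean((dL - dG) ** 2))

# stations = [(speeds_station_1, years_1), (speeds_station_2, years_2), ...]
# k_grid = np.arange(0.05, 0.35, 0.005)
# k_opt = min(k_grid, key=lambda k: rmse_against_gumbel(stations, k))
```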

Fig. 3
figure 3

Root-mean-squared errors between \(\Delta \hat{L}_{A}\) and standard Gumbel variate versus shape parameters (the star-shaped points denote the \(k\) values where RMSE’s are the minima)

The Gumbel Q-Q plot in Fig. 4(a) shows that the \(\Delta \hat{L}_{A}\) values from the GEV models with fixed \(k =\) 0.185 for synoptic and \(k =\) 0.25 for non-synoptic winds approximately follow the diagonal line, and hence are in agreement with the theory implied in Eq. (8). The commingled \(\Delta \hat{L}_{A}\) (red connected points), representing the pooling of \(\Delta \hat{L}_{A}\)’s from non-synoptic and synoptic events, also follow the standard Gumbel distribution. This shows that the theory developed in Sect. 2 also applies to the pooling of \(\Delta \hat{L}_{A}\)’s of maximum records from multiple mechanisms.

Fig. 4
figure 4

\(\Delta \hat{L}_{A}\) for wind gust models with fixed and free shape parameters on Gumbel Q-Q plots

If the \(m\) data points fall below (above) the diagonal line, the models underestimate (overestimate) the ARI values, i.e. overestimate (underestimate) the hazard. If the data points form a linear trend but cross the diagonal line with a slope < 1, the hazard model may have too many parameters and hence may be inappropriate for predicting the extreme values at ARI’s beyond the record length (Van Den Brink and Können 2008). The Q-Q plot in Fig. 4(b) was derived by allowing all three GEV distribution parameters to be determined by nonlinear regression for each of the \(m\) data records; it shows that the lines connecting the \(\Delta \hat{L}_{A}\) values cross the diagonal line. Hence, the ARI’s are overestimated (above the diagonal line) in the lower-value range but underestimated (below the diagonal line) in the higher-value range, and thus do not follow the standard Gumbel distribution. This implies that the fitted models may be biased and hence inaccurate for extrapolation to high ARI levels. A potential cause is that, with a free shape parameter, the GEV has too many parameters, such that the fitted models exhibit high extrapolation bias at high return levels, which manifests as an underestimated standard deviation of \(\Delta \hat{L}_{A}\). In this regard, fixing the \(k\) value, as in the case of the Australian standard (Australian/New Zealand Standard 2021), reduces the number of free model parameters and avoids such unfavourable bias, and is hence a sensible approach for more reliably determining design wind speeds at ARI’s beyond the available data lengths. If an individual \(k\) value for each station is indeed preferred, then in addition to the standard goodness-of-fit tests for interpolation, Eq. (8) can serve as a safeguard for obtaining a hazard model with reduced bias at high extrapolated ARI’s.

As shown in Fig. 2 of Online Resource, the synoptic and non-synoptic wind events at a location pose threats of different extents as they are induced by different climatic mechanisms. The hazards posed by different mechanisms need to be combined for the design of structures. Section 2 shows that hazard combination by either probability theory or Monte Carlo simulation provides similar results. As our purpose is to find the optimal shape parameter value of the combined hazard, Monte Carlo simulation was conducted to generate annual gust speeds for 2000 years from the best shape parameters (i.e. \(k =\) 0.185 for synoptic and \(k =\) 0.25 for non-synoptic) for each of the 25 locations where records of both wind types were available. For a given year at a specific station, the larger of the synoptic and non-synoptic gust speeds was taken as the extreme speed of the year. Figure 3 of Online Resource shows the generated combined annual extremes (red lines) up to 2,000-year ARI and the best fitted hazard curves of the non-synoptic (green lines) and synoptic (blue lines) winds for the 25 locations.

4.2 Comparison with Australian design standard (AS/NZS 1170.2:2021)

For comparison with the Australian Standard, the generated combined annual maxima of 2000 years for 25 stations in the preceding section were fitted to GEV distributions. Similar to the model parameter estimation in Sect. 4.1, we computed the RMSE between estimated \(\Delta \hat{L}_{A}\) and the standard Gumbel quantiles, and plotted it over a range of \(k\) values, as shown in Fig. 3. It shows that the optimal \(k\) value is about 0.165 for the combined hazard. Incidentally, this \(k\) value is close to \(k = 0.161\) obtained by Holmes and Moriarty (1999) for thunderstorm downbursts at Moree, New South Wales, Australia, which is located in wind hazard region A, the same region as the locations studied herein.

For the combined hazards, Fig. 5 shows the Gumbel Q-Q plot of \(\Delta \hat{L}_{A}\) by the GEV models with \(k =\) 0.165, which indicates that the simulated wind hazards agree with the theory, whereas with \(k = 0.10\) the \(\Delta \hat{L}_{A}\) line falls under the diagonal line, indicating that the hazard models overestimate the hazard (i.e. underestimate the ARI). The maximum simulated wind speed over the 2,000 years among the 25 stations is \(V_{{{\text{max}}}} =\) 48.3 m/s at Hindmarsh Island AWS. For \(k =\) 0.165 with \(\Delta \hat{L}_{A} =\) 3.21, \(V_{{{\text{max}}}}\) is predicted by Eq. (7) to have an ARI of 49,337 years (compared to 50,000 years inferred from the 25 stations simulated independently for 2,000 years), whereas for \(k = 0.10\) with \(\Delta \hat{L}_{A} =\) 0.06, it is predicted to have an ARI of 2,115 years. In AS/NZS 1170.2:2021 (Australian/New Zealand Standard 2021), with \(k = 0.10\), the regional wind speed for an ARI of 2,000 years in Region A is 48 m/s. This comparison implies that the standard may have somewhat overestimated the wind gust hazard for South Australia.

Fig. 5
figure 5

\(\Delta \hat{L}_{A}\) for combined wind gust models on Gumbel Q-Q plot

Figure 4 of Online Resource illustrates the fitted combined hazard models with \(k = 0.10\) and \(k =\) 0.165 along with the simulated annual extreme gust speeds (i.e. the same as the red lines in Fig. 3 of Online Resource) for the 25 locations. Compared with \(k = 0.10\), the curves with \(k =\) 0.165 provide a closer fit to the data points in most locations and, as expected, result in lower gust speeds at high ARI’s. As shown in Fig. 6, the average per cent differences between the models with \(k = 0.10\) and \(k =\) 0.165 are about 2.9% and 3.5% for ARI’s of 500 and 1000 years, respectively. Incidentally, a recent study (El Rafei et al. 2023) on the wind gust hazard in New South Wales, Australia, using high-resolution Australian regional reanalysis, found that using \(k = 0.10\) overestimates the 500-year ARI gust speeds, compared with those using variable \(k\) values, by approximately 4% for non-synoptic and 2.5% for synoptic events. That is, the Australian standard-specified design wind speeds for South Australia may generally fall on the conservative side with respect to the design for wind actions of structures of importance levels 2 (domestic housing and structures under normal operations) and 3 (construction designed to contain a large number of people) in the 2019 National Construction Code of Australia (Australian Building Codes Board 2019). Nevertheless, the resulting more conservative but costlier construction may be justified if the additional benefits gained are deemed to outweigh the extra costs incurred.

Fig. 6
figure 6

Per cent differences between gust speeds by models with \(k = 0.165\) and \(k = 0.10\)

4.3 Effects on bias and variance reduction

This section illustrates the effects of fixing the shape parameter at its optimal value on the bias and variance of gust speeds extrapolated to high ARI’s. The 21-year non-synoptic gust wind record at Adelaide Airport, which was determined to have \(k =\) 0.177 by nonlinear regression, was used for demonstration. A bootstrap sample of size 1,000 was generated, and each resample was fitted to the GEV with (a) variable \(k\) determined by nonlinear regression, and (b) \(k =\) 0.25. Figure 7(a) shows 100 of the models with variable \(k\) (thin grey lines) and the model with \(k =\) 0.177 for the original data (thick black line). Similarly, Fig. 7(b) shows the counterparts of Fig. 7(a) but with \(k =\) 0.25. It is clear that, for a given ARI beyond the data length, the hazard curves in Fig. 7(b) have lower gust speeds with much less spread than those in Fig. 7(a).
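
A sketch of the bootstrap comparison described above is given below; it assumes the non-synoptic POT record is available as an array of gust speeds together with its length in years, and uses scipy's curve_fit in place of the paper's nonlinear regression (an assumption about tooling, not the authors' implementation).

```python
import numpy as np
from scipy.optimize import curve_fit

def gust_at_ari(a, eta, sigma, k):
    """Eq. (10) for k != 0."""
    return eta + (sigma / k) * (1.0 - a ** (-k))

def bootstrap_500yr_speeds(speeds, n_years, n_boot=1000, fixed_k=None, seed=0):
    """500-year gust speeds from GEV models fitted to bootstrap resamples of a POT record."""
    rng = np.random.default_rng(seed)
    speeds = np.asarray(speeds, dtype=float)
    N = speeds.size
    a = n_years / (N - np.arange(N))           # ARI of the j-th smallest value, from Eq. (11)
    out = []
    for _ in range(n_boot):
        y = np.sort(rng.choice(speeds, size=N, replace=True))
        if fixed_k is None:                    # free shape: nonlinear least squares
            (eta, sigma, k), _ = curve_fit(gust_at_ari, a, y, p0=(y.min(), 2.0, 0.1))
        else:                                  # fixed shape: linear least squares, as in Sect. 2
            k = fixed_k
            sigma, eta = np.polyfit((1.0 - a ** (-k)) / k, y, 1)
        out.append(gust_at_ari(500.0, eta, sigma, k))
    return np.array(out)

# e.g. v_free  = bootstrap_500yr_speeds(adelaide_non_synoptic_gusts, n_years=21)
#      v_fixed = bootstrap_500yr_speeds(adelaide_non_synoptic_gusts, n_years=21, fixed_k=0.25)
```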

Fig. 7
figure 7

Bootstrapped non-synoptic gust hazards at Adelaide Airport

For a quantitative measure of the bias and variance reduction of gust speed at high ARI’s achieved by fixing the \(k\) value, Fig. 8 shows the probability densities of the 500-year ARI gust speed obtained from the bootstrap sample. The black stars are means, the red dots are medians, and the thick and thin red horizontal lines represent the 66% and 95% confidence intervals, respectively. For the case of \(k = 0.25\), both the mean and median gust speeds are 43.2 m/s, and its 95% quantile (44.4 m/s) is less than the mean (46.1 m/s) and median (45 m/s) of the ‘k free’ case. This means that fixing the shape parameter effectively removes a ‘bias’ of 2.9 m/s in the mean (or 1.8 m/s in the median) 500-year gust speed. As a reference, AS/NZS 1170.2:2021 specifies the 500-year design wind speed to be 45 m/s for Region A.

Fig. 8
figure 8

Probability densities of the 500-year gust speed for non-synoptic gusts at Adelaide Airport

It is also clear that the ‘k free’ case exhibits much wider variation in the gust speed value than the ‘k = 0.25’ case, and its long right tail beyond 50 m/s is largely due to estimated \(k < 0\) values, which make the hazard models Type II extreme value distributions without an upper bound. The variances for ‘k = 0.25’ and ‘k free’ are 0.361 and 18.9 m\(^{2}\)/s\(^{2}\), respectively, meaning that fixing the shape parameter reduces the variance of the 500-year gust speed, determined from a 21-year record, by 98.1% relative to the fit without fixing the shape parameter. This variance reduction by fixing the shape parameter is comparable to a finding for coastal storm surge in the United Kingdom (Howard 2022), which concluded that fitting an unconstrained shape parameter to short records is not advisable.

5 Conclusion

Many existing extremal datasets of climatic events, such as observational wind gusts, span only a few tens of years, resulting in potentially high bias and uncertainty in the distribution parameter values estimated from the data, and hence unreliable return levels when extrapolated to high ARI’s beyond the data length. Regarding the quality of model fitting, typical goodness-of-fit tests allow assessment within the record length but are unable to test the fitness at higher ARI’s. The approach developed in this study serves to test the appropriateness and unbiasedness of a fitted model for extrapolation of extremal values to high ARI’s, which are typically needed for engineering design and reliability assessment.

Instead of return period, this paper demonstrates the advantage of using ARI for modelling the occurrence of extremes. The ARI is shown to follow the inverted exponential distribution, and the log-transformed ARI of the record maximum (i.e. \(\Delta L_{A}\)) follows the standard Gumbel distribution. Moreover, the ability of the method to pool the \(\Delta L_{A}\)’s from all observational stations, even when they are from meteorologically heterogeneous regions or of different hazard-generating mechanisms, makes it useful in cases such as the wind gust speed specification in the Australian standard, in which a single shape parameter value is applied across the four wind regions. In such cases, the estimated values of \(\Delta \hat{L}_{A}\) from the records collected from all regions may be combined, and the \(k\) value that best fits the \(\Delta \hat{L}_{A}\)’s to the standard Gumbel distribution can be chosen and applied to all regions. It was also shown that fixing the shape parameter of the hazard model results in a reduction of 2.9 m/s in the potential bias of the mean and of 98.1% in the variance of the 500-year non-synoptic wind gust speed at Adelaide Airport.

In the Australian context, although it often occurs that non-synoptic wind gusts dominate the extreme wind climate, particularly at larger ARI’s, consideration of both synoptic and non-synoptic events is necessary as the combined wind hazard tends to have a smaller shape parameter value, which typically leads to higher wind gust speeds at high ARI’s. In addition, synoptic wind gusts at some locations dominate the gust speeds at smaller ARI’s, which is important for the construction of temporary and secondary structures such as formwork, circus tents, and farm shelters that are intended to be in service for only a short period of time (Wang and Pham 2011). The analysis of wind gust data from South Australia indicates that the shape parameter value of 0.1 used in the Australian standard, AS/NZS 1170.2:2021, may be lower than necessary, since it was shown to overestimate the wind hazard, and hence fall on the conservative side, in South Australia.

Although only the extreme wind gust hazard was analysed herein as an illustration, the method applies to other extremal phenomena. For instance, Van Den Brink and Können (2011) applied their method, which is modified and improved in this paper, to precipitation and sea-level rise, and Howard (2022) applied it to storm surge. Nonetheless, as with typical experimental and observational studies, the accuracy of estimation by the theory described herein depends on the accuracy of, and the uncertainties originating from, instrument manufacturing, installation, and data collection, violations of assumptions such as the wind speeds being recorded at 10 m height, the quality of data processing and maintenance, the inadvertent occurrence of large ‘outliers’, and the classification of hazard-generating mechanisms when heterogeneous mechanisms are concerned.