Abstract
In recent decades, extreme precipitation events have increased in frequency and intensity in Greece and across regions of the Mediterranean, with significant environmental and socioeconomic impacts. Therefore, extensive statistical analysis of the extreme rainfall characteristics on a dense temporal scale is crucial for areas with important economic activity. For this reason, this paper uses the daily precipitation measurements of four meteorological stations in a mining area of northeastern Chalkidiki peninsula from 2006 to 2021. Three statistical approaches were carried out to develop the best-fitting probability distribution for annual extreme precipitation conditions, using the maximum likelihood method for parameter estimation: the block maxima of the generalized extreme value (GEV) distribution and the peak over threshold of the generalized Pareto distribution (GPD) based on extreme value theory (EVT), and the gamma distribution. Based upon this fitting distribution procedure, return periods for the extreme precipitation values were calculated. Results indicate that EVT distributions satisfactorily fit extreme precipitation, with GPD being the most appropriate, and lead to similar conclusions regarding extreme events.
1 Introduction
Extreme value theory (EVT) deals with events whose values are extremely small or extremely large. More specifically, extremes can be maxima or minima, regarded as rare events, defined by their magnitude or based on their socioeconomic impact (Towler et al. 2020; Coles et al. 2001). Due to the rarity of events and the small sample size, it is difficult to study extremes in a complex, interacting hydro-climatology system (Davison et al. 2013; Coles et al. 2001).
Probabilistic forecasting of extreme events through threshold and peak values has been a very active field of research in risk assessment. The likelihood of the occurrence of extreme events through a probability density function is estimated for different event intensity and persistence values, making probabilistic forecasting possible. The stationary assumption may not be viable in light of the changes due to anthropogenic and climatic stressors causing nonstationary conditions. Yet, non-stationarity should be used cautiously (Salas et al. 2018; Serinaldi and Kilsby 2014). Nonstationary extreme value distributions have become widely used over the last 10 years to account for observed hydro-climatic changes. However, there is still no agreement on a framework for performing extreme frequency analysis under nonstationary conditions or whether the extreme value analysis should be performed in a stationary or nonstationary context (Cancelliere 2017).
Two key techniques that are suitable for the fields of hydrology and risk assessment engineering are block maxima (BM) and peak over threshold (POT) (Katz et al. 2002; Tabari 2021) of generalized extreme value (GEV) distribution and generalized Pareto distribution (GPD), respectively. However, both stationary and nonstationary distributions can be used with BM and POT (Salas et al. 2018). Numerous studies for extreme events have used POT or BM, and the most suitable one is selected based on goodness-of-fit criteria (Nerantzaki and Papalexiou 2022). POT and BM predominate in fundamental hydro-climatological parameter estimation methods for nonstationary extremes using maximum likelihood estimation (MLE) for the estimation of distribution parameters (Nerantzaki and Papalexiou 2019; Salas et al. 2018). Extreme rainfall events can be defined as records significantly higher than the typical values for the region when considering long-term series and statistical inferences of anomalies (Lima et al. 2021; Jackson 2013). Several studies have investigated the POT and BM approaches for extreme rainfall value assessments depending on the context and goals of each case study (Serinaldi and Kilsby 2014; Wang and Holmes 2020; Vrban et al. 2018; Nagy et al. 2017).
Other works have employed the gamma distribution to predict extreme rainfall (Soleh et al. 2016; McBride et al. 2022). Rainfall extremes have also been studied for flood risk assessment using various statistical methods. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) have been employed to determine the best probability distribution (Laio et al. 2009), while additional techniques include probable maximum precipitation (PMP), nonstationary analysis, intensity–duration–frequency (IDF) curves, uncertainty in extreme precipitation frequency analysis, and spatial variability of extreme precipitation (Gu et al. 2022). Other approaches have involved polynomial trends in the cumulative distribution functions retrieved from hourly precipitation measurements, proposing a new threshold-determining method based on the regression lines of the cumulative probability distributions (Douka and Karacostas 2018), extrapolating marginal distributions and summarizing the spatial bivariate dependence of extremes (Sebille et al. 2017), graphical methods such as the so-called mean excess function (Nerantzaki and Papalexiou 2019), neural network models using multivariate time series data (de Sousa Araújo et al. 2022), and percentile indices that are commonly used to assess trends and projections of extreme precipitation events (Schär et al. 2016).
All the abovementioned techniques are suitable for evaluating precipitation extremes, but BM, POT, and gamma distribution are the most appropriate, providing optimal modeling results (Tabari 2021; Nerantzaki and Papalexiou 2022; Rahimpour et al. 2016). Modeling extreme precipitation data using the most suitable techniques in a mining area is crucial for several reasons, including safety, environmental impact assessment, natural resources protection, and operational planning. Innovations in this field can help improve the accuracy of predictions, enhance risk management, and ensure sustainable mining practices. An extension of extreme precipitation analysis is its geostatistical analysis, essential for understanding the spatial distribution and characteristics of heavy rainfall events that can significantly impact infrastructure and public safety. It provides valuable insights for climate adaptation and disaster risk reduction efforts. Correctly identifying the probability distribution of extreme events provides insights relative to the theory of random fields concerning probability distributions that can enable the appropriate covariance determination to enhance the geostatistical analysis (Emery 2008). Spatial mapping of extreme rainfall characteristics includes spatial patterns of maximum rainfall intensity, exceedance probabilities, and return levels. This information comes from distribution fitting, and such maps can help identify areas at higher risk of extreme rainfall events.
This work concerns the assessment of precipitation extremes in a mining-metallurgical area in northern Greece, using statistical analysis of extreme rainfall values from four local stations. The region is of great ore-mining interest due to the large deposits of base and precious metals. The centuries-old mining and metallurgical activity in the broader study area is evidenced by the number of ancient mining works and the slag heaps which come from the exploitation of the gold-bearing manganese and mixed sulfide silver–gold ores of the area found on the slopes of the local mountain. The wider area has important activity on a national and international scale, as it redefines for the region, in whole or in part, the economic characteristics and the development model through the full exploitation of the registered deposits and the parallel research for the expansion of the known deposits and the identification of new ones. Therefore, this study is vital for the sustainability assessment of the area, given that high flooding risk could drastically affect the area’s economic potential.
This work aims to examine and compare three statistical methodologies and to study extreme rainfall in an area of significant mining activity and economic importance. The distributions of the EVT (GEV, GPD) and the gamma distribution are used to find the best-fitting one to the examined data and the return levels of each distribution in the short and long term. Additionally, based on the return levels estimated for each distribution and the existing geomorphological conditions of the area, the station with the highest flooding risk will be identified, indicating the need for protective measures in the area of interest.
The following sections of the work include a case study and available data description, a methodological description of the applied techniques for modeling extreme precipitation events, the results of the study, and a general discussion of the results highlighting its important aspects. Finally, the concluding remarks of the work present the important points inferred from the results.
2 Study Area and Data
The study area is located on the eastern coast of the Chalkidiki peninsula, northern Greece, covering an area of 264 km2. The climate in the study area is characterized as transitional between the continental climate of Central Europe and the Mediterranean climate. The area primarily belongs to the mild Mediterranean type of bioclimate, while the most mountainous parts belong to the sub-Mediterranean type. In terms of bioclimatic layers, the wider area includes three zones: the humid bioclimatic layer with mild winter in the coastal zone, the humid bioclimatic layer with cold winter in the interior, and the humid bioclimatic layer with severe winter in the mountainous zone.
Most flooding events result from climatic conditions; however, they are also affected by the geology and geomorphology of the area, topography, and hydrology, as well as by human activities and structures. Flood risk management is a key issue at regional and local levels worldwide, affecting human lives and activities (Dung et al. 2022). Furthermore, the analysis of the area’s geological, geomorphological, and soil conditions aids in taking appropriate measures to deal with the flood risk. The study area belongs to the Serbo-Macedonian Massif, which geologically is mainly composed of crystalline schist Paleozoic rocks and younger Tertiary igneous intrusions. Based on the soil map of Greece, there are mainly deep soils in the study area, with a depth greater than 30 cm, as the area has experienced only weak anthropogenic impact and has not yet been eroded (Kroll et al. 2002). In the present study, the data were extracted from four local meteorological stations in northeastern Chalkidiki, Greece (Fig. 1), and consist of daily rainfall data for the period from 2006 to 2021 (Lagouvardos et al. 2017; Special Secretariat for Water of Greece 2021); however, one station contains data for a shorter period (from 2009). Their geographic coordinates, recording periods, and sample sizes are presented in Table 1.
The operation of the four meteorological stations was interrupted during the period from 2002 to 2006, and operation (with some interruptions in between) resumed from the end of 2006 until today. Statistical data for several precipitation parameters for this last period are given in Table 2. It should be noted that the annual rainfall values have been calculated based on the average values for each month and should be considered indicative only. As a first insight, the precipitation of each station and its extremes can be observed more simply by an exploratory graph, showing the weather behavior in time series form. An example of such a graph is shown in Fig. 2, and the plots for all stations are presented in Fig. 5.
The graph in Fig. 2 shows the observed precipitation behavior at station S03 over the years, and it shows that maximum rainfall in the area occurred in 2021, while 2014 and 2016 were the driest years.
3 Methodology
3.1 Mathematical Background of Distributions
The main objective of EVT is to describe the tail of the distributions of random variables, that is, to know or predict the statistical probabilities of events that have rarely been observed. The statistical analysis of extreme values has been developed to study future precipitation levels, potentially leading to flooding. The mathematical background of the three distributions GEV, GPD, and gamma is discussed in the following.
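The cumulative distribution function (CDF) of the GEV distribution is expressed as
\[F(x) = \exp\left\{-\left[1 + k\,\frac{x - b}{a}\right]^{-1/k}\right\}, \quad 1 + k\,\frac{x - b}{a} > 0,\]
where \(b\in R\) is the location parameter, a > 0 is the scale parameter, and k is the shape parameter.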
The shape parameter strictly affects the shape of the distribution and determines the heaviness of each tail (Lazoglou et al. 2019). Depending on the value of the k parameter, the GEV distribution family is divided into three individual distribution families (Coles et al. 2001): if k > 0, GEV takes the form of Fréchet distribution with a heavy tail; if k < 0, GEV takes the form of Weibull distribution, with a light tail with finite upper bound; if k → 0, GEV takes the form of the Gumbel distribution with an exponential tail.
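As an illustration of the three cases above, the following minimal Python sketch evaluates a GEV CDF and labels the family by the sign of k. It assumes the standard form F(x) = exp{−[1 + k(x − b)/a]^(−1/k)}, with the Gumbel limit exp{−exp[−(x − b)/a]} as k → 0; the function names `gev_cdf` and `gev_family` and the numerical tolerance are hypothetical choices made for this example, not part of the study.

```python
import math


def gev_cdf(x, b, a, k):
    """GEV CDF with location b, scale a > 0, and shape k (assumed sign: k > 0 is heavy-tailed)."""
    if a <= 0:
        raise ValueError("scale a must be positive")
    z = (x - b) / a
    if abs(k) < 1e-12:
        # Gumbel limit (k -> 0)
        return math.exp(-math.exp(-z))
    t = 1.0 + k * z
    if t <= 0:
        # x lies outside the support: below the lower bound (k > 0)
        # or above the finite upper bound (k < 0)
        return 0.0 if k > 0 else 1.0
    return math.exp(-t ** (-1.0 / k))


def gev_family(k, tol=1e-12):
    """Name the GEV sub-family implied by the sign of the shape parameter."""
    if k > tol:
        return "Frechet (heavy tail)"
    if k < -tol:
        return "Weibull (light tail, finite upper bound)"
    return "Gumbel (exponential tail)"
```

For k < 0 the sketch returns 1 for any x above the finite upper bound b − a/k, which is the behavior described for the Weibull case.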
The CDF of the GPD is given by the relationship
where \(b\in R\) is the location parameter, a > 0 is the scale parameter, and k is the shape parameter. Depending on the value of the k parameter, the GPD family is divided into three individual distribution families: if k > 0, the GPD belongs to the class of Pareto II distributions, with a light tail and a finite upper bound at b + a/k (b ≤ x ≤ b + a/k); if k < 0, the GPD takes the form of a Pareto distribution, with a heavy tail (b ≤ x < ∞); if k → 0, the GPD takes the form of the exponential distribution with a normal tail.
As the shape parameter derives from the skewness representing where most of the data lie, it tests whether a distribution is appropriate for characterizing the data set. In addition, the thresholds applied and the results reported define the risk assessment for the extended area based on the flooding forecast, indicating the future rainfall behavior.
The gamma distribution is a continuous probability distribution that models right-skewed data and sums of exponentially distributed random variables, and it generalizes both the chi-square and exponential distributions. The gamma distribution can model the elapsed time between various numbers of events (McBride et al. 2022). Conversely, the exponential distribution can model only the time until the next event, such as the next incident over a threshold (Zorzetto et al. 2016). Empirically, the gamma distribution fits the rainfall data well, and thus it is added to the analysis methods. The CDF of the gamma distribution is expressed as
\[F(x) = \frac{\gamma(\alpha, x/b)}{\Gamma(\alpha)}, \quad x > 0,\]
where Γ(α) is the gamma function, γ(α, x) = \({\int }_{0}^{x}{t}^{\alpha-1}{e}^{-t}{\text{d}}t\) is the lower incomplete gamma function, α > 0 is the shape parameter, and b > 0 is the scale parameter. Methods for fitting the data to these three distributions are analyzed in the following.
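To make the gamma CDF concrete, the sketch below evaluates F(x) = γ(α, x/b)/Γ(α) using the standard power series for the lower incomplete gamma function. The function name `gamma_cdf`, the stopping rule, and the iteration cap are assumptions made for this example; the series is adequate for moderate values of x/b and is not meant as a production implementation.

```python
import math


def gamma_cdf(x, alpha, b):
    """Two-parameter gamma CDF with shape alpha > 0 and scale b > 0.

    Uses the series gamma(alpha, z) = z**alpha * exp(-z) * sum_n z**n / (alpha (alpha+1) ... (alpha+n)),
    which converges for all z but is only practical for moderate z = x / b.
    """
    if alpha <= 0 or b <= 0:
        raise ValueError("alpha and b must be positive")
    if x <= 0:
        return 0.0
    z = x / b
    term = 1.0 / alpha          # n = 0 term of the series
    total = term
    n = 0
    while term > 1e-16 * total and n < 100000:
        n += 1
        term *= z / (alpha + n)
        total += term
    # Combine the prefactor in log space to limit overflow and underflow
    log_prefactor = alpha * math.log(z) - z - math.lgamma(alpha)
    return min(1.0, total * math.exp(log_prefactor))
```

With α = 1 the result reduces to the exponential CDF 1 − exp(−x/b), which gives a simple check of the implementation.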
3.2 Description of Extreme Rainfall Event Modeling Methods
The two methods for extreme value analysis are now presented in further detail. The first method refers to modeling using the GEV distribution, where the data are studied based on maximum observations per group (BM). This method requires the availability of data on the studied event or the maximum observations per period (e.g., maximum observations per year) (Nerantzaki and Papalexiou 2022). The second method is the modeling using the GPD, which aims to study the low-probability tail of the distribution representing the rare events based on the exceedance over a specified value (POT) (Tabari 2021). The MLE calculation method is also presented for estimating the GEV and GPD parameters (location, shape, scale), based on both graphical and statistical goodness-of-fit tests, as well as the gamma distribution, widely applied in rainfall modeling.
3.2.1 Block Maxima (BM) Method
The BM approach for modeling extremes of a (time) series of observations is based on utilizing maximum or minimum values of these observations within a certain sequence of constant length. For a sufficiently large number n of established blocks, the resulting peak values of these n blocks of equal length can be used to fit a suitable distribution to these data. While the block is flexible in its size selection, a trade-off has to be made between bias (small blocks) and variance (large blocks). The length of the sequences is usually chosen to correspond to a certain familiar period, in most cases a year. The resulting vector of annual maxima (or minima) is called the annual maxima (minima) series or simply AMS. According to the Fisher–Tippett–Gnedenko theorem, the distribution of BM can be approximated by a GEV distribution (Coles et al. 2001).
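Let X1, …, Xm (m = nk) be independent random variables drawn from an unknown distribution function F (here k denotes the number of blocks, not the shape parameter). The observations are divided into k blocks of n observations each. The maximum values of the k blocks, namely the BM, are denoted as Y1, …, Yk. More specifically,
\[Y_i = \max\left(X_{(i-1)n+1}, \ldots, X_{in}\right), \quad i = 1, \ldots, k.\]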
The random variables X1, …, Xm do not need to be known. It is sufficient to know the BM. If n is large enough, then it can be assumed that the normalized Yi (i.e., centered and scaled by suitable sequences) follows the GEV distribution. The latter implies that Yi also follows GEV, since
\[P\left(\frac{Y_i - d_n}{c_n} \le z\right) \approx G(z)\]
and then
\[P\left(Y_i \le z\right) \approx G\left(\frac{z - d_n}{c_n}\right) = G^{*}(z),\]
where G* is another distribution from the GEV family and dn and cn > 0 are, if they exist, the sequences of location and scale parameters, respectively (Coles et al. 2001).
Estimating the k parameter of G*(z) and the range of the confidence interval, it can be determined which of the three extreme value distributions (Gumbel, Fréchet, Weibull) the BM follows. Therefore, applying the method presupposes receiving maximum observations (BM) by equalized subsets of data and their adjustment to GEV. Blocks selection is crucial to fit the model in certain data, that is, to calibrate k, a, and b parameters. If blocks are small (small n), then the adjustment of their maximum in GEV will not be satisfactory, resulting in significant errors in parameter estimates. On the other hand, if the blocks are large, the result will be very few BM, and the estimations of GEV parameters b, a, and k will present high variance.
A serious problem with this method is that many extreme events can be neglected, because only the largest value of each block is retained (Lazoglou et al. 2019). The selection of block size is also important because a very small block can create biases, while if blocks are too large, only a few extreme values can be selected (Coles et al. 2001); the block size should also respect the nature of the data. Since rainfall data are studied, it would not be wise to use seasons (trimesters) as periods because, in this case, the maximum precipitation of winter months would be much greater than that for summer months. This would cause the BM not to follow the same distribution, which is the essential condition for being able to apply EVT. The other essential condition, namely the independence of the BM, is a reasonable assumption, as a series of independent random variables is initially used to define the BM (Katz et al. 2002).
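The block construction with annual blocks can be sketched as follows. The snippet groups daily rainfall records by calendar year and keeps the largest value in each year, which yields the annual maxima series. The function name `annual_maxima`, the record layout (pairs of date and rainfall in mm), and the use of `None` for missing days are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import date


def annual_maxima(records):
    """Return {year: maximum daily rainfall} from (date, rainfall_mm) pairs.

    Days with a missing measurement (None) are skipped, so a year with
    no valid measurement does not appear in the result.
    """
    by_year = defaultdict(list)
    for day, rain in records:
        if rain is not None:
            by_year[day.year].append(rain)
    return {year: max(values) for year, values in sorted(by_year.items())}


# Hypothetical daily records, for illustration only
example_records = [
    (date(2006, 1, 5), 12.0),
    (date(2006, 11, 2), 80.5),
    (date(2007, 3, 1), None),
    (date(2007, 6, 9), 33.1),
]
example_ams = annual_maxima(example_records)
```

Years with long interruptions in the record deserve separate inspection, since a maximum taken over a partial year may understate the true annual maximum.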
3.2.2 Peak over Threshold (POT) Method
The POT model of exceedance over a threshold was developed for studying the behavior of a distribution’s right or left tail, that is, one of its basic characteristics. The POT method can be used to estimate the distribution of the exceedances over a selected limit, as well as the tail shape. This method also involves uncertainty in selecting the optimal threshold u since it is usually chosen subjectively. When a very high value of u is selected, the number of values exceeding this threshold tends to be very small. Consequently, the estimates will have considerable variance. Conversely, if a relatively low threshold is selected, the total number of observations (exceedances) increases, reducing the variance of the estimates; at the same time, the estimates have significant bias for a very small value of u. If the initial distribution F, which the observations come from, is known, then for a high threshold u it is convenient to define the conditional distribution of the excesses
\[F_u(y) = P\left(X - u \le y \mid X > u\right) = \frac{F(u + y) - F(u)}{1 - F(u)}, \quad y \ge 0,\]
where \(y = X{-}u\).
The threshold selection is highly significant because differences in bias and variance could arise. Some techniques have been developed for optimal threshold selection based on suitable graphs (Langousis et al. 2016; Mascaro 2018). By carefully combining these techniques, we can select the appropriate threshold to examine the GPD. Those used in this study are presented in the following.
3.2.2.1 Mean Excess (Mean Residual Life) Plot
One method for selecting threshold u is through the linearity of the mean excess function e(u) for the GPD. If a random variable X \(\sim \) GPD (b, a, k), then the mean excess function will be (Davison and Smith 1990)
where k < 1 (if k ≥ 1, then the mean excess goes to infinity), u > u0, where u0 is the point to the right of which the graph becomes linear. In practice, we can estimate e(u) from the sample mean of the Xi exceedances over u, that is, from the empirical mean excess function (or mean residual life plot, which aids in the selection of a threshold for the GPD)
\[e_n(u) = \frac{1}{n_u}\sum_{i:\,X_i > u} \left(X_i - u\right),\]
where nu is the number of Xi that exceed u, and we select the threshold u to the right of which the graph becomes approximately linear. If a decreasing trend is present in the graph, then the tail of the distribution will be light and the parameter k will be less than 0, while an increasing trend implies a heavy tail, and therefore k > 0 is expected. The empirical mean excess function consists of points that are in the set
\[\left\{\left(u, \frac{1}{n_u}\sum_{i=1}^{n_u}\left(x_{i,n_u} - u\right)\right) : u < x_{\max}\right\},\]
where nu is the number of Xi observations exceeding the threshold u, \({x}_{i,{n}_{u}}\) is the ith observation that exceeds the threshold u, and xmax is the maximum of the observations. Confidence intervals can be added to this graph under the assumption that the empirical mean excess is approximately normally distributed (central limit theorem). However, the normality assumption is invalid for high thresholds, as few instances of exceedance remain (Nerantzaki and Papalexiou 2019).
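The empirical mean excess function can be computed directly from its definition, as in the sketch below: for each candidate threshold u, average the excesses x − u over the observations that exceed u. The function names `mean_excess` and `mean_excess_curve` and the example values are hypothetical; the number of exceedances is returned alongside each mean because it indicates how unstable the estimate becomes at high thresholds.

```python
def mean_excess(data, u):
    """Return (empirical mean excess over u, number of exceedances)."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        raise ValueError("no observation exceeds the threshold")
    return sum(excesses) / len(excesses), len(excesses)


def mean_excess_curve(data, thresholds):
    """Points (u, e_n(u), n_u) of the mean residual life plot for u < max(data)."""
    x_max = max(data)
    return [(u, *mean_excess(data, u)) for u in thresholds if u < x_max]


# Hypothetical daily rainfall values in mm, for illustration only
example_data = [1.0, 2.0, 3.0, 10.0, 20.0]
example_curve = mean_excess_curve(example_data, [0.0, 5.0, 25.0])
```

Plotting the second element of each point against the first gives the mean residual life plot; the threshold is then read off as the lowest u beyond which the points look approximately linear.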
3.2.2.2 Threshold Choice Plot
This technique estimates the parameters k and a* of the GPD for the various values of threshold u. The estimation of k should not be affected by u, while the estimation of a* should vary linearly with the threshold.
Let X \(\sim \) GPD (a0, b0, k0) and let b1 be another threshold, with b1 > b0. The random variable X|X > b1 also follows the GPD, with parameters a1 = a0 + k0 (b0 − b1) and k1 = k0. Let us assume that
is the new parameterization (the modified scale), with a* independent of b1. Then, the estimates of a* and k1 are stable for all b1 > b0, with b0 being the appropriate threshold for the asymptotic approximation. The selection limits of the threshold represent the points defined by
where xmax is the maximum value of x observations (Curceac et al. 2020).
3.2.2.3 Q–Q Plot
A quantile–quantile (Q–Q) plot is a graphical technique used to determine whether two data sets have a common distribution or whether a data set follows the pattern of a certain distribution. One main advantage of Q–Q plots is that the data sets being compared can have different sizes. This is important for the present investigation, as the size of the extreme values data set defined by the BM method is often different from that of the POT even though they characterize the same original data set (Lazoglou et al. 2019).
The tail behavior of the distribution can be checked graphically through a Q–Q plot against the GPD considering the specified threshold. This graph allows one to check whether the threshold exceedance distribution has a heavy or light tail by selecting k = 0. In this case, the Q–Q plot of the tail’s observations is against the exponential distribution. If the points in the Q–Q plot form a straight line, then the respective family of distributions (exponential distribution) provides a good fit to the data. If the exceedances of the threshold u follow a distribution with a light tail, then the Q–Q plot will be approximately linear; if a heavy tail, then the Q–Q plot will present a hollow deviation from the adaptation straight line; if a moderate tail, the Q–Q plot will present a curved deviation from the straight line (Curceac et al. 2020).
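A Q–Q comparison of threshold excesses against the exponential distribution (the k = 0 case) can be prepared as in the following sketch. It pairs the sorted excesses with exponential quantiles evaluated at the plotting positions (i − 0.5)/n, using the sample mean of the excesses as the exponential scale. The function name `exponential_qq_points`, the choice of plotting positions, and the example values are assumptions made for this illustration.

```python
import math


def exponential_qq_points(data, u):
    """Return (theoretical, empirical) quantile pairs for excesses over u.

    Theoretical quantiles come from an exponential distribution whose
    scale is the sample mean of the excesses (its maximum likelihood estimate).
    """
    excesses = sorted(x - u for x in data if x > u)
    n = len(excesses)
    if n == 0:
        raise ValueError("no observation exceeds the threshold")
    scale = sum(excesses) / n
    theoretical = [-scale * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, excesses))


# Hypothetical daily rainfall values in mm and threshold, for illustration only
example_points = exponential_qq_points([5.0, 12.0, 15.0, 30.0, 22.0], u=10.0)
```

Points lying close to the line y = x suggest that an exponential tail is adequate, while systematic curvature away from it points to a non-zero shape parameter.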
3.2.3 Method for Calculating GEV and GPD Parameters (Maximum Likelihood Estimation)
Extreme events may have significant impacts and severe consequences. It is difficult to evaluate them due to the scarcity of observations. As a natural and precise model for rare events, extreme value statistics exhibit strong potential in investigating such events. Based on semi-parametric models proposed by EVT, a major issue in extreme value statistics is estimating the parameters in the semi-parametric model. The most important parameter is the so-called extreme value index, which indicates the tail shape of a distribution function (Zhou 2010).
Thus, the characterization of distribution assessments is identified from their parameters, namely location, shape, and scale, which can be estimated using various methods, such as MLE. The MLE procedure has been adopted for evaluating the extreme value index and exhibits good performance in practice (Zhou 2010; Nerantzaki and Papalexiou 2022). The theoretical properties of MLE for the extreme value index have been gradually established over the last decade. Theoretically, it is necessary to clarify the extent to which the maximum likelihood estimator is a proper estimator for the extreme value index. In statistics, MLE is a method for estimating the parameters of a probability distribution by maximizing a likelihood function so that under the assumed statistical model, the observed data are most likely (Gori et al. 2022).
The likelihood function expresses the relative likelihood of the existing observations x1, …, xn as a function of the parameters \(\theta ,\)
\[L(\theta) = \prod_{i=1}^{n} f\left(x_i; \theta\right),\]
where f is the probability density function of the assumed model. The maximum likelihood estimate is the parameter value θ that maximizes the above likelihood function. MLE is one of the most widely used methods due to its reliability for large samples and its ability to easily adapt to more complicated models. It is also advantageous as this method is considered a relatively simple and direct procedure to estimate unknown parameters. The estimator is asymptotically unbiased, efficient, and normally distributed (Zhou 2010).
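As a small worked instance of MLE, the sketch below fits the Gumbel distribution, that is, the k → 0 member of the GEV family, to a sample of block maxima. Setting the derivatives of the Gumbel log-likelihood to zero gives a = mean(x) − Σ x·exp(−x/a) / Σ exp(−x/a) for the scale and b = −a·ln[(1/n) Σ exp(−x/a)] for the location; the code solves the first equation by bisection. This is a simplified illustration under the Gumbel assumption, not the three-parameter GEV fit used in the study, and the function name `fit_gumbel_mle` and the bracketing interval are choices made for this example.

```python
import math
import random


def fit_gumbel_mle(sample, iterations=60):
    """Return (location b, scale a) maximizing the Gumbel likelihood."""
    n = len(sample)
    x_min, x_max = min(sample), max(sample)
    if n < 2 or x_max == x_min:
        raise ValueError("need at least two distinct observations")
    mean = sum(sample) / n

    def score(a):
        # Weights exp(-x/a), shifted by x_min for numerical stability
        w = [math.exp(-(x - x_min) / a) for x in sample]
        weighted_mean = sum(x * wi for x, wi in zip(sample, w)) / sum(w)
        return a - mean + weighted_mean   # zero at the MLE of the scale

    # score < 0 for a very small scale and > 0 at the sample range
    lo, hi = 1e-6 * (x_max - x_min), x_max - x_min
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    a = 0.5 * (lo + hi)
    b = x_min - a * math.log(sum(math.exp(-(x - x_min) / a) for x in sample) / n)
    return b, a
```

A quick way to check such a routine is to simulate Gumbel data with known parameters by inverse transform, x = b − a·ln(−ln U) with U uniform on (0, 1), and confirm that the estimates are close to the true values for a large sample.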
3.2.4 Gamma Distribution Method
There are two versions of the gamma distribution, involving at most three parameters: shape, scale, and threshold, depending on the case.
(i) Threshold parameter: The threshold defines the smallest value in a gamma distribution. In the terminology of the previous methods, this parameter corresponds to the location. All values must be greater than the threshold. Negative threshold values allow the distribution to handle both negative and positive values. A threshold of zero allows only positive values. A two-parameter gamma distribution has the threshold set to zero.
(ii) Shape parameter (α): The shape parameter for the gamma distribution specifies the number of events modeled. The shape must be positive, but it need not be an integer. It is conventionally denoted by alpha (α). When the shape of a gamma distribution is an integer, it is known as an Erlang distribution.
(iiia) Scale parameter (b): The scale parameter for the gamma distribution represents the mean time between events. It is conventionally denoted by beta (β) and is written here as b.
(iiib) Rate parameter (λ): The rate form of the scale parameter, lambda (λ), is the mean rate of occurrence during one unit of time, as in the Poisson distribution. The scale (b) and rate (λ) forms are reciprocals of each other: b = 1/λ and λ = 1/b.
As mentioned above, the gamma distribution mainly models the time until future events occur at certain thresholds. At the same time, it is generally applicable to rainfall studies, including extreme events. The three-parameter version uses shape, scale, and threshold (location), whereas the two-parameter version sets the threshold to zero. The parameterization with only α and b appears to be more common in econometrics and certain other applied fields, so it is used for this study (McBride et al. 2022).
4 Results and Discussion
4.1 Block Maxima
The EVT, with application to GEV distribution, requires the existence of maximum observations (BM) from equally sized data subsets and their adjustment to the specific distribution. Therefore, the rainfall data analysis for the four Chalkidiki stations will be divided into blocks, where each block is the maximum annual observation per year as a time period. Figure 3 shows the time series of the annual maxima for each of those four stations. In this figure, the yearly maximum precipitation for each area is shown, providing insight into its behavior, including the years with the most extreme precipitation measurements and the intervals between these maximum values. For instance, the S01 station measures high rainfall approximately every 4 years. However, the S04 station has shown intense rainfall in the last few years.
The analysis of the blocks of each station, applying the extreme value methodology employing the GEV distribution using MLE (Nerantzaki and Papalexiou 2022), is presented in Table 3.
The return level (in this case, precipitation) is the value expected to be exceeded on average once every m time points (in this case, years). The return period is the expected length of time before the exceedance of a particular return level. Since the GEV distribution parameters that best fit the data have been determined, the return levels for specific return periods can be estimated together with the corresponding confidence intervals (Table 4) (Tabari 2021). For example, for station S01 and a return period of T = 10 years, that is, an annual exceedance probability p = 1/T = 1/10 = 0.1, the estimated return level is 135.61 mm (95% CI from 57.68 to 213.83 mm). Thus, an annual maximum observation (block) is estimated to exceed the rainfall value of 135.61 mm with probability p = 0.1 in any given year or, equivalently, to exceed it on average once every 10 years.
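The link between a return period T and its return level can be sketched by inverting a GEV CDF at the non-exceedance probability 1 − 1/T. Assuming the form F(x) = exp{−[1 + k(x − b)/a]^(−1/k)}, this gives z_T = b + (a/k)·{[−ln(1 − 1/T)]^(−k) − 1}, with the Gumbel limit z_T = b − a·ln[−ln(1 − 1/T)] as k → 0. The function name `gev_return_level` and the parameter values in the example are hypothetical and are not the fitted values of Table 3.

```python
import math


def gev_return_level(T, b, a, k):
    """Return level exceeded on average once every T blocks (T > 1)."""
    if T <= 1:
        raise ValueError("return period T must exceed 1")
    if a <= 0:
        raise ValueError("scale a must be positive")
    y = -math.log(1.0 - 1.0 / T)
    if abs(k) < 1e-12:
        # Gumbel limit (k -> 0)
        return b - a * math.log(y)
    # expm1 keeps the expression accurate when k is close to zero
    return b + (a / k) * math.expm1(-k * math.log(y))


# Hypothetical GEV parameters (mm), for illustration only
example_levels = {T: gev_return_level(T, b=60.0, a=25.0, k=0.1) for T in (2, 10, 50, 100)}
```

Confidence intervals such as those reported for the stations require the sampling uncertainty of the fitted parameters and are outside the scope of this sketch.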
The data fitting, that is, applying the BM, is done with knowledge of the parent distribution (GEV). Thus, according to Table 4, the following observations may arise: The standard errors of a and b parameters are relatively small with MLE, which reduces the uncertainty. The value k is slightly negative and close to zero, implying that it takes the form of a Gumbel distribution with an exponential tail. Thus, there is a low probability of observing extreme events. In addition, according to the 95% confidence interval for this parameter, it is observed that the parameter is negative and close to zero on the lower limit of the interval, so it is almost certain that the data will adjust to the Gumbel distribution (Lee et al. 2015).
Conclusions regarding the spatiality can be derived by observing the shape parameter. The areas closer to each other also have comparable parameter values (column 5 of Table 3). This may mean that location plays a vital role in the occurrence of an extreme event and that close areas present similar behaviors. In Table 4 we observe that the higher predicted rainfall in the short term and long term is approximately the same among the stations except for station S04, which presents much higher quantities of rainfall in the future.
The check on the extent of the data fit to the model is given in Fig. 6 in Appendix A, from which it is estimated that the data adjustment is relatively satisfactory in all the stations. The plots should not deviate much from the straight line, and the histogram should match the curve. The return level plot implies the expected return level for each return period. More specifically, the quantile plot compares the model quantiles against the data (empirical) quantiles. A quantile plot that deviates significantly from a straight line suggests that the model assumptions may be invalid for the data plotted. The return level plot shows the return period against the return level and delivers an estimated 95% confidence interval. The probability–probability (P–P plot) and the percentage points graphs (Q–Q plot) have a good adjustment to the model without deviating much from the straight line. The same conclusion can be drawn from the density diagram. Therefore, the parameter estimators can be reliably used to extract conclusions (i.e., not biased) through the return level estimation.
4.2 POT Method
One of the most difficult practical problems in applying the POT method is selecting an appropriate threshold above which the GPD fits the data best. In this study, the threshold u is selected using the mean residual life plot and the threshold choice plot (see Sect. 3). As already mentioned, the choice of u is particularly important: a value of u that is too low violates the asymptotic basis of the model and biases the estimates, whereas a value that is too high leaves few exceedances and therefore produces estimates with high variance. The idea is to find the lowest threshold above which the plot is nearly linear, taking into account the 95% confidence bounds.
The threshold selection is presented in detail for station S01 (Fig. 4). First, the mean residual life plot (mrlplot), which includes all rainfall data of the station, is examined to identify the range of candidate thresholds. This range is then used to prepare the threshold choice plot. The mean residual life plot shows the average excess over a given threshold for a series of thresholds. The goal is to find the lowest threshold above which the graph is linear with increasing threshold, within uncertainty. The graph is approximately linear on the interval [20, 40]. For greater thresholds, the graph shows considerable instability, which is due to the small number of remaining exceedances.
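The quantity displayed in a mean residual life plot can be computed directly, as in the following minimal sketch. The rainfall values are hypothetical, the function name is an assumption, and the confidence band uses a normal approximation for the mean excess, which may differ in detail from the plotting routine used in this study.

```python
import math

def mean_residual_life(values, thresholds):
    """For each threshold u return (u, mean excess, 95% CI half-width, count)."""
    rows = []
    for u in thresholds:
        excesses = [x - u for x in values if x > u]
        n = len(excesses)
        if n < 2:
            continue  # too few exceedances to estimate a variance
        mean_excess = sum(excesses) / n
        variance = sum((e - mean_excess) ** 2 for e in excesses) / (n - 1)
        half_width = 1.96 * math.sqrt(variance / n)  # normal approximation
        rows.append((u, mean_excess, half_width, n))
    return rows

# Hypothetical daily rainfall amounts (mm), for illustration only
rain = [2.1, 0.4, 12.5, 33.0, 7.8, 21.4, 45.2, 3.3, 27.9, 60.1, 18.6, 39.5, 24.2, 51.7]
for u, m, hw, n in mean_residual_life(rain, thresholds=range(0, 50, 10)):
    print(f"u={u:2d} mm  mean excess={m:5.1f}  +/- {hw:4.1f}  (n={n})")
```

The half-width grows as the number of exceedances falls, which reproduces the instability seen at high thresholds.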
In Fig. 4a, the data are inspected visually to select the range of thresholds over which the plot is approximately linear (see Table 5 below). Once this range has been identified for the threshold choice plot (Fig. 4b), the k (Fig. 4c) and a* (Sect. 3.2.2.2) parameters are estimated for various threshold values. The minimum threshold of the approximately linear interval (Fig. 4b) is then selected (Table 5), provided that it fulfills the requirement of the method: above the selected threshold, the estimates of k and a* should remain stable within the corresponding confidence interval (95% in our case) (Gilleland et al. 2005). The plots used for the threshold selection at the other three stations are shown in Fig. 7 in Appendix A.
As explained previously, the GPD is fitted to the data exceeding the selected threshold at each station, and its parameters are estimated with MLE (Table 6). In addition, Table 7 shows the rainfall return levels for specific return periods with the corresponding confidence intervals. The return level that is exceeded on average once every T years was calculated from the fitted GPD parameters (Table 6), the exceedance probability 1/T associated with the return period T, and the probability of rainfall exceeding the threshold (Martins et al. 2020).
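The return level calculation for the POT approach can be sketched as follows, using the standard expression for GPD return levels. The argument names (`exceed_rate` for the fraction of observations above the threshold, `obs_per_year` for the number of observations per year) and all numerical values are hypothetical, and the exact computation used for Table 7 may differ.

```python
import math

def gpd_return_level(threshold, scale, shape, exceed_rate,
                     return_period, obs_per_year=365.25):
    """Level exceeded on average once every `return_period` years (POT/GPD)."""
    # Expected number of threshold exceedances in `return_period` years
    m = return_period * obs_per_year * exceed_rate
    if m <= 1.0:
        raise ValueError("return level lies below the threshold for these inputs")
    if abs(shape) < 1e-8:
        # Exponential limit of the GPD (shape -> 0)
        return threshold + scale * math.log(m)
    return threshold + (scale / shape) * (m ** shape - 1.0)

# Hypothetical values, for illustration only (mm/day)
u, sigma, xi, zeta = 30.0, 12.0, 0.05, 0.02
for T in (2, 5, 10, 25, 50):
    level = gpd_return_level(u, sigma, xi, zeta, T)
    print(f"T={T:2d} years: return level = {level:.1f} mm/day")
```

The probability of exceeding the threshold enters through `exceed_rate`, so the same GPD parameters give different return levels at stations where exceedances are more or less frequent.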
From Table 6 it is observed that the shape parameter, \(\widehat{k}\), and the lower limit of its 95% confidence interval are close to zero. Based on Table 7, heavy rainfall is projected to occur primarily at stations S02 and S04 in the long term. The goodness of fit is evaluated in Table 6 and Fig. 8 in Appendix A. The log-likelihood is a quantitative measure of model fit. Higher likelihoods correspond to a higher probability of the model producing the observed data (the model fits the data well), considering the relative p values (Kim et al. 2017). It can be deduced from Fig. 8 that the fit is quite satisfactory at all stations under visual inspection of the plots generated (Hamdi et al. 2014). This can be seen in the probability–probability (P–P) and quantile–quantile (Q–Q) plots, which follow the straight line without significant deviations. Therefore, the parameter estimates can be reliably applied to draw conclusions through the return period estimation.
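To make the role of the log-likelihood concrete, the following minimal sketch evaluates the GPD log-likelihood for a set of threshold excesses. The excess values and function name are hypothetical, and the sign convention (positive shape for a heavier tail) is an assumption that may differ from the convention used for k in this paper.

```python
import math

def gpd_log_likelihood(excesses, scale, shape):
    """Log-likelihood of threshold excesses under a GPD(scale, shape)."""
    if scale <= 0:
        return float("-inf")
    n = len(excesses)
    if abs(shape) < 1e-8:
        # Exponential limit of the GPD (shape -> 0)
        return -n * math.log(scale) - sum(excesses) / scale
    total = 0.0
    for y in excesses:
        z = 1.0 + shape * y / scale
        if z <= 0:
            return float("-inf")  # observation outside the support
        total += math.log(z)
    return -n * math.log(scale) - (1.0 + 1.0 / shape) * total

# Hypothetical excesses over a threshold (mm), for illustration only
excesses = [3.0, 9.4, 15.2, 1.7, 30.1, 6.5, 21.9, 12.3]
for scale, shape in [(12.5, 0.0), (12.5, 0.1), (8.0, 0.0)]:
    ll = gpd_log_likelihood(excesses, scale, shape)
    print(f"scale={scale:5.1f} shape={shape:+.1f}  log-likelihood={ll:.3f}")
```

MLE selects the parameter pair that maximizes this function; log-likelihoods are directly comparable only between models fitted to the same data.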
4.3 Gamma Distribution
This study uses the two-parameter version of the gamma distribution, and the following results are obtained after fitting it with MLE (Table 8). Based on Fig. 9 in Appendix A, the gamma distribution appears to fit the data.
After fitting the gamma distribution to the data, it is observed that the shape parameter a is relatively small, and the scale parameter b is close to zero. Table 9 displays the return levels and accompanying confidence intervals for the selected return periods.
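A return level under a two-parameter gamma model is a high quantile of the fitted distribution. The sketch below computes such quantiles with the standard library only, using a power series for the gamma distribution function and bisection for its inverse. It assumes, purely for illustration, that the gamma model describes individual daily amounts and that `obs_per_year` such amounts occur per year; the parameter values are hypothetical and are not taken from Table 8, and the series is intended for moderate values of x/scale.

```python
import math

def gamma_cdf(x, shape, scale):
    """Gamma distribution function via the series for the incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total) and n < 10000:
        n += 1
        term *= z / (shape + n)
        total += term
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))

def gamma_quantile(p, shape, scale):
    """Invert gamma_cdf by bisection for 0 < p < 1."""
    lo, hi = 0.0, scale * max(shape, 1.0)
    while gamma_cdf(hi, shape, scale) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma_return_level(shape, scale, return_period, obs_per_year):
    """Amount exceeded on average once in `return_period` years (assumed setup)."""
    p = 1.0 - 1.0 / (return_period * obs_per_year)
    return gamma_quantile(p, shape, scale)

# Hypothetical parameters and observation count, for illustration only
for T in (2, 10, 50):
    level = gamma_return_level(shape=0.8, scale=9.0, return_period=T, obs_per_year=90)
    print(f"T={T:2d} years: return level = {level:.1f} mm/day")
```

Because the gamma tail decays exponentially, these quantiles grow roughly linearly with the logarithm of the return period.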
According to Table 9, stations S02 and S04 will most likely see significant precipitation levels in the long term (return period of 50 years). Compared with the gamma distribution, the GPD predicts higher precipitation levels in the future. However, Table 9 shows increasing trends for individual stations at the specified return periods that are similar to those of the GEV (Table 4) and GPD (Table 7). Station S01 shows a smoother increase over the years and lower levels than the other stations, which may be due to the geomorphology and the location of the station. S01 shows the same behavior with the GPD fit (Table 7). On the other hand, station S03 has a similar trend with the gamma and GEV methods and a lower long-term precipitation return level with the GEV distribution (Table 4). For this station, the GPD projects higher short- and long-term precipitation levels than the gamma and GEV distributions.
5 Conclusions
The present study examines different methods for the statistical analysis of extreme precipitation in a small region of great ore-mining interest. For this purpose, daily precipitation values were used from four meteorological stations around the study area. The data cover a period of approximately 15 years (2006 to 2021). To investigate the extreme precipitation events, the EVT (BM and POT techniques) and the gamma distribution were applied. Graphical and statistical goodness-of-fit tests were performed to determine the most appropriate distribution for the data characterization. The GEV, GPD, and gamma distributions were fitted to the data sets using MLE for parameter estimation. Finally, the return levels of extreme rainfall for the area were assessed.
The precipitation levels for the chosen return periods were calculated for the three distributions along with their corresponding 95% confidence intervals. With parameters estimated by MLE, the fitted models described the precipitation data set satisfactorily. The maximum likelihood estimate of the shape parameter k, which defines the behavior of the tail of the distribution, was positive but close to zero in the GEV and GPD models. Therefore, the GEV took the form of a Gumbel distribution with an exponential tail, while the GPD took the form of an exponential distribution with a normal tail and parameter \(1/a^{*}\). Furthermore, the location parameter of the GEV distribution showed some spatial correlation between the areas: the closer the areas, the closer their parameters. Since the meteorological stations are all in the mining area, the b parameter did not show obvious differences.
With regard to selecting the optimal distribution for the precipitation data, the gamma distribution has difficulties describing extreme events. In addition, GEV and GPD were proposed as the most suitable distributions for extreme precipitation events in a series of similar works (Roth et al. 2014; Dyrrdal et al. 2015; Anagnostopoulou and Tolika 2012; Kyselý 2010). According to the goodness-of-fit tests performed in this study, although the GEV and GPD distributions describe the extremes satisfactorily, the GPD provides higher log-likelihood values. This is because the GPD is applied on data sets after setting a threshold (POT methodology), while GEV uses one annual value with maximum precipitation (BM).
The analysis also indicated that the stations could be grouped by similar characteristics, since those with the highest daily extreme rainfall amounts (S02 and S04) also showed the largest precipitation amounts for the return periods. This can be explained by the climatological conditions and geomorphology of the stations' locations, as the study area is primarily mountainous.
Finally, conclusions concerning the extreme values can also be drawn from the preceding analysis. Because the analysis was conducted using daily and not hourly rainfall data, the rainfall duration cannot be determined. Rainfall of over 50 mm/h is a statistically rare phenomenon, dangerous and often devastating due to flooding, and it is officially characterized as an extreme weather event. However, rainfall of over 100 mm/day is undoubtedly a significantly high amount of rainfall in 24 h. Scientists argue that Greece will suffer from extensive droughts interspersed with heavy rains over very short periods of a matter of hours (Zerefos et al. 2011). Indeed, Medicane IANOS (Androulidakis et al. 2023) and Storm Daniel (He et al. 2023) were two such extreme events. Such phenomena are dangerous because the rainwater system cannot absorb huge volumes of water in such a short time and rivers may overflow, so that flooding becomes almost certain. Storm Daniel affected the area of Chalkidiki, where the study area is located. Specifically, in the area where station S01 is situated, around 50 mm of rainfall over 24 h was recorded during 4 and 5 September 2023. This rainfall level lies within the prediction interval of the BM method using the GEV distribution for a 2-year return period, indicating its suitability for modeling extreme events.
Moreover, the models predict a high probability of extreme rainfall of over 100 mm/day in the study area over the short term. The stations with the highest risk of extensive flooding are S02 and S04, based on the return rainfall levels. Therefore, adaptive measures based on the area’s geology must be taken to prevent life-threatening situations and economic disasters due to climate change.
References
Anagnostopoulou C, Tolika K (2012) Extreme precipitation in Europe: statistical threshold selection based on climatological criteria. Theoret Appl Climatol 107(3):479–489
Androulidakis Y, Makris C, Mallios Z, Pytharoulis I, Baltikas V, Krestenitis Y (2023) Storm surges and coastal inundation during extreme events in the Mediterranean Sea: the IANOS Medicane. Nat Hazards 117(1):939–978
Cancelliere A (2017) Non stationary analysis of extreme events. Water Resour Manag 31(10):3097–3110
Coles S, Bawa J, Trenner L, Dorazio P (2001) An introduction to statistical modeling of extreme values, vol 208. Springer, London
Curceac S, Atkinson PM, Milne A, Wu L, Harris P (2020) An evaluation of automated GPD threshold selection methods for hydrological extremes across different scales. J Hydrol 585:124845
Davison AC, Smith RL (1990) Models for exceedances over high thresholds. J R Stat Soc Ser B (Methodol) 52(3):393–425
Davison AC, Huser R, Thibaud E (2013) Geostatistics of dependent and asymptotically independent extremes. Math Geosci 45(5):511–529
de Sousa Araújo A, Silva AR, Zárate LE (2022) Extreme precipitation prediction based on neural network model—a case study for southeastern Brazil. J Hydrol 606:127454
Douka M, Karacostas T (2018) Statistical analyses of extreme rainfall events in Thessaloniki, Greece. Atmos Res 208:60–77
Dung NB, Long NQ, Goyal R, An DT, Minh DT (2022) The role of factors affecting flood hazard zoning using analytical hierarchy process: a review. Earth Syst Environ 6(3):697–713
Dyrrdal AV, Lenkoski A, Thorarinsdottir TL, Stordal F (2015) Bayesian hierarchical modeling of extreme hourly precipitation in Norway. Environmetrics 26(2):89–106
Emery X (2008) Substitution random fields with Gaussian and gamma distributions: theory and application to a pollution data set. Math Geosci 40(1):83–99
Gilleland E, Katz RW, Young G (2005) Extremes toolkit (extRemes): weather and climate applications of extreme value statistics. A portable document file (pdf). National Center for Atmospheric Research, Boulder, CO, U.S.A
Gori A, Lin N, Xi D, Emanuel K (2022) Tropical cyclone climatology change greatly exacerbates US extreme rainfall–surge hazard. Nat Clim Change 12(2):171–178
Gu X, Ye L, Xin Q, Zhang C, Zeng F, Nerantzaki SD, Papalexiou SM (2022) Extreme precipitation in China: a review on statistical methods and applications. Adv Water Resour 163:104144
Hamdi Y, Bardet L, Duluc CM, Rebour V (2014) Extreme storm surges: a comparative study of frequency analysis approaches. Nat Hazards Earth Syst Sci 14(8):2053–2067
He K, Yang Q, Shen X, Dimitriou E, Mentzafou A, Papadaki C, Stoumboudi M, Anagnostou EN (2023) Brief communication: Storm Daniel flood impact in Greece 2023: mapping crop and livestock exposure from SAR. Nat Hazards Earth Syst Sci Discuss 2023:1–16
Jackson LE (2013) Frequency and magnitude of events. In: Bobrowsky PT (ed) Encyclopedia of natural hazards. Springer, Dordrecht, pp 359–363
Katz RW, Parlange MB, Naveau P (2002) Statistics of extremes in hydrology. Adv Water Resour 25(8–12):1287–1304
Kim H, Kim S, Shin H, Heo J-H (2017) Appropriate model selection methods for nonstationary generalized extreme value models. J Hydrol 547:557–574
Kroll T, Müller D, Seifert T, Herzig PM, Schneider A (2002) Petrology and geochemistry of the shoshonite-hosted Skouries porphyry Cu–Au deposit, Chalkidiki, Greece. Miner Depos 37:137–144
Kyselý J (2010) Coverage probability of bootstrap confidence intervals in heavy-tailed frequency models, with application to precipitation data. Theoret Appl Climatol 101(3):345–361
Lagouvardos K, Kotroni V, Bezes A, Koletsis I, Kopania T, Lykoudis S, Mazarakis N, Papagiannaki K, Vougioukas S (2017) The automatic weather stations NOANN network of the National Observatory of Athens: operation and database. Geosci Data J 4(1):4–16
Laio F, Di Baldassarre G, Montanari A (2009) Model selection techniques for the frequency analysis of hydrological extremes. Water Resour Res. https://doi.org/10.1029/2007WR006666
Langousis A, Mamalakis A, Puliga M, Deidda R (2016) Threshold detection for the generalized Pareto distribution: review of representative methods and application to the NOAA NCDC daily rainfall database. Water Resour Res 52(4):2659–2681
Lazoglou G, Anagnostopoulou C, Tolika K, Kolyva-Machera F (2019) A review of statistical methods to analyze extreme precipitation and temperature events in the Mediterranean region. Theoret Appl Climatol 136(1):99–117
Lee S, Won J-S, Jeon SW, Park I, Lee MJ (2015) Spatial landslide hazard prediction using rainfall probability and a logistic regression model. Math Geosci 47(5):565–589
Lima AO, Lyra GB, Abreu MC, Oliveira-Júnior JF, Zeri M, Cunha-Zeri G (2021) Extreme rainfall events over Rio de Janeiro State, Brazil: characterization using probability distribution functions and clustering analysis. Atmos Res 247:105221
Martins ALA, Liska GR, Beijo LA, de Menezes FS, Cirillo MÂ (2020) Generalized Pareto distribution applied to the analysis of maximum rainfall events in Uruguaiana, RS Brazil. SN Appl Sci 2(9):1479
Mascaro G (2018) On the distributions of annual and seasonal daily rainfall extremes in central Arizona and their spatial variability. J Hydrol 559:266–281
McBride CM, Kruger AC, Dyson L (2022) Changes in extreme daily rainfall characteristics in South Africa: 1921–2020. Weather Clim Extrem 38:100517
Nagy B, Mohssen M, Hughey K (2017) Flood frequency analysis for a braided river catchment in New Zealand: comparing annual maximum and partial duration series with varying record lengths. J Hydrol 547:365–374
Nerantzaki SD, Papalexiou SM (2019) Tails of extremes: advancing a graphical method and harnessing big data to assess precipitation extremes. Adv Water Resour 134:103448
Nerantzaki SD, Papalexiou SM (2022) Assessing extremes in hydroclimatology: a review on probabilistic methods. J Hydrol 605:127302
Rahimpour V, Zeng Y, Mannaerts CM, Su Z (2016) Attributing seasonal variation of daily extreme precipitation events across The Netherlands. Weather Clim Extrem 14:56–66
Roth M, Buishand TA, Jongbloed G, Klein Tank AMG, van Zanten JH (2014) Projections of precipitation extremes based on a regional, non-stationary peaks-over-threshold approach: a case study for the Netherlands and north-western Germany. Weather Clim Extrem 4:1–10
Salas J, Obeysekera J, Vogel R (2018) Techniques for assessing water infrastructure for nonstationary extreme events: a review. Hydrol Sci J 63(3):325–352
Schär C, Ban N, Fischer EM, Rajczak J, Schmidli J, Frei C, Giorgi F, Karl TR, Kendon EJ, Tank AM, O’Gorman PA (2016) Percentile indices for assessing changes in heavy precipitation events. Clim Change 137(1):201–216
Sebille Q, Fougères A-L, Mercadier C (2017) Modeling extreme rainfall: a comparative study of spatial extreme value models. Spat Stat 21:187–208
Serinaldi F, Kilsby CG (2014) Rainfall extremes: toward reconciliation after the battle of distributions. Water Resour Res 50(1):336–352
Soleh AM, Wigena AH, Djuraidah A, Saefuddin A (2016) Gamma distribution linear modeling with statistical downscaling to predict extreme monthly rainfall in Indramayu. In: 2016 12th International conference on mathematics, statistics, and their applications (ICMSA), pp 134–138. https://doi.org/10.1109/ICMSA.2016.7954325
Special Secretariat for Water of Greece (2021) Development of the river basin management plan of the river basins of central Macedonia river basin district (GR10). Ministry of Environment, Energy and Climate Change, Athens, p 247
Tabari H (2021) Extreme value analysis dilemma for climate change impact assessment on global flood and extreme precipitation. J Hydrol 593:125932
Towler E, Llewellyn D, Prein A, Gilleland E (2020) Extreme-value analysis for the characterization of extremes in water resources: a generalized workflow and case study on New Mexico monsoon precipitation. Weather Clim Extrem 29:100260
Vrban S, Wang Y, McBean EA, Binns A, Gharabaghi B (2018) Evaluation of stormwater infrastructure design storms developed using partial duration and annual maximum series models. J Hydrol Eng 23(12):04018051
Wang C-H, Holmes JD (2020) Exceedance rate, exceedance probability, and the duality of GEV and GPD for extreme hazard analysis. Nat Hazards 102(3):1305–1321
Zerefos C, Repapis C, Giannakopoulos C, Kapsomenakis J, Papanikolaou D, Papanikolaou M, Poulos S, Vrekoussis M, Philandras C, Teslioudis G (2011) The environmental, economic and social impacts of climate change on Greece. National Bank of Greece, Athens
Zhou C (2010) The extent of the maximum likelihood estimator for the extreme value index. J Multivar Anal 101(4):971–983
Zorzetto E, Botter G, Marani M (2016) On the emergence of rainfall extremes from ordinary events. Geophys Res Lett 43(15):8076–8082
Acknowledgements
This research project was implemented in the framework of the Hellenic Foundation for Research and Innovation (H.F.R.I.), call “Basic Research Financing (Horizontal support for all Sciences)” under the National Recovery and Resilience Plan “Greece 2.0” funded by the European Union NextGenerationEU program (H.F.R.I. Project Number 16537).
Funding
Open access funding provided by HEAL-Link Greece.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lymperi, OA., Varouchakis, E.A. Modeling Extreme Precipitation Data in a Mining Area. Math Geosci (2024). https://doi.org/10.1007/s11004-023-10126-1
DOI: https://doi.org/10.1007/s11004-023-10126-1