Generalized Pareto distribution applied to the analysis of maximum rainfall events in Uruguaiana, RS, Brazil

Martins, Amanda Larissa Alves; Liska, Gilberto Rodrigues; Beijo, Luiz Alberto; Menezes, Fortunato Silva de; Cirillo, Marcelo Ângelo

doi:10.1007/s42452-020-03199-8

Research Article
Published: 05 August 2020

Volume 2, article number 1479, (2020)
SN Applied Sciences

Abstract

Rainfall monitoring allows us to understand the hydrological cycle, which not only influences ecological and environmental dynamics but also affects economic and social activities. These sectors are greatly affected when rainfall occurs in amounts far greater than the average, which is called an extreme event; moreover, statistical methodologies based on the mean occurrence of rainfall are inadequate to analyze such extreme events. Extreme Value Theory provides adequate theoretical models for this type of event; in particular, the Generalized Pareto Distribution (henceforth GPD) is used to analyze extreme events that exceed a threshold. The present work applied both the GPD and its nested version, the Exponential Distribution, to monthly rainfall data from the city of Uruguaiana, in the state of Rio Grande do Sul, Brazil, and calculated the return levels and probabilities for some events of practical interest. To support the results, goodness-of-fit criteria are used, and a Monte Carlo simulation procedure is proposed to detect the true probability distribution in each month analyzed. The results show that the GPD and the Exponential Distribution fit the data in all months. Through the simulation study, we find that the GPD is more suitable in the months of September and November. However, in January, March, April, and August, the Exponential Distribution is more appropriate, and in the other months either one can be used.

1 Introduction

Rainfall is vital for life on Earth [1], but in high magnitude it can cause damage and losses, such as flooding, destruction of buildings and crops, soil erosion, and breaches of dikes and dams, among others [2, 3]. Damage in cities tends to be more severe because of rapid urbanization and the installation of complex infrastructure [4]. In addition, the frequency of extreme weather events has shown an increasing trend in several regions of the planet [6,7,8], and the southern region of Brazil has suffered from the occurrence of these events [2, 5].

To minimize negative impacts or avoid economic, social and environmental losses, it is necessary to plan activities and constructions based on the probabilistic forecast of the occurrence of maximum precipitation in a given location [9]. The forecasting process relies on fitting statistical models to the data. These models can be used to study the phenomena under different approaches, such as the occurrence of extreme values, the temporal distribution, the spatial distribution and the intensity of the phenomenon, among others [10,11,12].

Statistical approaches based on the analysis of extreme values have shown promising results in the forecasting of these events in several areas of science [13,14,15,16]. One of the models extensively employed, for this purpose, in various scientific fields such as insurance, finance, meteorology, and the environment is the Generalized Pareto Distribution [17, 18].

Given the use of probabilistic models, assessing their goodness of fit is an equally important task. In the analysis of extreme events, this stage is largely neglected, even though the methodology is well established. Goodness-of-fit tests such as Kolmogorov-Smirnov, chi-squared, and likelihood ratio are widely used [17, 19, 20]. However, as noted by [21], testing the fit of distributions using estimates of their own parameters can lead to type II errors; to circumvent this problem, [21] proposes a simulation study. In general, these simulation studies are based on Monte Carlo procedures [22, 23].

Hence, the present work aims to fit the Generalized Pareto Distribution to the maximum monthly rainfall in the city of Uruguaiana, Rio Grande do Sul state, Brazil, to calculate the probability of some extreme events occurring, and to calculate return levels of extreme rainfall events and their confidence intervals for return periods of 2, 5, 10, 30, 50 and 100 years.

2 Methodology

The data set was obtained from the Meteorological Database for Teaching and Research (BDMEP), made available by the National Institute of Meteorology (INMET). It covers January 1961 to April 2019 and was recorded at the Uruguaiana weather station, Rio Grande do Sul state. The data are grouped by month, and the threshold method is applied within each month: following the peaks-over-threshold (POT) methodology, the rainfall values above a sufficiently high threshold are selected and then analyzed with the Generalized Pareto Distribution.

According to Coles [24], just as the Generalized Extreme Value (henceforth GEV) distribution is the limit distribution of block maxima, the GPD arises as the parametric form of the limit distribution of threshold excesses. Its probability density function is given by

$$\begin{aligned} f\left( {x\left| {\xi ,\sigma ,u} \right. } \right) = \left\{ \begin{array}{l} \frac{1}{\sigma }\left[ {1 + \xi \left( {\frac{{x - u}}{\sigma }} \right) } \right] ^{ - \left( {1 + \frac{1}{\xi }} \right) } ,\,x \ge u,\,1 + \xi \left( {\frac{{x - u}}{\sigma }} \right) > 0,\,{\mathrm{{if}}}\,\xi \ne 0 \\ \frac{1}{\sigma }\exp \left( { - \frac{{x - u}}{\sigma }} \right) ,\,x \ge u,\,{\mathrm{{if}}}\,\xi \rightarrow 0 \\ \end{array} \right. \end{aligned}$$

(1)

The distribution function is given by

$$\begin{aligned} F\left( {x\left| {\xi ,\sigma ,u} \right. } \right) = \left\{ \begin{array}{l} 1 - \left[ {1 + \xi \left( {\frac{{x - u}}{\sigma }} \right) } \right] ^{ - \frac{1}{\xi }} ,\,\,\xi \ne 0 \\ 1 - \exp \left( { - \frac{{x - u}}{\sigma }} \right) ,\,\xi \rightarrow 0 \\ \end{array} \right. \end{aligned}$$

(2)

where u is the threshold, $\sigma $ is the scale parameter and $\xi $ is the shape parameter. The threshold must be fixed a priori; its selection is described in Sect. 2.1. The parameters $\sigma $ and $\xi $ must be estimated from the data, as described in Sect. 2.2. Three classes of standard distributions can be obtained from the GPD: Type I: Exponential ($\mathop {\lim }\limits _{\xi \rightarrow 0} F\left( {x\left| {\xi ,\sigma ,u} \right. } \right) $), Type II: Pareto ($\xi >0$) and Type III: Beta or ordinary Pareto ($\xi <0$).
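As an illustration of Eqs. 1 and 2, the following minimal Python sketch evaluates the density and the distribution function, switching to the exponential form when $\xi $ is numerically zero. The function names, the tolerance `tol` and the parameter values are assumptions made for illustration; they are not taken from the study, which used R.

```python
import math

def gpd_cdf(x, u, sigma, xi, tol=1e-12):
    """Distribution function of Eq. 2, evaluated at x for threshold u."""
    if x <= u:
        return 0.0
    z = (x - u) / sigma
    if abs(xi) < tol:                 # exponential limit, xi -> 0
        return 1.0 - math.exp(-z)
    base = 1.0 + xi * z
    if base <= 0.0:                   # above the upper end point (xi < 0)
        return 1.0
    return 1.0 - base ** (-1.0 / xi)

def gpd_pdf(x, u, sigma, xi, tol=1e-12):
    """Density of Eq. 1; zero outside the support."""
    if x < u:
        return 0.0
    z = (x - u) / sigma
    if abs(xi) < tol:                 # exponential limit, xi -> 0
        return math.exp(-z) / sigma
    base = 1.0 + xi * z
    if base <= 0.0:
        return 0.0
    return base ** (-(1.0 + 1.0 / xi)) / sigma

# Hypothetical parameter values, for illustration only.
u, sigma, xi = 50.0, 20.0, 0.2
p_cond = 1.0 - gpd_cdf(100.0, u, sigma, xi)   # Pr[X > 100 | X > 50]
print(f"Pr[X > 100 | X > 50] = {p_cond:.4f}")
```

Treating $|\xi |$ below a small tolerance as the exponential case avoids dividing by a shape parameter that is numerically zero.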

2.1 Threshold selection

To choose an appropriate threshold, an exploratory graphical tool based on the linearity of the mean excess function was used. This plot shows the mean of the excesses above each of several candidate thresholds against the threshold itself (Fig. 1). It is also known as the mean residual life plot [25].

However, the mean residual life plot can be difficult to interpret as a threshold selection method. A complementary technique was therefore employed, based on fitting the GPD at a range of thresholds and examining the stability of the parameter estimates [24]. This plot is known as the threshold choice plot (Fig. 2).

Choosing a very high threshold may leave a small number of observations, which increases the variance of the estimators. However, a threshold that does not satisfy the theoretical assumptions may result in distorted estimates. Thus, one should choose the threshold above which the mean residual life plot and the plotted functions of the parameters $\sigma $ and $\xi $ are approximately linear [26].
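The quantity drawn in a mean residual life plot can be computed directly. The sketch below, in plain Python, returns the mean excess and the number of exceedances for each candidate threshold; the rainfall values and thresholds are hypothetical and serve only to show the calculation.

```python
def mean_excess(data, thresholds):
    """Return (u, mean excess over u, number of exceedances) for each u."""
    rows = []
    for u in thresholds:
        excesses = [x - u for x in data if x > u]
        if excesses:                      # skip thresholds with no exceedance
            rows.append((u, sum(excesses) / len(excesses), len(excesses)))
    return rows

# Hypothetical daily rainfall values (mm), for illustration only.
rain = [12.0, 35.5, 48.2, 51.0, 60.3, 72.8, 80.1, 95.6, 110.4, 131.9]
for u, me, k in mean_excess(rain, [30.0, 50.0, 70.0, 90.0]):
    print(f"u = {u:5.1f} mm  mean excess = {me:6.2f} mm  exceedances = {k}")
```

Printing the number of exceedances next to each mean excess makes the trade-off described above visible: higher thresholds leave fewer points to fit.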

2.2 Parameter estimation

After selection of the threshold, the GPD parameters were estimated by the maximum likelihood method, that is, by maximizing the log-likelihood function. Suppose $y_1,\ldots ,y_k$ are the k excesses over a threshold u [24]. For $\xi \ne 0$, the log-likelihood is

$$\begin{aligned} l\left( {\sigma ,\xi } \right) = - k\log \left( \sigma \right) - \left( {1 + \frac{1}{\xi }} \right) \sum \limits _{i = 1}^k {\log \left( {1 + \xi \frac{{y_i }}{\sigma }} \right) }, \end{aligned}$$

(3)

provided that $({1 + \sigma ^{ - 1} \xi y_i }) > 0\,{\mathrm{{for}}}\,i = 1,\ldots ,k$; otherwise, $l( {\sigma ,\xi }) = - \infty $. In the $\xi \rightarrow 0$ case, the log-likelihood function is given by

$$\begin{aligned} l\left( \sigma \right) = - k\log \left( \sigma \right) - \frac{1}{\sigma }\sum \limits _{i = 1}^k {y_i }. \end{aligned}$$

(4)

The maximum likelihood estimators of $\sigma $ and $\xi $ are the solutions of the equations obtained by setting the partial derivatives of the log-likelihood with respect to each parameter equal to zero. Estimating $\sigma $ and $\xi $ requires numerical maximization; any standard method can be used, such as Newton–Raphson, simulated annealing, Fisher’s scoring or their variations [27].
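The log-likelihoods of Eqs. 3 and 4 translate directly into code. The sketch below is only an illustration in plain Python, not the routine used in the study (which relied on R and the evd package). For the exponential case the maximizer is the sample mean of the excesses. For the GPD, the sketch substitutes a simple one-dimensional grid search over the reparametrization $\theta = \xi /\sigma $, for which the maximizing $\xi $ given $\theta $ is the mean of $\log (1 + \theta y_i)$. The grid size, the search bounds and the sample of excesses are assumptions.

```python
import math

def loglik_exp(y, sigma):
    """Eq. 4: exponential log-likelihood of the excesses y."""
    return -len(y) * math.log(sigma) - sum(y) / sigma

def loglik_gpd(y, sigma, xi):
    """Eq. 3: GPD log-likelihood; -inf outside the parameter space."""
    if sigma <= 0.0:
        return -math.inf
    if xi == 0.0:
        return loglik_exp(y, sigma)
    terms = [1.0 + xi * yi / sigma for yi in y]
    if min(terms) <= 0.0:
        return -math.inf
    return -len(y) * math.log(sigma) - (1.0 + 1.0 / xi) * sum(math.log(t) for t in terms)

def fit_exp(y):
    """Closed-form maximizer of Eq. 4: the sample mean of the excesses."""
    return sum(y) / len(y)

def fit_gpd(y, grid=2000):
    """Crude grid search over theta = xi / sigma (illustrative, not evd's optimizer)."""
    k = len(y)
    lo = -0.999 / max(y)            # keeps 1 + theta * y_i > 0 for all i
    hi = 5.0 * k / sum(y)           # arbitrary upper search bound (assumption)
    best = (-math.inf, None, None)
    for j in range(grid + 1):
        theta = lo + (hi - lo) * j / grid
        if abs(theta) < 1e-12:      # theta = 0 is the exponential case
            continue
        xi = sum(math.log1p(theta * yi) for yi in y) / k   # best xi for this theta
        sigma = xi / theta
        ll = loglik_gpd(y, sigma, xi)
        if ll > best[0]:
            best = (ll, sigma, xi)
    return best[1], best[2], best[0]

# Hypothetical excesses over a threshold (mm), for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 8.0, 12.0, 20.0, 45.0, 120.0]
sigma_exp = fit_exp(y)
ll_exp = loglik_exp(y, sigma_exp)
sigma_gpd, xi_gpd, ll_gpd = fit_gpd(y)
print(f"Exponential: sigma = {sigma_exp:.2f}, log-lik = {ll_exp:.2f}")
print(f"GPD: sigma = {sigma_gpd:.2f}, xi = {xi_gpd:.2f}, log-lik = {ll_gpd:.2f}")
```

A grid search is slow and coarse compared with Newton-type methods, but it has no convergence failures to diagnose, which keeps the example short.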

2.3 Hypothesis testing

With the parameters estimated, the goodness of fit of the GPD model was evaluated. The Kolmogorov-Smirnov (KS) test was used to compare the theoretical and empirical cumulative distributions [28]. Independence was assessed with the Ljung-Box (LB) test, whose statistic is compared with the upper $\alpha $-th quantile of the chi-squared distribution with one degree of freedom. The Mann-Kendall test was used to determine whether the series has a statistically significant time trend [29]. A very small p-value is evidence in favor of the alternative hypothesis, that is, of a trend in the behavior of the analyzed series.
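For readers unfamiliar with the Mann-Kendall test, a minimal sketch of its statistic and two-sided p-value follows. It uses the normal approximation with continuity correction and, as a simplifying assumption, omits the correction for ties. The function name and the example series are hypothetical.

```python
import math

def mann_kendall(series):
    """Return (S, Z, two-sided p-value); no tie correction (assumption)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = series[j] - series[i]
            s += (d > 0) - (d < 0)            # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S without ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value

# Hypothetical series of monthly maxima (mm), for illustration only.
maxima = [62.1, 48.0, 75.3, 51.2, 80.4, 66.7, 59.9, 71.5, 55.0, 68.2]
s, z, p = mann_kendall(maxima)
print(f"S = {s}, Z = {z:.3f}, p-value = {p:.3f}")
```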

For the maximum likelihood estimates, one can test whether $\xi $ is statistically equal to zero. To test the null hypothesis that the distribution of the extremes is exponential, we use the likelihood ratio test (LT), whose test statistic is

$$\begin{aligned} \Lambda = 2\left[ {l\left( {{\hat{\sigma }} ,{\hat{\xi }} } \right) - l\left( {{\hat{\sigma }} } \right) } \right] , \end{aligned}$$

(5)

where ${l\left( {{\hat{\sigma }} } \right) }$ and ${l\left( {{\hat{\sigma }} ,{\hat{\xi }} } \right) }$ are the log-likelihoods of the Exponential and GPD densities, respectively, evaluated at their maximum likelihood estimates [26]. Thus, the null hypothesis that $\xi = 0 $ is rejected if $\Lambda $ is greater than the upper $\alpha $-th quantile of the chi-squared distribution with 1 degree of freedom. Equivalently, the null hypothesis is rejected if the p-value of the test is less than the significance level. For all tests we adopt $1\%$ as the significance level.
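The decision rule for Eq. 5 can be sketched in a few lines. For one degree of freedom, the chi-squared upper tail probability equals $\mathrm{erfc}(\sqrt{\Lambda /2})$, so the standard library is sufficient. The function name and the log-likelihood values are hypothetical.

```python
import math

def lr_test(ll_gpd, ll_exp, alpha=0.01):
    """Likelihood ratio test of xi = 0 (Eq. 5) with a chi-squared(1) reference."""
    stat = max(0.0, 2.0 * (ll_gpd - ll_exp))      # guard against rounding below zero
    p_value = math.erfc(math.sqrt(stat / 2.0))    # Pr[chi2_1 > stat]
    return stat, p_value, p_value < alpha

# Hypothetical maximized log-likelihoods, for illustration only.
stat, p_value, reject = lr_test(ll_gpd=-38.0, ll_exp=-40.0)
print(f"Lambda = {stat:.2f}, p-value = {p_value:.4f}, reject H0 at 1%: {reject}")
```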

2.4 Probability of excesses and return levels

From Eq. 2 in the $\xi \ne 0$ case, the probability that precipitation exceeds a value x, given that it exceeds the threshold u, is

$$\begin{aligned} \Pr \left[ {X> x\left| {X > u} \right. } \right] = \left[ {1 + \xi \left( {\frac{{x - u}}{\sigma }} \right) } \right] ^{ - \frac{1}{\xi }}. \end{aligned}$$

(6)

Equation 6, however, gives the probability conditional on the precipitation exceeding the adopted threshold. The quantity of interest is the unconditional probability of precipitation above a given value, so Eq. 6 is multiplied by the probability of exceeding the threshold, which gives

$$\begin{aligned} \Pr \left[ {X > x} \right] = \lambda \left[ {1 + \xi \left( {\frac{{x - u}}{\sigma }} \right) } \right] ^{ - \frac{1}{\xi }}, \end{aligned}$$

(7)

where $\lambda = \Pr \left[ {X > u} \right] $. Hence, the level $x_m$ that is exceeded on average once every m observations is the solution of

$$\begin{aligned} \lambda \left[ {1 + \xi \left( {\frac{{x_m - u}}{\sigma }} \right) } \right] ^{ - \frac{1}{\xi }} = \frac{1}{{m}}. \end{aligned}$$

(8)
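The algebra that turns Eq. 8 into an explicit return level is short. As a sketch for the $\xi \ne 0$ case, dividing both sides by $\lambda $, raising them to the power $-\xi $ and isolating $x_m$ gives

$$\begin{aligned} \left[ {1 + \xi \left( {\frac{{x_m - u}}{\sigma }} \right) } \right] ^{ - \frac{1}{\xi }} = \frac{1}{{m\lambda }} \;\Rightarrow \; 1 + \xi \left( {\frac{{x_m - u}}{\sigma }} \right) = \left( {m\lambda } \right) ^{\xi } \;\Rightarrow \; x_m = u + \frac{\sigma }{\xi }\left[ {\left( {m\lambda } \right) ^{\xi } - 1} \right] , \end{aligned}$$

which requires m large enough that $x_m > u$, that is, $m\lambda > 1$. Applying the same steps to the exponential form of Eq. 2 gives $x_m = u + \sigma \log \left( {m\lambda } \right) $ in the $\xi \rightarrow 0$ case.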

Solving Eq. 8 for $x_m$ gives the m-observation return level. For presentation, it is often more convenient to give return levels on an annual scale, so that the N-year return level is the level expected to be exceeded once every N years. If there are $n_x$ observations per year, this corresponds to the m-observation return level with $m = N \times n_x$ [24]. Hence, the N-year return level is defined by

$$\begin{aligned} {\widehat{z}}_N = {\widehat{u}} + \frac{{{\widehat{\sigma }} }}{{{\widehat{\xi }} }}\left[ {\left( {Nn_x {\hat{\lambda }} } \right) ^{{\widehat{\xi }} } - 1} \right] \end{aligned}$$

(9)

where $n_x$ is the number of days to be analyzed. We analyzed monthly rainfall data, so $n_x = 31, 30, 28$ days, according to month. If $\xi \rightarrow 0$, the return level is defined by

$$\begin{aligned} {\widehat{z}}_N = {\widehat{u}} + {\widehat{\sigma }} \,\log \left( {Nn_x {\hat{\lambda }} } \right) . \end{aligned}$$

(10)
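Equations 9 and 10 can be checked numerically: substituting the computed return level back into Eq. 7 should return an exceedance probability of $1/(N n_x)$. The sketch below does this in plain Python. The function names and all parameter values are hypothetical.

```python
import math

def return_level(N, n_x, lam, u, sigma, xi, tol=1e-12):
    """N-year return level: Eq. 9, or Eq. 10 when xi is numerically zero."""
    m = N * n_x
    if abs(xi) < tol:
        return u + sigma * math.log(m * lam)
    return u + sigma / xi * ((m * lam) ** xi - 1.0)

def exceedance_prob(x, lam, u, sigma, xi, tol=1e-12):
    """Unconditional Pr[X > x] for x > u (Eq. 7)."""
    z = (x - u) / sigma
    if abs(xi) < tol:
        return lam * math.exp(-z)
    return lam * (1.0 + xi * z) ** (-1.0 / xi)

# Hypothetical values: threshold 40 mm, 10% of days above it, 30-day month.
lam, u, sigma, xi, n_x = 0.1, 40.0, 20.0, 0.1, 30
for N in (2, 5, 10, 30, 50, 100):
    z_N = return_level(N, n_x, lam, u, sigma, xi)
    check = exceedance_prob(z_N, lam, u, sigma, xi) * N * n_x   # close to 1
    print(f"N = {N:3d} years  z_N = {z_N:7.2f} mm  check = {check:.6f}")
```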

Return level estimates require estimates of the GPD parameters; the maximum likelihood estimates described in the previous sections are used to estimate both the probabilities and the return levels. An estimate of $\lambda $ is also required, for which the natural estimator is

$$\begin{aligned} {\hat{\lambda }} = \frac{k}{n} \end{aligned}$$

(11)

corresponding to the proportion of the sample points exceeding u. Since the number of excesses of u follows a binomial distribution, ${\hat{\lambda }}$ is also the maximum likelihood estimate of $\lambda $. In addition to the return level estimates, confidence intervals with confidence coefficient $(1-\alpha )\times 100\%$, associated with the return periods of 2, 5, 10, 30, 50 and 100 years, were constructed using the delta method, as described in Coles [24], with the uncertainty in the estimate ${\hat{\lambda }}$ included in the calculation. From the standard properties of the binomial distribution, $Var\left( {\hat{\lambda }} \right) \approx {\hat{\lambda }} {{\left( {1 - {\hat{\lambda }} } \right) } / n}$, so the complete variance-covariance matrix of $\left( {{\hat{\lambda }} ,{\widehat{\sigma }} ,{\widehat{\xi }} } \right) $ is approximately

$$\begin{aligned} V = \left[ {\begin{array}{*{20}c} {{\hat{\lambda }} {{\left( {1 - {\hat{\lambda }} } \right) } / n}} &{} 0 &{} 0 \\ 0 &{} {v_{1,1} } &{} {v_{1,2} } \\ 0 &{} {v_{2,1} } &{} {v_{2,2} } \\ \end{array}} \right] \end{aligned}$$

(12)

where $v_{i,j}$ represents the term (i, j) of the variance-covariance matrix of ${\widehat{\sigma }}$ and ${\widehat{\xi }}$. Thus, by the delta method,

$$\begin{aligned} Var\left( {{\widehat{z}}_N } \right) \approx \nabla z_N^T \,V\,\nabla z_N \end{aligned}$$

(13)

where

$$\begin{aligned} \nabla z_N^T = \left[ {\frac{{\partial z_N }}{{\partial \lambda }},\frac{{\partial z_N }}{{\partial \sigma }},\frac{{\partial z_N }}{{\partial \xi }}} \right] \end{aligned}$$

(14)

evaluated at $\left( {{\widehat{\lambda }} ,{\widehat{\sigma }} ,{\widehat{\xi }} } \right) $. Therefore, the $(1 - \alpha )\times 100\%$ confidence interval for ${{\widehat{z}}_N }$ is given by

$$\begin{aligned} CI_{(1 - \alpha )\times 100\% }\left( {{\widehat{z}}_N }\right) = {\widehat{z}}_N \pm z_{\frac{\alpha }{2}} \sqrt{Var\left( {\widehat{z}_N } \right) }, \end{aligned}$$

(15)

where $z_{\frac{\alpha }{2}}$ is the $\frac{\alpha }{2}$-th quantile of the standard normal distribution.
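The delta-method interval of Eqs. 12 to 15 can be sketched as follows for the $\xi \ne 0$ case. The gradient is obtained by differentiating Eq. 9 with respect to $\lambda $, $\sigma $ and $\xi $. All numerical values, including the entries of the variance-covariance matrix and the sample size, are hypothetical, and the function names are illustrative.

```python
import math

def z_N(N, n_x, lam, u, sigma, xi):
    """N-year return level of Eq. 9 (xi != 0)."""
    return u + sigma / xi * ((N * n_x * lam) ** xi - 1.0)

def return_level_gradient(N, n_x, lam, sigma, xi):
    """Partial derivatives of Eq. 9 with respect to (lambda, sigma, xi)."""
    m = N * n_x
    a = (m * lam) ** xi
    d_lam = sigma * a / lam
    d_sigma = (a - 1.0) / xi
    d_xi = -sigma * (a - 1.0) / xi ** 2 + sigma * a * math.log(m * lam) / xi
    return [d_lam, d_sigma, d_xi]

def return_level_ci(z_hat, grad, V, z_crit=1.959964):
    """Eqs. 13 and 15; z_crit defaults to the 95% normal quantile."""
    var = sum(grad[i] * V[i][j] * grad[j] for i in range(3) for j in range(3))
    half = z_crit * math.sqrt(var)
    return z_hat - half, z_hat + half

# Hypothetical estimates, for illustration only.
N, n_x, lam, u, sigma, xi, n = 50, 30, 0.1, 40.0, 20.0, 0.1, 1800
V = [[lam * (1.0 - lam) / n, 0.0, 0.0],     # Eq. 12
     [0.0, 9.0, -0.3],
     [0.0, -0.3, 0.02]]
z_hat = z_N(N, n_x, lam, u, sigma, xi)
grad = return_level_gradient(N, n_x, lam, sigma, xi)
lower, upper = return_level_ci(z_hat, grad, V)
print(f"z_N = {z_hat:.2f} mm, 95% CI = ({lower:.2f}, {upper:.2f}) mm")
```

Intervals built this way are symmetric around the estimate by construction.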

2.5 Simulation study to evaluate goodness of fit for extreme values distributions

A computational simulation study was conducted to evaluate the performance of the distributions in each month. The Monte Carlo simulation method was used, which consists of generating many realizations of a phenomenon according to pre-established parameters. At the end of the simulations, the mean and standard deviation across the realizations are calculated; these represent measures of accuracy and precision, respectively [30, 31]. For each month, the data series was divided into a training series, comprising 30 years (1961–1991), and a test series, comprising 29 years (1992–2019). Two scenarios are considered: (1) the first scenario generates samples from the Exponential distribution with the estimated parameters, and (2) the second scenario generates samples from the GPD with the estimated parameters.

Each scenario $(k = 1, 2)$ is repeated 10000 times, according to the Monte Carlo simulation procedure, following the steps described below:

(i) Using the parameters estimated from the training sample, generate a sample of the same size (n) according to the probability distribution of scenario k;
(ii) Estimate the parameters of the Exponential and GPD distributions using the maximum likelihood method, described in Sect. 2.2;
(iii) Perform the likelihood ratio test with the fits obtained in step (ii);
(iv) For the return periods of 2, 5, 10, 15, 20, 25, 28 years, calculate the respective return levels with the probability distributions and their respective parameters estimated in step (ii);
(v) With the test sample, obtain the observed return levels for the return periods of 2, 5, 10, 15, 20, 25, 28 years. Calculate the Root Mean Squared Error (RMSE) and the Mean Absolute Percentage Error (MAPE), given by Eqs. 16 and 17, respectively.

$$\begin{aligned} RMSE = \sqrt{\frac{{\sum \nolimits _{i = 1}^{n_z } {\left( {z_{N_i } - {\hat{z}}_{N_i } } \right) ^2 } }}{{n_z }}} \end{aligned}$$

(16)

$$\begin{aligned} MAPE = \frac{1}{{n_z }}\sum \limits _{i = 1}^{n_z } {\left| {\frac{{z_{N_i } - {\hat{z}}_{N_i } }}{{z_{N_i } }}} \right| } \times 100 \end{aligned}$$

(17)

Steps (i) to (v) are repeated 10000 times, after which the Monte Carlo averages of the MAPE and RMSE are obtained. In addition, the following were calculated: the proportion of replicates in which the LT, in step (iii), resulted in a p-value higher than the significance level of $1\%$, denoted by ${\hat{p}}_{LT}$; the proportion of replicates in which the MAPE of the GPD is greater than the MAPE of the Exponential distribution, denoted by ${\hat{p}}_{MAPE}$; and the proportion of replicates in which the RMSE of the GPD is greater than the RMSE of the Exponential distribution, denoted by ${\hat{p}}_{RMSE}$. It should be noted that the adopted return periods, 2, 5, 10, 15, 20, 25 and 28 years ($n_z = 7$), lie within the length of the test series.
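The building blocks of this procedure can be sketched in plain Python: sampling excesses by inverting Eq. 2, the error measures of Eqs. 16 and 17, and the proportion used for ${\hat{p}}_{MAPE}$ and ${\hat{p}}_{RMSE}$. This is not the authors' R implementation; the function names, the seed and every numerical value below are assumptions, and the fitting step is omitted for brevity.

```python
import math
import random

def sample_gpd_excesses(k, sigma, xi, rng, tol=1e-12):
    """Draw k threshold excesses (x - u) by inverting Eq. 2."""
    draws = []
    for _ in range(k):
        p = rng.random()                          # uniform on [0, 1)
        if abs(xi) < tol:                         # scenario 1: Exponential
            draws.append(-sigma * math.log(1.0 - p))
        else:                                     # scenario 2: GPD
            draws.append(sigma / xi * ((1.0 - p) ** (-xi) - 1.0))
    return draws

def rmse(observed, estimated):
    """Eq. 16."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

def mape(observed, estimated):
    """Eq. 17, in percent."""
    n = len(observed)
    return 100.0 * sum(abs((o - e) / o) for o, e in zip(observed, estimated)) / n

def proportion_greater(a, b):
    """Share of replicates r in which a[r] > b[r]."""
    return sum(x > y for x, y in zip(a, b)) / len(a)

rng = random.Random(2020)                         # fixed seed for reproducibility
sample = sample_gpd_excesses(30, sigma=20.0, xi=0.0, rng=rng)

# Hypothetical observed and estimated return levels (mm) for n_z = 7 periods.
observed = [70.0, 85.0, 96.0, 102.0, 107.0, 110.0, 112.0]
est_exp = [68.0, 86.5, 99.0, 105.0, 110.0, 114.0, 116.0]
est_gpd = [69.0, 84.0, 93.0, 98.0, 101.0, 104.0, 105.0]
print(f"RMSE: Exp = {rmse(observed, est_exp):.2f}, GPD = {rmse(observed, est_gpd):.2f}")
print(f"MAPE: Exp = {mape(observed, est_exp):.2f}%, GPD = {mape(observed, est_gpd):.2f}%")
```

In a full run, the error measures would be stored for each of the 10000 replicates and passed to `proportion_greater` with the GPD errors first and the Exponential errors second.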

All analyses were carried out with the R software [32] and the evd package [33].

3 Discussion and results

Table 1 shows that in all months the exponential distribution ($\xi \rightarrow 0$) is preferred according to the likelihood ratio test. The Mann-Kendall test indicated no trend in any month of the year, since all p-values were higher than 0.01. That is, there is no statistical evidence that any series of monthly rainfall maxima has a trend over the years. Furthermore, the series of monthly maxima are independent at the $1\%$ level of significance. We highlight that these tests were used to verify the assumptions of the Extreme Value Theory models, but they could be used for other purposes, as in [2, 29, 34] for the trend analysis of hydro-climatic series. In addition, the Kolmogorov-Smirnov test indicates that both distributions fit the data in all months, and the QQ plots corroborate the results (Fig. 3). A satisfactory fit of the GPD was also found by Lazoglou [35], Salleh and Hassan [36], Wan et al. [37], Zahid et al. [38].

Table 1 Threshold (${\hat{u}}$) selected by procedure described in Sect. 2.1, parameter estimates and Hypothesis tests (p-value) of the Generalized Pareto (GPD) and Exponential distributions for monthly maximum rainfall data of the city of Uruguaiana, RS, Brazil


Table 2 Probability ($\%$) of rainfall occurrence by the probability distributions for monthly maximum rainfall data of the city of Uruguaiana, RS, Brazil


From the fit of the exponential distribution, Table 2 shows that from October to February and from April to May, rainfall above 50 mm has a probability of occurrence greater than 60%. The table also shows that the probability of rainfall above 150 mm is higher in April and May than in other months of the year.

Rain volumes between 100 mm and 180 mm in a few hours can lead to landslides and flooding. One example occurred in the city of Rolante, in the metropolitan region of Porto Alegre, which recorded an average accumulated rainfall of 180 mm. Landslides caused by the flood affected an area of 230 hectares and more than 6,600 inhabitants, and mud was carried by the river, cutting off the water supply in eight municipalities of the region [39].

Herrmann [40] reported that in November 1991 more than 400 mm of precipitation accumulated in only two days in São José, SC. There were numerous landslides and deaths in the eastern mountain range of Santa Catarina, as houses collapsed and several sections of the BR 101 highway were blocked by fallen embankments. In December 1995, heavy rainfall resulted in 29 deaths, causing 29 municipalities in the mesoregion of southern Santa Catarina to declare a state of calamity.

Table 3 Return Levels estimates (mm) by the probability distributions for monthly maximum rainfall data of the city of Uruguaiana, RS, Brazil


Table 3 presents estimates of maximum rainfall return levels for periods of 2 to 100 years for each month. For both the GPD and the exponential fits, the precipitation estimates increase as the return period increases. This is expected and agrees with Zahid et al. [38].

From September to May, rainfall above 50 mm is recorded which, depending on its hourly intensity, may cause soil erosion. This can be harmful because it contributes to the removal of nutrients essential for crop development [25].

In March, the maximum rainfall return level of 154.01 mm is expected to be exceeded once in 50 years according to the Exponential distribution. For the same month, Medeiros et al. [41] found a return level of 124.33 mm with the Gumbel distribution in the municipality of Jataí, Goiás. They report that high daily precipitation levels are associated with intense rainfall and that estimates of precipitation for different return periods can assist professionals involved in planning and executing hydraulic structure projects in decision making for flood control.

Zahid et al. [38] conducted a study on temperature return levels and concluded that extreme temperatures can affect yields. Crops are very sensitive to temperature variations on the order of 1 $^\circ $C, according to Hatfield & Prueger [42]. Every crop has a certain temperature tolerance limit; when the temperature exceeds this limit, the yield is drastically reduced. The same goes for extreme rainfall.

The results indicate that April presented the highest rainfall return levels, with an expected level of 156.96 mm for an average period of 50 years. As a way of conveying the precision of such results, Beijo et al. [43] calculated the maximum rainfall return levels in Lavras, Minas Gerais state, by the type I extreme value distribution (Gumbel), and found that, for an average period of 50 years, the expected level is 148 mm, with a $95\%$ confidence interval from 131 mm to 164 mm. These authors also recommend that, in the analysis of maximum precipitation, when the interest is in the maximum extreme event, the upper limit of the interval be used as the reference value. In this sense, Fig. 4 shows the behavior of the return levels and their 95 $\%$ confidence intervals.

Rainfall events are considered individual and erosive when they amount to 10 mm or more, or to 6.0 mm or more within at most 15 minutes, and are separated from one another by a period of at least six hours with rainfall of 1.0 mm or less [44].

As seen in Table 1, the likelihood ratio test indicates that the Exponential distribution is sufficient to model the rainfall data, while in a few months the comparison of the Kolmogorov-Smirnov p-values indicated that the GPD is more appropriate. If two probability distributions from the same family fit a set of data, the one with fewer parameters is preferable [45]. This is important when there are problems in estimating the parameters of models, which can occur in likelihood-based methods [13, 46, 47]. No such problems occurred in our study, which allowed us to conduct the simulation study described in Sect. 2.5. We conclude that there are months in which the Exponential distribution is more adequate, namely January, March, April and August, since most of the comparison criteria used favor this distribution. In September and November, most criteria indicated that the GPD is more appropriate (Tables 4 and 5).

Table 4 Results of scenario 1 for the Monte Carlo simulation in 10000 replicates for each month of the year for the Exponential and GPD distributions of monthly maximum rainfall data in Uruguaiana-RS


Table 5 Results of scenario 2 for the Monte Carlo simulation in 10000 replicates for each month of the year for the Exponential and GPD distributions of monthly maximum rainfall data in Uruguaiana-RS


In February, May, June, July, October and December, the result was inconclusive, as the criteria did not unanimously favor either distribution in the two scenarios evaluated (Tables 4 and 5). In that case, either distribution can be used. We emphasize that the Exponential distribution is expected to perform better in the first scenario and the GPD in the second. When this does not occur, there is a strong indication that the true distribution in that month is the one unanimously selected by the adopted criteria.

Regarding simulation studies involving distributions of extreme values, Xavier et al. [48] reported, in simulation studies of the generalized extreme value distribution with covariates to model trend or temporal effects, that the more parsimonious model is preferable and that, depending on the subject of study, the method used to select models is an important issue. In the same sense, Kim et al. [49] showed by Monte Carlo simulation that model comparison methods behave differently in the evaluation of stationary and nonstationary GEV models. For the nonstationary case, the Akaike information criterion showed better results, and in the stationary case the likelihood ratio test was superior in detecting the most appropriate model. Our study used a stationary GPD, and we showed by Monte Carlo simulation that there are months in which the most adequate distribution differs from that chosen in Table 1. We intend to extend this study to other probability distributions.

Beijo et al. [50] stress the importance of obtaining accurate estimates for rainfall. From a practical point of view, accuracy is important for safety and economy, because rainfall greater than expected within a shorter period can cause serious damage. In the case of the construction of a contour line, it would not support the volume of water and, consequently, would cause soil erosion and burial of plantations, causing serious damage to the environment and to the owners. Thus, in accordance with the results of Tables 4 and 5, we provide the QQ plots and confidence intervals for return levels according to the most accurate probability distribution.

4 Conclusions

The Generalized Pareto distribution was satisfactorily fitted in all months and can be used to provide extreme levels of maximum rainfall. No positive trend or temporal dependence of monthly maximum rainfall was found.

The rainfall estimates from January to December were calculated for the return periods of 2, 5, 10, 30, 50 and 100 years. The highest estimate was observed in April (with rainfall above 170 mm every 100 years and a 95$\%$ confidence interval of approximately 140 mm to 220 mm) and the lowest return level was in July (with rainfall near 90 mm every 100 years).

Comparing the distributions by computer simulation made it possible to identify the true probability distribution of the excesses over a threshold. We chose three measures of fit quality to make the comparisons, from which the measures ${\hat{p}}_{MAPE}$ and ${\hat{p}}_{RMSE}$ are obtained. The proposed algorithm could be adapted to other measures of fit quality, such as the Akaike information criterion (AIC), its corrected version (AICc), or the Bayesian information criterion (BIC), among others. The length of the training and test series is another issue open to discussion. The original series should be as long as possible, and not shorter than 30 years. It is essential to balance the sizes of the training and test series: if the training series is very long, the fitted model can generalize well, but if the test series is long, the sample used to fit the model may be insufficient to reproduce the test series. In our simulation, we divided the series into 30 years to fit the model and 29 years to calculate the quality measures, totaling 59 years of time series. A longer series allows greater flexibility in splitting between training and test series, and care must be taken with short series, usually those of less than 30 years.

The results have practical implications for assessing the risk of extreme rain events in Uruguaiana, Brazil. The graphics are prepared to guide the local administration to support adaptations, such as the preparation of baseline contingency plans to deal with the maximum rainfall based on the current climatology. Studies like this are not yet available in this municipality. Our results will contribute to regional planning and may also be useful for ongoing economic and environmental projects in southern Brazil, as well as for a better understanding of the Pampa biome.

Acknowledgements

The authors thank the Submission Editor at Springer Nature for help in choosing the journal, the anonymous reviewers, and the Rio Grande do Sul Research Support Foundation (FAPERGS) for its research grant.

Funding

This work received financial assistance in the form of a scientific initiation scholarship from the Research Foundation of the State of Rio Grande do Sul (FAPERGS).

Author information

Authors and Affiliations

Academic of the Interdisciplinary Degree in Science and Technology, Federal University of Pampa, Itaqui, RS, Brazil
Amanda Larissa Alves Martins
Department of Agroindustrial Technology and Rural Socioeconomics, Federal University of São Carlos, Araras, SP, Brazil
Gilberto Rodrigues Liska
Federal University of Alfenas, Alfenas, MG, Brazil
Luiz Alberto Beijo
Federal University of Lavras, Lavras, MG, Brazil
Fortunato Silva de Menezes & Marcelo Ângelo Cirillo

Corresponding author

Correspondence to Gilberto Rodrigues Liska.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Martins, A.L.A., Liska, G.R., Beijo, L.A. et al. Generalized Pareto distribution applied to the analysis of maximum rainfall events in Uruguaiana, RS, Brazil. SN Appl. Sci. 2, 1479 (2020). https://doi.org/10.1007/s42452-020-03199-8

Received: 23 March 2020
Accepted: 09 July 2020
Published: 05 August 2020
DOI: https://doi.org/10.1007/s42452-020-03199-8
