Problems that arise when common methods (e.g. maximum likelihood and L-moments) for fitting a generalised Pareto (GP) distribution are applied to discrete (rounded) data sets are revealed by analysing real dry spell duration series. The analysis is then repeated on GP-distributed series obtained by systematic Monte Carlo (MC) simulations. The solution depends on two factors: (1) the actual amount of rounding, as determined by the actual data range (measured by the scale parameter, σ) relative to the rounding increment (Δx); and (2) the application of a (sufficiently high) threshold, so that the series of excesses is considered instead of the original series. For a moderate amount of rounding (e.g. σ/Δx ≥ 4), which is commonly met in practice (at least for dry spell data), and where no threshold is applied, the classical methods work reasonably well. If rounded data are cut at a threshold, which is actually essential when dealing with a GP distribution, then classical methods applied in a standard way can lead to erroneous estimates, even if the rounding itself is moderate. In this case, it is necessary to adjust the theoretical location parameter for the series of excesses. An alternative solution is to add appropriate uniform noise to the rounded data (so-called jittering). This, in a sense, reverses the process of rounding, after which it is straightforward to apply the common methods. Finally, if the rounding is too coarse (e.g. σ/Δx ~ 1), then none of the above recipes work, and specific methods for rounded data should be applied.
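The jittering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and everything below is an assumption made for illustration: data are rounded to the nearest multiple of Δx, the jitter is uniform on [-Δx/2, Δx/2), the fit uses the standard L-moment estimator for a GP distribution with location fixed at zero, and the parameter values (σ = 4, ξ = 0.1, Δx = 1, threshold u = 3) as well as all function names are hypothetical. The sketch compares fits to the excesses of the continuous, the rounded and the jittered series.

```python
import math
import random


def gp_sample(n, sigma, xi, rng):
    """Draw n values from a GP distribution (location 0) by inverse-CDF sampling."""
    out = []
    for _ in range(n):
        p = rng.random()  # uniform on [0, 1)
        if abs(xi) < 1e-12:
            out.append(-sigma * math.log(1.0 - p))  # exponential limit
        else:
            out.append(sigma / xi * ((1.0 - p) ** (-xi) - 1.0))
    return out


def round_to_grid(x, dx):
    """Round each value to the nearest multiple of dx (assumed rounding rule)."""
    return [dx * math.floor(v / dx + 0.5) for v in x]


def jitter(x_rounded, dx, rng):
    """Add uniform noise on [-dx/2, dx/2) to each rounded value."""
    return [v + (rng.random() - 0.5) * dx for v in x_rounded]


def excesses_over(x, u):
    """Return the excesses x - u of all values strictly above the threshold u."""
    return [v - u for v in x if v > u]


def fit_gp_lmom(excesses):
    """L-moment estimates (sigma, xi) for a GP with location fixed at 0."""
    y = sorted(excesses)
    n = len(y)
    b0 = sum(y) / n
    b1 = sum(i * v for i, v in enumerate(y)) / (n * (n - 1))
    l1, l2 = b0, 2.0 * b1 - b0  # first two sample L-moments
    xi = 2.0 - l1 / l2
    sigma = (1.0 - xi) * l1
    return sigma, xi


def demo(seed=0, n=50000, sigma=4.0, xi=0.1, dx=1.0, u=3.0):
    """Fit excesses over u for continuous, rounded and jittered series."""
    rng = random.Random(seed)
    x = gp_sample(n, sigma, xi, rng)
    xr = round_to_grid(x, dx)
    xj = jitter(xr, dx, rng)
    return {
        "continuous": fit_gp_lmom(excesses_over(x, u)),
        "rounded": fit_gp_lmom(excesses_over(xr, u)),
        "jittered": fit_gp_lmom(excesses_over(xj, u)),
    }


if __name__ == "__main__":
    for name, (s, k) in demo().items():
        print(f"{name:>10}: sigma = {s:.3f}, xi = {k:.3f}")
```

Under these assumptions, the excesses of the rounded series can only take the values Δx, 2Δx, ..., so fitting them with location zero inflates the sample mean relative to the spread and pulls the shape estimate downwards; this is one way the threshold-related error mentioned above can arise. The jittered series should give estimates close to those from the continuous series, for which the excesses over u have scale σ + ξu.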
The constructive comments from two anonymous reviewers are gratefully acknowledged.
This work has been supported in part by the Croatian Science Foundation under the project 2831. K. Cindrić received funding from the European Union’s Horizon 2020 research and innovation program under the grant agreement no. 653824/EU-CIRCLE.
Cite this article
Pasarić, Z., Cindrić, K. Generalised Pareto distribution: impact of rounding on parameter estimation. Theor Appl Climatol 136, 417–427 (2019). https://doi.org/10.1007/s00704-018-2494-5