Generalised Pareto distribution: impact of rounding on parameter estimation

Abstract

Problems that arise when common methods (e.g. maximum likelihood and L-moments) for fitting a generalised Pareto (GP) distribution are applied to discrete (rounded) data sets are revealed by analysing real dry spell duration series. The analysis is then repeated on GP time series obtained by systematic Monte Carlo (MC) simulations. The solution depends on the following: (1) the actual amount of rounding, as determined by the actual data range (measured by the scale parameter, σ) relative to the rounding increment (Δx), combined with (2) applying a certain (sufficiently high) threshold and considering the series of excesses instead of the original series. For a moderate amount of rounding (e.g. σ/Δx ≥ 4), which is commonly met in practice (at least for dry spell data), and where no threshold is applied, the classical methods work reasonably well. If cutting at a threshold is applied to rounded data (which is actually essential when dealing with a GP distribution), then classical methods applied in a standard way can lead to erroneous estimates, even if the rounding itself is moderate. In this case, it is necessary to adjust the theoretical location parameter for the series of excesses. The other solution is to add appropriate uniform noise to the rounded data (so-called jittering). This, in a sense, reverses the process of rounding, after which it is straightforward to apply the common methods. Finally, if the rounding is too coarse (e.g. σ/Δx ~ 1), then none of the above recipes work, and specific methods for rounded data should be applied.
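The two remedies named in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' code: it assumes rounding to the nearest multiple of Δx, a threshold u that is itself a multiple of Δx, and L-moment estimation with a known location parameter (the standard GP L-moment relations). Under those assumptions the rounded excesses over u stem from original values starting at Δx/2 above u, so the location Δx/2 used here is an assumed form of the adjustment; the abstract does not state the exact value. All function names and default parameters are hypothetical.

```python
import numpy as np

def sample_gp(n, shape, scale, rng):
    """Inverse-transform sampling from GP(loc=0, scale, shape); assumes shape != 0."""
    u = rng.uniform(size=n)
    return scale / shape * ((1.0 - u) ** (-shape) - 1.0)

def round_to(x, dx):
    """Round each value to the nearest multiple of dx."""
    return np.floor(x / dx + 0.5) * dx

def jitter(x_rounded, dx, rng):
    """Add uniform noise on [-dx/2, dx/2) to rounded data."""
    return x_rounded + rng.uniform(-dx / 2.0, dx / 2.0, size=x_rounded.shape)

def gp_lmom_fit(y, loc=0.0):
    """L-moment estimates (shape, scale) of a GP with known location loc."""
    ys = np.sort(np.asarray(y, dtype=float))
    n = ys.size
    l1 = ys.mean()                                   # first sample L-moment
    b1 = np.sum(np.arange(n) * ys) / (n * (n - 1))   # probability-weighted moment
    l2 = 2.0 * b1 - l1                               # second sample L-moment
    shape = 2.0 - (l1 - loc) / l2
    scale = (1.0 - shape) * (l1 - loc)
    return shape, scale

def demo(n=20000, shape=0.1, scale=4.0, dx=1.0, u=5.0, seed=0):
    """Compare naive, location-adjusted and jittered fits to rounded excesses."""
    rng = np.random.default_rng(seed)
    x_r = round_to(sample_gp(n, shape, scale, rng), dx)

    # Excesses of the rounded data over the threshold u (a multiple of dx).
    exc = x_r[x_r > u] - u
    naive = gp_lmom_fit(exc, loc=0.0)            # standard application
    adjusted = gp_lmom_fit(exc, loc=dx / 2.0)    # assumed location adjustment

    # Jittering: add uniform noise, then cut at the effective threshold u + dx/2.
    x_j = jitter(x_r, dx, rng)
    u_eff = u + dx / 2.0
    jittered = gp_lmom_fit(x_j[x_j > u_eff] - u_eff, loc=0.0)

    return {"naive": naive, "adjusted": adjusted, "jittered": jittered}

if __name__ == "__main__":
    for name, (xi, sigma) in demo().items():
        print(f"{name:9s} shape={xi:.3f} scale={sigma:.3f}")
```

By threshold stability of the GP distribution, the excesses over u + Δx/2 should again be GP with the same shape and with scale σ + ξ(u + Δx/2), which gives a reference value for the adjusted and jittered fits. With the location fixed at zero, the naive shape estimate is always smaller than the adjusted one, because the same sample L-moments are used with a larger distance between the mean and the location.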

Acknowledgments

The constructive comments from two anonymous reviewers are gratefully acknowledged.

Funding

This work has been supported in part by the Croatian Science Foundation under the project 2831. K. Cindrić received funding from the European Union’s Horizon 2020 research and innovation program under the grant agreement no. 653824/EU-CIRCLE.

Author information

Corresponding author

Correspondence to K. Cindrić.

Cite this article

Pasarić, Z., Cindrić, K. Generalised Pareto distribution: impact of rounding on parameter estimation. Theor Appl Climatol 136, 417–427 (2019). https://doi.org/10.1007/s00704-018-2494-5
