Abstract
Extreme value theory motivates estimating extreme upper quantiles of a distribution by selecting some threshold, discarding those observations below the threshold and fitting a generalized Pareto distribution to exceedances above the threshold via maximum likelihood. This sharp cutoff between observations that are used in the parameter estimation and those that are not is at odds with statistical practice for analogous problems such as nonparametric density estimation, in which observations are typically smoothly downweighted as they become more distant from the value at which the density is being estimated. By exploiting the fact that the order statistics of independent and identically distributed observations form a Markov chain, this work shows how one can obtain a natural weighted composite log-likelihood function for fitting generalized Pareto distributions to exceedances over a threshold. A method for producing confidence intervals based on inverting a test statistic calibrated via parametric bootstrapping is proposed. Some theory demonstrates the asymptotic advantages of using weights in the special case when the shape parameter of the limiting generalized Pareto distribution is known to be 0. Methods for extending this approach to observations that are not identically distributed are described and applied to an analysis of daily precipitation data in New York City. Perhaps the most important practical finding is that including weights in the composite log-likelihood function can reduce the sensitivity of estimates to small changes in the threshold.
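The abstract's central idea is to weight conditional likelihood terms that come from the Markov structure of order statistics. It can be illustrated in the simplest case the abstract singles out: a shape parameter known to be 0, that is, an exponential tail. The sketch below is a toy built on stated assumptions, not the paper's estimator. Write X_(1) >= X_(2) >= ... for the descending order statistics. Given X_(j+1), the spacing D_j = X_(j) - X_(j+1) is the minimum of j exponential excesses, so it is exponential with mean sigma/j. The sketch multiplies the j-th conditional log-density by a weight w_j and maximizes over the scale sigma. The taper w_j = 1 - (j/k)^2 and all function names are hypothetical choices for illustration. With hard weights (w_j = 1 for j <= k) the estimate reduces to the usual exceedance MLE: the mean of the k largest values minus X_(k+1). A smooth taper instead downweights terms further from the tail gradually.

```python
import math
import random


def weighted_exp_scale(xs, weights):
    """Maximize the weighted composite log-likelihood
        sum_j w_j * [log j - log(sigma) - j * D_j / sigma],  D_j = X_(j) - X_(j+1),
    where X_(1) >= X_(2) >= ... are the descending order statistics.
    Setting the derivative in sigma to zero gives sum_j w_j j D_j / sum_j w_j."""
    x = sorted(xs, reverse=True)
    if len(weights) >= len(x):
        raise ValueError("need more observations than weights")
    num = den = 0.0
    for j, w in enumerate(weights, start=1):
        spacing = x[j - 1] - x[j]  # D_j, using 0-based list indices
        num += w * j * spacing
        den += w
    if den <= 0.0:
        raise ValueError("weights must have a positive sum")
    return num / den


def weighted_loglik(xs, weights, sigma):
    """The same weighted composite log-likelihood, evaluated at a given sigma."""
    x = sorted(xs, reverse=True)
    total = 0.0
    for j, w in enumerate(weights, start=1):
        spacing = x[j - 1] - x[j]
        total += w * (math.log(j) - math.log(sigma) - j * spacing / sigma)
    return total


def hard_weights(k):
    """Sharp cutoff: the k largest spacings count fully, all others are dropped."""
    return [1.0] * k


def taper_weights(k):
    """Hypothetical smooth taper w_j = 1 - (j/k)^2 for j = 1, ..., k - 1."""
    return [1.0 - (j / k) ** 2 for j in range(1, k)]


# Usage: i.i.d. exponential data with true scale (mean) 2.
random.seed(1)
sample = [random.expovariate(0.5) for _ in range(5000)]
print(weighted_exp_scale(sample, hard_weights(200)))
print(weighted_exp_scale(sample, taper_weights(400)))
```

Both printed estimates should land near the true scale 2. Each weight depends only on the rank j and not on the observed value. Each weighted score term therefore keeps conditional mean zero, and this is one reason to attach weights to ranks rather than to values.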
Availability of supporting data
The precipitation data is available from the National Centers for Environmental Information at https://www.ncdc.noaa.gov/cdo-web/datasets/GHCND/stations/GHCND:USW00094728/detail.
Acknowledgements
The author would like to thank the Associate Editor and referees for their many helpful comments on the substance and presentation of this work and Mitchell Krock for providing the software to carry out the Minneapolis temperature simulations.
Funding
This material was based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR) under Contract DE-AC02-06CH11347.
Author information
Contributions
Michael Stein carried out all of the research described in this manuscript and did all of the writing.
Ethics declarations
Ethical approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Human and animal ethics
Not applicable.
Competing interests
I declare that I have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Stein, M.L. A weighted composite log-likelihood approach to parametric estimation of the extreme quantiles of a distribution. Extremes 26, 469–507 (2023). https://doi.org/10.1007/s10687-023-00466-w
DOI: https://doi.org/10.1007/s10687-023-00466-w
Keywords
- Extreme value theory
- Generalized Pareto distribution
- Order statistics
- Kernel density estimation
- Test inversion bootstrapping