Conditional heavy-tail behavior with applications to precipitation and river flow extremes

Abstract

This article deals with the right-tail behavior of a response distribution \(F_Y\) conditional on a regressor vector \({\mathbf {X}}={\mathbf {x}}\), restricted to the heavy-tailed case of Pareto-type conditional distributions \(F_Y(y|\ {\mathbf {x}})=P(Y\le y|\ {\mathbf {X}}={\mathbf {x}})\), with heaviness of the right tail characterized by the conditional extreme value index \(\gamma ({\mathbf {x}})>0\). We particularly focus on testing the hypothesis \({\mathscr {H}}_{0,tail}:\ \gamma ({\mathbf {x}})=\gamma _0\) of constant tail behavior for some \(\gamma _0>0\) and all possible \({\mathbf {x}}\). When \({\mathbf {x}}\) is a time index, the term trend analysis is commonly used. Several such trend analyses of extreme value data have been published recently, mostly focusing on time-varying modeling of location or scale parameters of the response distribution. Many such environmental studies apply a simple test against trend based on Kendall’s tau statistic. This test is powerful when the center of the conditional distribution \(F_Y(y|{\mathbf {x}})\) changes monotonically in \({\mathbf {x}}\), for instance, in a simple location model \(\mu ({\mathbf {x}})=\mu _0+x\cdot \mu _1\), \({\mathbf {x}}=(1,x)'\), but it is rather insensitive to monotonic changes in tail behavior, say, \(\gamma ({\mathbf {x}})=\eta _0+x\cdot \eta _1\). This has to be taken into account, since in many environmental applications the main interest is in the tail rather than the center of a distribution. Motivated by this problem, we aim to demonstrate the opportunities and the limits of detecting and estimating non-constant conditional heavy-tail behavior with regard to applications in hydrology. We present four different procedures, compare them in simulations and illustrate our findings on real data from hydrology: weekly maxima of hourly precipitation from France and monthly maximal river flows from Germany.

References

  1. Angrist J, Chernozhukov V, Fernández-Val I (2006) Quantile regression under misspecification, with an application to the US wage structure. Econometrica 74(2):539–563. http://www.jstor.org/stable/3598810

  2. Beirlant J, Goegebeur Y, Segers J, Teugels J (2006) Statistics of extremes: theory and applications. Wiley, Hoboken

  3. Bernard E, Naveau P, Vrac M, Mestre O (2013) Clustering of maxima: spatial dependencies among heavy rainfall in France. J Clim 26(20):7929–7937. doi:10.1175/JCLI-D-12-00836.1

  4. Bickel PJ, Lehmann EL (1975) Descriptive statistics for nonparametric models ii. Location. Ann Stat 3(5):1045–1069. doi:10.1214/aos/1176343240

  5. Bücher A, Kinsvater P, Kojadinovic I (2015) Detecting breaks in the dependence of multivariate extreme-value distributions. arXiv:1505.00954

  6. Chavez-Demoulin V, Davison AC (2005) Generalized additive modelling of sample extremes. J R Stat Soc Ser C 54(1):207–222. doi:10.1111/j.1467-9876.2005.00479.x

  7. Chebana F, Ouarda TB, Duong TC (2013) Testing for multivariate trends in hydrologic frequency analysis. J Hydrol 486:519–530. doi:10.1016/j.jhydrol.2013.01.007; http://www.sciencedirect.com/science/article/pii/S00221694130

  8. Chernozhukov V, Fernández-Val I, Galichon A (2010) Quantile and probability curves without crossing. Econometrica 78(3):1093–1125. doi:10.3982/ECTA7880

  9. Cunnane C (1973) A particular comparison of annual maxima and partial duration series methods of flood frequency prediction. J Hydrol 18(3):257–271. doi:10.1016/0022-1694(73)90051-6; http://www.sciencedirect.com/science/article/pii/002216947390

  10. de Haan L, Tank A, Neves C (2015) On tail trend detection: modeling relative risk. Extremes 18(2):141–178. doi:10.1007/s10687-014-0207-8

  11. de Haan L, Ferreira A (2006) Extreme value theory: an introduction. Springer, Zurich

  12. Dierckx G (2011) Trends and change points in the tail behaviour of a heavy tailed distribution. In: Proceedings of 58th world statistical congress (ISI2011), Dublin. pp 290–299

  13. Dierckx G, Teugels JL (2010) Change point analysis of extreme values. Environmetrics 21(7–8):661–686. doi:10.1002/env.1041

  14. Dupuis DJ, Sun Y, Wang HJ (2015) Detecting change-points in extremes. Stat Interface 8(1):19–31. doi:10.4310/SII.2015.v8.n1.a3

  15. Einmahl JHJ, de Haan L, Zhou C (2016) Statistics of heteroscedastic extremes. J R Stat Soc Ser B 78(1):31–51. doi:10.1111/rssb.12099

  16. Gardes L, Girard S (2010) Conditional extremes from heavy-tailed distributions: an application to the estimation of extreme rainfall return levels. Extremes 13(2):177–204. doi:10.1007/s10687-010-0100-z

  17. Gomes MI, Pestana D (2007) A sturdy reduced-bias extreme quantile (VaR) estimator. J Am Stat Assoc 102(477):280–292. doi:10.1198/016214506000000799

  18. Hill BM (1975) A simple general approach to inference about the tail of a distribution. Ann Stat 3(5):1163–1174. doi:10.1214/aos/1176343247

  19. Jarušková D, Rencová M (2008) Analysis of annual maximal and minimal temperatures for some European cities by change point methods. Environmetrics 19(3):221–233. doi:10.1002/env.865

  20. Kendall MG (1948) Rank correlation methods. Charles Griffin, London

  21. Kim M, Lee S (2009) Test for tail index change in stationary time series with Pareto-type marginal distribution. Bernoulli 15(2):325–356. doi:10.3150/08-BEJ157

  22. Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge

  23. Koenker R, Bassett JG (1978) Regression quantiles. Econometrica 46(1):33–50. http://www.jstor.org/stable/1913643

  24. Kojadinovic I, Naveau P (2015) Nonparametric tests for change-point detection in the distribution of block maxima based on probability weighted moments. arXiv:1507.06121

  25. Lekina A, Chebana F, Ouarda T (2014) Weighted estimate of extreme quantile: an application to the estimation of high flood return periods. Stoch Environ Res Risk Assess 28(2):147–165. doi:10.1007/s00477-013-0705-2

  26. Madsen H, Rosbjerg D (1997) The partial duration series method in regional index-flood modeling. Water Resour Res 33(4):737–746. doi:10.1029/96WR03847

  27. Mediero L, Santillán D, Garrote L, Granados A (2014) Detection and attribution of trends in magnitude, frequency and timing of floods in Spain. J Hydrol 517:1072–1088. doi:10.1016/j.jhydrol.2014.06.040; http://www.sciencedirect.com/science/article/pii/S00221694140

  28. Mu Y, He X (2007) Power transformation toward a linear regression quantile. J Am Stat Assoc 102(477):269–279. http://www.jstor.org/stable/27639838

  29. Renard B, Lang M, Bois P (2006) Statistical analysis of extreme events in a non-stationary context via a Bayesian framework: case study with peak-over-threshold data. Stoch Environ Res Risk Assess 21(2):97–112. doi:10.1007/s00477-006-0047-4

  30. Resnick SI (2007) Heavy-tail phenomena: probabilistic and statistical modeling. Springer, New York

  31. Ribatet M, Sauquet E, Grésillon JM, Ouarda TBMJ (2007) A regional Bayesian POT model for flood frequency analysis. Stoch Environ Res Risk Assess 21(4):327–339. doi:10.1007/s00477-006-0068-z

  32. Roth M, Jongbloed G, Buishand T (2016) Threshold selection for regional peaks-over-threshold data. J Appl Stat 43(7):1291–1309. doi:10.1080/02664763.2015.1100589

  33. Rulfová Z, Buishand A, Roth M, Kyselý J (2016) A two-component generalized extreme value distribution for precipitation frequency analysis. J Hydrol 534:659–668. doi:10.1016/j.jhydrol.2016.01.032; http://www.sciencedirect.com/science/article/pii/S0022169416000500

  34. Schumann A (2005) Hochwasserstatistische bewertung des augusthochwassers 2002 im einzugsgebiet der mulde unter anwendung der saisonalen statistik. Hydrologie und Wasserbewirtschaftung 49(4):200–206

  35. Silva AT, Naghettini M, Portela MM (2016) On some aspects of peaks-over-threshold modeling of floods under nonstationarity using climate covariates. Stoch Environ Res Risk Assess 30(1):207–224. doi:10.1007/s00477-015-1072-y

  36. Strupczewski WG, Kochanek K, Bogdanowicz E, Markiewicz I (2012) On seasonal approach to flood frequency modelling. Part i: two-component distribution revisited. Hydrol Process 26(5):705–716. doi:10.1002/hyp.8179

  37. Tabari H, Taye MT, Willems P (2015) Statistical assessment of precipitation trends in the upper Blue Nile river basin. Stoch Environ Res Risk Assess 29(7):1751–1761. doi:10.1007/s00477-015-1046-0

  38. Teugels JL, Vanroelen G (2004) Box-Cox transformations and heavy-tailed distributions. J Appl Probab 41:213–227. http://www.jstor.org/stable/3215978

  39. van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes—Springer series in statistics. Springer, New York

  40. Wang HJ, Li D, He X (2012) Estimation of high conditional quantiles for heavy-tailed distributions. J Am Stat Assoc 107(500):1453–1464. doi:10.1080/01621459.2012.716382

  41. Wang HJ, Li D (2013) Estimation of extreme conditional quantiles through power transformation. J Am Stat Assoc 108(503):1062–1074. doi:10.1080/01621459.2013.820134

  42. Wang H, Tsai CL (2009) Tail index regression. J Am Stat Assoc 104(487):1233–1240. doi:10.1198/jasa.2009.tm08458

  43. Weissman I (1978) Estimation of parameters and large quantiles based on the k largest observations. J Am Stat Assoc 73(364):812–815. doi:10.1080/01621459.1978.10480104

  44. Wi S, Valdés JB, Steinschneider S, Kim TW (2016) Non-stationary frequency analysis of extreme precipitation in South Korea using peaks-over-threshold and annual maxima. Stoch Environ Res Risk Assess 30(2):583–606. doi:10.1007/s00477-015-1180-8

  45. Yue S, Pilon P, Cavadias G (2002) Power of the Mann-Kendall and Spearman’s rho tests for detecting monotonic trends in hydrological series. J Hydrol 259(1–4):254–271. doi:10.1016/S0022-1694(01)00594-7, http://www.sciencedirect.com/science/article/pii/S00221694010

Acknowledgements

We would like to thank Professor Andreas Schumann from the Department of Civil Engineering, Ruhr-University Bochum, Germany, for providing us with hydrological data and for helpful discussions. We are also grateful to two anonymous referees and an Associate Editor for their constructive comments on an earlier version of our work. The financial support of the Deutsche Forschungsgemeinschaft (SFB 823, “Statistical modelling of nonlinear dynamic processes”) is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Paul Kinsvater.

Appendices

Appendix: Quantile regression process

Let Y denote a random variable called response and \({\mathbf {X}}=(1,X_1,\ldots ,X_d)'\) a random vector called regressor with support covered by a compact set \({\mathscr {X}}\subset {\mathbb {R}}^{d+1}\). Throughout this section we suppose that the conditional distribution \(F(y|{\mathbf {x}})=P(Y\le y|\,{\mathbf {X}}={\mathbf {x}})\) of Y given \({\mathbf {X}}={\mathbf {x}}\) satisfies

$$F^{-1}(p|\,{\mathbf {x}})=\inf \{y:\,F(y|{\mathbf {x}})\ge p\}={\mathbf {x}}'{\varvec{\beta }}_p$$
(13)

for all \({\mathbf {x}}\in {\mathscr {X}}\), probabilities \(p\in I\subset [\varepsilon ,1-\varepsilon ]\) and an unknown vector-valued function \(p\mapsto {\varvec{\beta }}_p\), \(p\in I\), with \({\varvec{\beta }}_p\in {\mathbb {R}}^{d+1}\) called p-th regression quantile (Koenker and Bassett 1978). The left-hand side of (13) is called generalized inverse or quantile of \(F(\cdot |{\mathbf {x}})\) in \(p\in I\). It coincides with the usual inverse of a function, provided the inverse exists. Theoretical aspects and many applications of linear quantile regression are presented in Koenker (2005).

Let \((Y_i,{\mathbf {X}}_i)\), \(i=1,\ldots ,n\), denote independent copies of \((Y,{\mathbf {X}})\). The estimator

$${\hat{\varvec{\beta}}}_p=\underset{{\mathbf {b}}\in {\mathbb {R}}^{d+1}}{{{\text{arg}}}\,{{\text {min}}}}\;\sum _{i=1}^n\rho _p\left( Y_i-{\mathbf {X}}_i'\cdot {\mathbf {b}}\right)$$

with \(\rho _p(y)=y\cdot (p-{\mathbf {1}}_{\{y\le 0\}})\) is called empirical regression quantile. The following result establishes asymptotic normality of \(\sqrt{n}\left(\hat{\varvec{{{\beta}}}}_p-{\varvec{\beta }}_p\right)\) uniformly in \(p\in I\), i.e., in the function space \(\left( \ell ^\infty (I)\right) ^{d+1}\) (van der Vaart and Wellner 1996).

Theorem 1

Suppose that, uniformly in \({{\mathbf {x}}}\in {\mathscr {X}}\) , the conditional density \(f(y|{\mathbf {x}})\) exists, is bounded and uniformly continuous in y. Suppose further that \({\mathbb {E}}\Vert {\mathbf {X}}\Vert ^{2+\delta }<\infty\) for some \(\delta >0\) and that

$$J={\mathbb {E}}\left[ {\mathbf {X}}{\mathbf {X}}'\right] \,{\text { and }}\,H_p={\mathbb {E}}\left[ {\mathbf {X}}{\mathbf {X}}'\cdot f(F^{-1}(p|{\mathbf {X}})|{\mathbf {X}})\right]$$
(14)

exist with \(H_p\) positive definite for all \(p\in I\). Then, for \(n\rightarrow \infty\) , we have that

$$\left( H_p\sqrt{n}\left( {\hat{{\varvec{\beta }}}}_p-{\varvec{\beta }}_p\right) \right) _{p\in I}\xrightarrow{D}{\mathbb {Z}}$$
(15)

in \(\left( \ell ^\infty (I)\right) ^{d+1}\) , where \({\mathbb {Z}}\) is a centered Gaussian process with \({\mathbb {E}}[{\mathbb {Z}}(p){\mathbb {Z}}(q)']=(p\wedge q-p\cdot q)\cdot J\).

The previous result allows us to estimate the joint distribution of several empirical regression quantiles. Let \({\mathbf {p}}=\{p_1,\ldots ,p_\ell \}\subset I\) denote a set of probabilities. Then, for \(n\rightarrow \infty\) , we immediately obtain that

$$\sqrt{n}\left( {\hat{\varvec{\beta}}}_{p_1}-\varvec{\beta }_{p_1},\ldots ,{\hat{\varvec{\beta}}}_{p_\ell }-{\varvec{\beta }}_{p_\ell }\right) '{\mathop {\longrightarrow }\limits^{D}} {\mathscr {N}}\left( 0,\Sigma _{{\mathbf {p}}}\right) ,$$

where \(\Sigma _{{\mathbf {p}}}\) is defined blockwise through

$$\lim _{n\rightarrow \infty }{{{\mathrm{Cov}}}}\left[ \sqrt{n}\left( \hat{{\varvec{\beta }}}_{p_i}-{\varvec{\beta }}_{p_i}\right) ,\,\sqrt{n}\left( \hat{{\varvec{\beta }}}_{p_j}-{\varvec{\beta }}_{p_j}\right) \right] =(p_i\wedge p_j-p_i\cdot p_j)\cdot H_{p_i}^{-1}JH_{p_j}^{-1}.$$

This result is used to prove Proposition 1.
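To make the definition of the empirical regression quantile concrete, the following minimal Python sketch treats the intercept-only case \(d=0\), an assumption made purely for illustration. In that case the minimizer of the check-loss criterion is a sample quantile, i.e., the generalized inverse of the empirical distribution function. The function names `check_loss`, `objective`, `intercept_only_quantile` and `empirical_generalized_inverse` are hypothetical and not taken from the paper.

```python
import math


def check_loss(y, p):
    """Check function rho_p(y) = y * (p - 1{y <= 0})."""
    return y * (p - (1.0 if y <= 0 else 0.0))


def objective(b, ys, p):
    """Criterion sum_i rho_p(Y_i - b) for the intercept-only model X = 1."""
    return sum(check_loss(y - b, p) for y in ys)


def intercept_only_quantile(ys, p):
    """Minimize the criterion over the observations.

    The criterion is piecewise linear and convex in b, so a minimizer is
    attained at one of the data points; ties go to the smallest candidate.
    """
    return min(sorted(ys), key=lambda b: objective(b, ys, p))


def empirical_generalized_inverse(ys, p):
    """inf{y : F_n(y) >= p} for the empirical distribution function F_n."""
    ordered = sorted(ys)
    return ordered[math.ceil(len(ordered) * p) - 1]


# Small illustrative sample (made up); n * p is not an integer for these p,
# so the minimizer of the criterion is unique.
ys = [3.0, 9.0, 1.0, 7.0, 5.0, 2.0, 8.0, 4.0, 6.0]
for p in (0.5, 0.75):
    print(p, intercept_only_quantile(ys, p), empirical_generalized_inverse(ys, p))
```

In this special case \(J=1\) and \(H_p=f(F^{-1}(p))\), so the covariance above reduces to \((p_i\wedge p_j-p_i\cdot p_j)/\left( f(F^{-1}(p_i))\,f(F^{-1}(p_j))\right)\), the classical joint asymptotic covariance of sample quantiles.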

Conditional heavy-tail behavior: competing methods

Tail index regression (TIR) by Wang and Tsai (2009)

Wang and Tsai (2009) study model (3) with \(\alpha ({\mathbf {x}})=1/\gamma ({\mathbf {x}})=\exp ({\mathbf {x}}'\theta )\) for some unknown parameter vector \(\theta \in {\mathbb {R}}^{d+1}\). They propose the estimator

$$\hat{\theta }_{u_n}=\underset{\theta \in {\mathbb {R}}^{d+1}}{{{\text {arg}}}\,{{\text {min}}}}\;\sum _{i=1}^n\left[ \exp ({\mathbf {X}}_i'\theta )\cdot \log (Y_i/u_n)-{\mathbf {X}}_i'\theta \right] \cdot {\mathbf {1}}(Y_i>u_n)$$
(16)

with regressor-independent threshold \(u_n\rightarrow \infty\) for \(n\rightarrow \infty\). Equation (16) can be viewed as an approximate maximum likelihood approach based on the weak approximation of the distribution of \(\log (Y/u_n)\), given \({\mathbf {X}}={\mathbf {x}}\) and \(Y>u_n\), by an exponential distribution with mean \(1/\alpha ({\mathbf {x}})\). Let \(k=\sum _{i=1}^n{\mathbf {1}}(Y_i>u_n)\) be the effective sample size in (16) and \(\hat{\Sigma }_{u_n}=\frac{1}{k}\sum _{i=1}^n{\mathbf {X}}_i{\mathbf {X}}_i'{\mathbf {1}}(Y_i>u_n)\). Under certain technical assumptions, Wang and Tsai (2009) prove

$$\sqrt{k}\cdot \hat{\Sigma }_{u_n}^{1/2}\cdot \left( \hat{{\varvec{\theta }}}-{\varvec{\theta }}\right) {\mathop {\longrightarrow }\limits^{D}} {\mathscr {N}}\left( {\mathbf {h}},I_{d+1}\right)$$
(17)

for some vector \({\mathbf {h}}\) and the \((d+1)\)-dimensional identity matrix \(I_{d+1}\). Estimating the bias \({\mathbf {h}}\) requires detailed information on the tail that is hardly available; \({\mathbf {h}}\) is therefore set to zero in applications.
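As a sanity check on criterion (16), consider the intercept-only case \({\mathbf {X}}=1\) with scalar \(\theta\), an assumption made here only for illustration. Setting the derivative of the criterion to zero gives \(\exp (\hat{\theta })=k/\sum _{i}\log (Y_i/u_n){\mathbf {1}}(Y_i>u_n)\), so \(1/\hat{\alpha }\) coincides with Hill's estimator of \(\gamma\). The Python sketch below checks this numerically on a deterministic pseudo-sample of exact Pareto quantiles; the function names and the Newton iteration are hypothetical choices, not the implementation of Wang and Tsai (2009).

```python
import math


def tir_objective(theta, ys, u):
    """Criterion (16) for the intercept-only model X = 1 (scalar theta)."""
    return sum(math.exp(theta) * math.log(y / u) - theta for y in ys if y > u)


def tir_intercept_only(ys, u, n_iter=50):
    """Minimize the criterion by Newton's method; returns (theta_hat, k).

    Starts at theta = 0, which is adequate when the mean log-excess is of
    order one; this is a sketch, not a robust optimizer.
    """
    logs = [math.log(y / u) for y in ys if y > u]
    k, s = len(logs), sum(logs)
    theta = 0.0
    for _ in range(n_iter):
        # Newton step for g(theta) = exp(theta) * s - k * theta
        theta += k / (math.exp(theta) * s) - 1.0
    return theta, k


def hill_estimator(ys, u):
    """Mean log-excess over the threshold u (Hill's estimator of gamma)."""
    logs = [math.log(y / u) for y in ys if y > u]
    return sum(logs) / len(logs)


# Deterministic pseudo-sample: exact quantiles of a Pareto law with gamma = 0.5.
n, gamma = 500, 0.5
ys = [(1.0 - i / (n + 1)) ** (-gamma) for i in range(1, n + 1)]
u = ys[n - 51]  # threshold leaving the 50 largest values above it

theta_hat, k = tir_intercept_only(ys, u)
print(k, math.exp(-theta_hat), hill_estimator(ys, u))
```

With regressors present no closed form is available, and the criterion has to be minimized numerically over \(\theta \in {\mathbb {R}}^{d+1}\).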

However, Wang and Tsai (2009) do not consider regressor-dependent thresholds \(u_n\) as in Sect. 2.1, which are important in practice to account for regression effects in, e.g., the center of the distribution. To mitigate this problem, we suggest applying their estimation procedure to the sample \((Z_{k,j},{\mathbf {X}}_{k,j})\), \(j=1,\ldots ,k\), as given in Sect. 2.1. That is, replace \(\hat{\theta }_{u_n}\) by

$$\hat{{\varvec{\theta }}}_{k,n}^{TIR}=\underset{{\varvec{\theta }}\in {\mathbb {R}}^{d+1}}{{{\text {arg}}}\,{{\text {min}}}}\;\sum _{j=1}^k\left[ \exp ({\mathbf {X}}_{k,j}'{\varvec{\theta }})\cdot \log (Z_{k,j})-{\mathbf {X}}_{k,j}'{\varvec{\theta }}\right]$$

and \(\hat{\Sigma }_{u_n}\) by \(\hat{\Sigma }_{k,n}=\frac{1}{k}\sum _{j=1}^k{\mathbf {X}}_{k,j}{\mathbf {X}}_{k,j}'\).

Three-stage procedure by Wang and Li (2013)

An alternative regression approach focusing on high conditional quantiles \(F^{-1}_Y(p|\ {\mathbf {x}})\), \(p\in [1-\varepsilon ,1)\), for some small number \(\varepsilon >0\) is proposed in Wang and Li (2013). Their method is based on the assumption that

$$F^{-1}_{g_\lambda (Y)}(p|\ {\mathbf {x}})={\mathbf {x}}'\beta _p$$

holds for some \(\lambda \in {\mathbb {R}}\), Box-Cox transformation \(g_\lambda\), regression quantiles \(\beta _p\in {\mathbb {R}}^{d+1}\) and all \(p\in [1-\varepsilon ,1)\). They propose an estimator of \(\gamma ({\mathbf {x}})\) based on a three-stage procedure:

  1. Set \(p=p_{k,n}=\frac{n-k}{n+1}\) and compute \(\hat{\lambda }\) as in Sect. 2.1.

  2. Let \(p_{n-j,n}=\frac{j}{n+1}\) for \(j=1,\ldots ,m\) with \(m=n-\lfloor n^{\eta }\rfloor\) and \(\eta =0.1\). For \(j=1,\ldots ,m\), estimate \(F_Y^{-1}(p_{n-j,n}|\ {\mathbf {x}})\) by the right-hand side of (6) with \(g=g_{\hat{\lambda }}\) and \(p=p_{n-j,n}\). Denote these estimates by \(\hat{q}_j({\mathbf {x}})\), \(j=1,\ldots ,m\). If \(\hat{q}_j({\mathbf {x}})\) is not increasing in j, apply the rearrangement procedure of Chernozhukov et al. (2010).

  3. For some integer \(k<m\), estimate \(\gamma ({\mathbf {x}})\) by

    $$\hat{\gamma }_{k,n}({\mathbf {x}})=\frac{1}{k-\lfloor n^\eta \rfloor }\sum _{j=\lfloor n^\eta \rfloor }^k\left[ \log \hat{q}_{n-j}({\mathbf {x}})-\log \hat{q}_{n-k}({\mathbf {x}})\right] .$$

Thus \(\hat{\gamma }_{k,n}({\mathbf {x}})\) is Hill’s estimator (Hill 1975) applied to the sample of \(\hat{q}({\mathbf {x}})\) values, which can be seen as pseudo-observations from \(F_Y(\ \cdot \ |\ {\mathbf {x}})\). Wang and Li (2013) also propose a test statistic

$$T_n=\frac{1}{n}\sum _{i=1}^n\left( \hat{\gamma }_{k,n}({\mathbf {X}}_i)-\hat{\gamma }_p\right) ^2,\ \hat{\gamma }_p=\frac{1}{n}\sum _{i=1}^n\hat{\gamma }_{k,n}({\mathbf {X}}_i),$$
(18)

for testing the hypothesis \({\mathscr {H}}_{0,tail}\) in (4). If \({\mathscr {H}}_{0,tail}\) holds, \(E({\mathbf {X}})=(1,0,\ldots ,0)'\in {\mathbb {R}}^{d+1}\), and either \(\gamma ^*({\mathbf {x}})=0\) or a certain homogeneity assumption is met, Wang and Li (2013) show under additional technical assumptions that \(kT_n\xrightarrow{D}\gamma ^2\chi _d^2\). They also derive the limiting distribution under heterogeneity, which in practice involves the estimation of additional parameters. For more details we refer to Wang and Li (2013, Th. 3.3 and Cor. 3.1).
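The third stage and the test statistic (18) can be sketched in a few lines of Python. The sketch assumes that the pseudo-quantiles \(\hat{q}_1({\mathbf {x}}),\ldots ,\hat{q}_m({\mathbf {x}})\) from the second stage are already available as a list, implements the rearrangement step by sorting the estimates over the grid of levels, and, for the decision rule, assumes \(d=1\) and plugs in \(\hat{\gamma }_p^2\) for the unknown \(\gamma ^2\) in the limit \(\gamma ^2\chi _d^2\). All function names, the plug-in choice and the illustrative inputs are hypothetical and not taken from Wang and Li (2013).

```python
import math

CHI2_1_95 = 3.841  # 0.95-quantile of the chi-square distribution with 1 degree of freedom


def rearrange(q_hat):
    """Monotone rearrangement on a finite grid of levels: sort the estimates."""
    return sorted(q_hat)


def gamma_from_quantiles(q_hat, n, k, eta=0.1):
    """Stage 3: Hill-type estimate from q_hat[j - 1], the estimate at level j / (n + 1)."""
    lo = math.floor(n ** eta)
    q = rearrange(q_hat)
    total = sum(
        math.log(q[n - j - 1]) - math.log(q[n - k - 1]) for j in range(lo, k + 1)
    )
    return total / (k - lo)


def wang_li_statistic(gamma_hats):
    """T_n of (18) and the pooled estimate, i.e. the mean of the gamma_hats."""
    pooled = sum(gamma_hats) / len(gamma_hats)
    t_n = sum((g - pooled) ** 2 for g in gamma_hats) / len(gamma_hats)
    return t_n, pooled


def reject_constant_tail(gamma_hats, k):
    """Decision at the 5% level for d = 1, with gamma^2 replaced by pooled^2 (assumption)."""
    t_n, pooled = wang_li_statistic(gamma_hats)
    return k * t_n / pooled ** 2 > CHI2_1_95


def pareto_quantiles(gamma, n, eta=0.1):
    """Exact Pareto quantiles at levels j / (n + 1), j = 1, ..., m (illustration only)."""
    m = n - math.floor(n ** eta)
    return [(1.0 - j / (n + 1)) ** (-gamma) for j in range(1, m + 1)]


n, k = 1000, 100
gamma_hat = gamma_from_quantiles(pareto_quantiles(0.5, n), n, k)
print(round(gamma_hat, 3))
print(reject_constant_tail([0.2, 0.4, 0.6], k), reject_constant_tail([0.4, 0.4, 0.4], k))
```

In an application the list passed to `wang_li_statistic` would contain \(\hat{\gamma }_{k,n}({\mathbf {X}}_i)\) for \(i=1,\ldots ,n\); the three-element lists above only illustrate the arithmetic.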

About this article

Cite this article

Kinsvater, P., Fried, R. Conditional heavy-tail behavior with applications to precipitation and river flow extremes. Stoch Environ Res Risk Assess 31, 1155–1169 (2017). https://doi.org/10.1007/s00477-016-1345-0

Keywords

  • Heavy tails
  • Extreme value index
  • Regression model
  • Relative excesses
  • Flood frequency
  • Precipitation