Abstract
This paper derives finite-sample results assessing the consistency of Generalized Pareto regression trees, introduced by Farkas et al. (Insur. Math. Econ. 98:92–105, 2021), as tools to perform extreme value regression for heavy-tailed distributions. The procedure builds classes of observations with similar tail behaviors depending on the value of the covariates, based on a recursive partition of the sample and simple model selection rules. The results we provide are obtained from concentration inequalities and are valid for a finite sample size. A misspecification bias that arises from the use of a "Peaks over Threshold" approach is also taken into account. Moreover, the derived properties justify the pruning strategies, that is, the model selection rules used to select a proper tree that achieves a compromise between simplicity and goodness-of-fit. The methodology is illustrated through a simulation study and a real data application to insurance against natural disasters.
Availability of supporting data
Since the data were provided through a private partnership with the Mission Risques Naturels, they are not publicly available.
References
Allen, D.M.: The relationship between variable selection and data agumentation and a method for prediction. Technometrics 16(1), 125–127 (1974). https://doi.org/10.1080/00401706.1974.10489157
Allouche, M., Girard, S., Gobet, E.: Estimation of extreme quantiles from heavy-tailed distributions with neural networks. Working paper or preprint (2022). https://hal.science/hal-03751980
Balkema, A.A., de Haan, L.: Residual life time at great age. Ann. Probab., pp. 792–804 (1974). https://doi.org/10.1214/aop/1176996548
Barlow, A.M., Mackay, E., Eastoe, E., Jonathan, P.: A penalised piecewise-linear model for non-stationary extreme value analysis of peaks over threshold. Ocean Eng. 267, 113265 (2023)
Beirlant, J., Goegebeur, Y.: Local polynomial maximum likelihood estimation for Pareto-type distributions. J. Multivar. Anal. 89(1), 97–118 (2004). https://doi.org/10.1016/S0047-259X(03)00125-8
Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J.L.: Statistics of extremes: Theory and Applications. John Wiley & Sons (2004). ISBN 978-0-471-97647-9
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and regression trees. CRC press (1984)
Catastrophe naturelle, assurance et prévention. Technical report, Mission Risques Naturels (2016). https://www.mrn.asso.fr/wp-content/uploads/2019/03/190603_mrn_guidecatnat_15x21cm_ecran.pdf
Carreau, J., Vrac, M.: Stochastic downscaling of precipitation with neural network conditional mixture models. Water Resour. Res. 47(10) (2011)
Charpentier, A., Barry, L., James, M.R.: Insurance against natural catastrophes: balancing actuarial fairness and social solidarity. Geneva Pap. Risk Insur. Issues Pract. (2021). ISSN 1018-5895, 1468-0440. https://doi.org/10.1057/s41288-021-00233-7
Chaudhuri, P.: Asymptotic consistency of median regression trees. J. Stat. Plan. Infer. 91(2), 229–238 (2000). https://doi.org/10.1016/S0378-3758(00)00180-4
Chaudhuri, P., Loh, W.-Y.: Nonparametric estimation of conditional quantiles using quantile regression trees. Bernoulli, pp. 561–576 (2002)
Chavez-Demoulin, V., Embrechts, P., Hofert, M.: An extreme value approach for modeling operational risk losses depending on covariates. J. Risk Insur. 83(3), 735–776 (2015). https://doi.org/10.1111/jori.12059
Chernozhukov, V.: Extremal quantile regression. Ann. Stat. 33(2), 806–839 (2005)
Coles, S.: An Introduction to Statistical Modeling of Extreme Values. Springer, London (2001)
Davison, A.C., Smith, R.L.: Models for exceedances over high thresholds. J. R. Stat. Soc. Series B Methodol 52(3), 393–425 (1990). https://doi.org/10.1111/j.2517-6161.1990.tb01796.x
De’ath, G., Fabricius, K.E.: Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology 81(11), 3178–3192 (2000). https://doi.org/10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2
Einmahl, U., Mason, D.M.: Uniform in bandwidth consistency of kernel-type function estimators. Ann. Stat. 33(3), 1380–1403 (2005). https://doi.org/10.1214/009053605000000129
Embrechts, P., Klüppelberg, C., Mikosch, T.: Modelling extremal events: for insurance and finance, volume 33. Springer Science & Business Media (2013)
Farkas, S., Lopez, O., Thomas, M.: Cyber claim analysis using generalized pareto regression trees with applications to insurance. Insur. Math. Econ. 98, 92–105 (2021). https://doi.org/10.1016/j.insmatheco.2021.02.009
Gardes, L., Stupfler, G.: An integrated functional weissman estimator for conditional extreme quantiles. REVSTAT-Stat. J. 17(1), 109–144 (2019)
Gey, S., Nedelec, E.: Model selection for cart regression trees. IEEE Trans. Inf. Theory 51(2), 658–670 (2005). https://doi.org/10.1109/TIT.2004.840903
Gnecco, N., Terefe, E.M., Engelke, S.: Extremal random forests (2022). arXiv preprint arXiv:2201.12865
González, C., Mira-McWilliams, J., Juárez, I.: Important variable assessment and electricity price forecasting based on regression tree models: Classification and regression trees, Bagging and Random Forests. IET Gener. Transm. Distrib. 9(11), 1120–1128 (2015). https://doi.org/10.1049/iet-gtd.2014.0655
Huang, W.K., Nychka, D.W., Zhang, H.: Estimating precipitation extremes using the log-histospline. Environmetrics 30(4), e2543 (2019)
Katz, R.W., Parlange, M.B., Naveau, P.: Statistics of extremes in hydrology. Adv. Water Resour. 25(8–12), 1287–1304 (2002). https://doi.org/10.1016/S0309-1708(02)00056-8
Loh, W.-Y.: Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 1(1), 14–23 (2011). https://doi.org/10.1002/widm.8
Loh, W.-Y.: Fifty years of classification and regression trees. Int. Stat. Rev. 82(3), 329–348 (2014). https://doi.org/10.1111/insr.12016
Lopez, O., Milhaud, X., Thérond, P.-E.: Tree-based censored regression with applications in insurance. Electron. J. Stat. 10(2), 2685–2716 (2016). https://doi.org/10.1214/16-EJS1189
Pasche, O.C., Engelke, S.: Neural networks for extreme quantile regression with an application to forecasting of flood risk (2022). arXiv preprint arXiv:2208.07590
Pickands, J.: Statistical inference using extreme order statistics. Ann. Stat. 3(1), 119–131 (1975)
Richards, J., Huser, R.: A unifying partially-interpretable framework for neural network-based extreme quantile regression (2022). arXiv preprint arXiv:2208.07581
Rietsch, T., Naveau, P., Gilardi, N., Guillou, A.: Network design for heavy rainfall analysis. J. Geophys. Res. Atmos. 118(23), 13–075 (2013)
Rodriguez-Galiano, V., Sanchez-Castillo, M., Chica-Olmo, M., Chica-Rivas, M.: Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 71, 804–818 (2015). https://doi.org/10.1016/j.oregeorev.2015.01.001
Ross, E., Sam, S., Randell, D., Feld, G., Jonathan, P.: Estimating surge in extreme north sea storms. Ocean Eng. 154, 430–444 (2018)
Scarrott, C., MacDonald, A.: A review of extreme value threshold estimation and uncertainty quantification. REVSTAT-Stat. J. 10(1), 33–60 (2012)
Smith, R.L.: Threshold methods for sample extremes. In: Statistical Extremes and Applications, pp. 621–638. Springer (1984)
Smith, R.L.: Extreme value analysis of environmental time series: an application to trend detection in ground-level ozone. Stat. Sci., pp. 367–377 (1989)
Stone, M.: Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Series B Methodol. 36(2), 111–133 (1974)
Su, X., Wang, M., Fan, J.: Maximum likelihood regression trees. J. Comput. Graph. Stat. 13(3), 586–598 (2004). https://doi.org/10.1198/106186004X2165
Talagrand, M.: Sharper bounds for Gaussian and empirical processes. Ann. Probab., pp. 28–76 (1994)
Tencaliec, P., Favre, A.-C., Naveau, P., Prieur, C., Nicolet, G.: Flexible semiparametric generalized pareto modeling of the entire range of rainfall amount. Environmetrics 31(2), e2582 (2020)
van der Vaart, A.W.: Asymptotic statistics, volume 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press (1998)
Velthoen, J., Cai, J.-J., Jongbloed, G., Schmeits, M.: Improving precipitation forecasts using extreme quantile regression. Extremes 22(4), 599–622 (2019). https://doi.org/10.1007/s10687-019-00355-1
Velthoen, J., Dombry, C., Cai, J.-J., Engelke, S.: Gradient boosting for extreme quantile regression (2021). arXiv preprint arXiv:2103.00808
Wang, H.J., Li, D., He, X.: Estimation of high conditional quantiles for heavy-tailed distributions. J. Am. Stat. Assoc. 107(500), 1453–1464 (2012). https://doi.org/10.1080/01621459.2012.716382
Youngman, B.D.: Generalized additive models for exceedances of high thresholds with an application to return level estimation for us wind gusts. J. Am. Stat. Assoc. 114(528), 1865–1879 (2019)
Acknowledgements
The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR) under reference ANR-20-CE40-0025-01 (T-REX project).
Funding
Not applicable.
Author information
Contributions
All the authors wrote the main manuscript text and the supplementary material. All the authors prepared all figures and tables. All the authors reviewed the manuscript.
Ethics declarations
Ethical approval and Consent to participate
All the authors approve and consent to participate.
Human and animal ethics
Not applicable.
Consent for publication
All the authors consent for publication.
R codes
The R codes are publicly available at https://github.com/antoine-heranval/Generalized-Pareto-Regression-Trees-for-extreme-event-analysis.
Competing interests
The authors have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proofs
In this appendix, we present in detail the proofs of the results stated throughout the paper. The concentration inequalities required to obtain these results are presented in Section 1.1. They are used to obtain deviation bounds in Section 1.2, which are the key ingredients of the proofs of Theorem 1 (Section 1.3), Corollary 8 (Section 1.2), and Theorem 3 (Section 1.5). Appendix B presents some results on covering numbers that are required to control the complexity of some classes of functions considered in the proofs. Some technical lemmas are gathered in Appendix C.
1.1 Concentration inequalities
The proofs of the main results are mostly based on concentration inequalities. The following inequality was initially proved by Talagrand (1994); see also Einmahl and Mason (2005).
Proposition 4
Let \((\textbf{V}_i)_{1\le i \le n}\) denote i.i.d. replications of a random vector \(\textbf{V},\) and let \((\varepsilon _i)_{1\le i \le n}\) denote a vector of i.i.d. Rademacher variables (that is, \(\mathbb {P}(\varepsilon _i=-1)=\mathbb {P}(\varepsilon _i=1)=1/2)\) independent from \((\textbf{V}_i)_{1\le i \le n}.\) Let \(\mathfrak {F}\) be a pointwise measurable class of functions bounded by a finite constant \(M_0.\) Then, for all t,
with \(v_{\mathfrak {F}}=\sup _{\varphi \in \mathfrak {F}}\textrm{Var}(\Vert \varphi (\textbf{V})\Vert _{\infty }),\) and where \(A_1\) and \(A_2\) are universal constants.
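For the reader's convenience, we recall the form in which this inequality appears in Einmahl and Mason (2005), which is the form used in the proofs below (our transcription, up to the values of the universal constants \(A_1\) and \(A_2\)):

```latex
\mathbb{P}\left( \max_{1\le m\le n}\left\| \sum_{i=1}^{m}\left\{ \varphi(\mathbf{V}_i)-\mathbb{E}[\varphi(\mathbf{V})]\right\} \right\|_{\mathfrak{F}}
\ge A_1\left( \mathbb{E}\left[\left\| \sum_{i=1}^{n}\varepsilon_i\,\varphi(\mathbf{V}_i)\right\|_{\mathfrak{F}}\right]+t\right)\right)
\le 2\left[ \exp\left(-\frac{A_2 t^2}{n v_{\mathfrak{F}}}\right)+\exp\left(-\frac{A_2 t}{M_0}\right)\right],
```

where \(\Vert \cdot \Vert _{\mathfrak F}=\sup _{\varphi \in \mathfrak F}|\cdot |.\) The two exponential terms on the right-hand side correspond to the two regimes handled separately in the proof of Theorem 6.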
The difficulty in using Proposition 4 comes from the need to control the symmetrized quantity \(\mathbb {E}\left[ \sup _{\varphi \in \mathfrak {F}}\left\| \sum _{i=1}^n \varphi (\textbf{V}_i)\varepsilon _i\right\| \right] .\) Proposition 5 is due to Einmahl and Mason (2005) and allows this control via some assumptions on the considered class of functions \(\mathfrak {F}\).
We first need to introduce some notation regarding covering numbers of a class of functions. More details can be found, for example, in van der Vaart (1998, Chapter 2.6). Let us consider a class of functions \(\mathfrak {F}\) with envelope \(\Phi\) (which means that, for (almost) all v and all \(\varphi \in \mathfrak {F},\) \(|\varphi (v)|\le \Phi (v)\)). Then, for any probability measure \(\mathbb {Q},\) introduce \(N(\varepsilon ,\mathfrak {F},\mathbb {Q}),\) the minimum number of \(L^2(\mathbb {Q})\) balls of radius \(\varepsilon\) needed to cover the class \(\mathfrak {F}.\) Then, define
Proposition 5
Let \(\mathfrak {F}\) be a point-wise measurable class of functions bounded by \(M_0\) with envelope \(\Phi\) such that, for some constants \(A_3, \alpha \ge 1,\) and \(0\le \sqrt{v} \le M_0,\) we have
-
(i)
\(\mathcal{N}_{\Phi }(\varepsilon ,\mathfrak {F})\le A_3 \varepsilon ^{-\alpha },\) for \(0<\varepsilon <1,\)
-
(ii)
\(\sup _{\varphi \in \mathfrak {F}}\mathbb {E}\left[ \varphi (\textbf{V})^2\right] \le v,\)
-
(iii)
\(M_0\le \frac{1}{4\alpha ^{1/2}}\sqrt{nv/\log (A_4M_0/\sqrt{v}) },\) with \(A_4=\textrm{max}(e,A_3^{1/\alpha }).\)
Then, for some absolute constant \(A_5,\)
1.2 Deviation results
We first introduce some notation that will be used throughout Sections 1.2 to 1.5 and Appendix B. In the following, \(\varphi _{\varvec{\theta }}\) is a function indexed by \(\varvec{\theta }=(\sigma ,\gamma )^{t}\), denoting either \(\phi (\cdot ,\varvec{\theta })\), \(\partial _{\sigma } \phi (\cdot ,\varvec{\theta }),\) or \(\partial _{\gamma }\phi (\cdot ,\varvec{\theta }).\)
We consider in the following the class of functions \(\mathfrak {F}\) defined as
By Lemma 11, the functions \(y \mapsto \partial _{\sigma } \phi (y-u,\varvec{\theta })\) and \(y \mapsto \partial _{\gamma } \phi (y-u,\varvec{\theta })\) are uniformly bounded (possibly up to multiplication by a constant) by \(\Phi (y)=\log (1+wy),\) where \(w=\gamma _{\textrm{max}}/\sigma _{\textrm{min}}\). On the other hand, \(y \mapsto \phi (y-u,\varvec{\theta })\) is bounded by \(\log \sigma _n+\Phi (y)=O(\log (k_n))+\Phi (y).\)
Next, for \(\ell =1,\ldots ,K\), and \(\varvec{\theta }=(\sigma ,\gamma )^t \in \varvec{\Theta }\), let
be the (normalized) negative GP log-likelihood associated with the leaf \(\ell\) of a tree \(T_K\) with set of K leaves \((\mathcal{T}_\ell )_{\ell =1,\ldots ,K}\). Let \(L^{\ell }(\varvec{\theta },u) = \mathbb {E}[L_n^{\ell }(\varvec{\theta },u)]\). The key results behind Theorems 1 and 3 rely on studying the deviations of the processes, indexed by \(\varvec{\theta },\; u\) and \(\ell\),
Let \(M_n = \beta \log k_n \le \beta a_1 \log (n)\) with \(\beta >0\) and \(a_1>0\) (with \(a_1\) defined in Assumption 1). We study the deviations of these processes by decomposing \(\mathcal{W}_i^\ell (\varvec{\theta },u),\) for \(i=0,1,\) (which is a sum of i.i.d. observations) into two sums.
-
the first one gathers observations smaller than some bound (more precisely, such that \(\Phi (Y_i)\le M_n\)), and is handled in Theorem 6. Since these observations are bounded (even though the bound depends on n and may tend to infinity as n grows), we can apply a concentration inequality such as the one of Section 1.1. Let us stress that \(\sup _{\varphi _{\varvec{\theta }} \in \mathfrak F} \Vert \varphi _{\varvec{\theta }}(y)\textbf{1}_{\Phi (y)\le M_n}\Vert _{\infty } \le M_n\);
-
in the second one (Theorem 7), we consider the observations larger than this bound, and control them through the fact that the function \(\Phi\) has finite exponential moments (see Lemma 11).
Corollary 8, which provides deviation bounds for estimation errors in the leaves of the tree, is then a direct consequence.
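To fix ideas, the leaf-wise normalized negative GP log-likelihood \(L_n^{\ell }(\varvec{\theta },u)\) studied above can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes \(\phi (y,\varvec{\theta })=\log \sigma +(1+1/\gamma )\log (1+\gamma y/\sigma )\) (the standard GP negative log-density for \(\gamma >0\)), and the function names are ours.

```python
import numpy as np

def gp_neg_loglik(y, sigma, gamma):
    """phi(y, theta) = log(sigma) + (1 + 1/gamma) * log(1 + gamma*y/sigma):
    the negative log-density of a Generalized Pareto excess (gamma > 0)."""
    return np.log(sigma) + (1.0 + 1.0 / gamma) * np.log1p(gamma * y / sigma)

def leaf_neg_loglik(Y, in_leaf, u, sigma, gamma, k_n):
    """L_n^l(theta, u): normalized sum of phi over the excesses Y_i - u of the
    observations that fall in leaf l and exceed the threshold u."""
    excess = Y[in_leaf & (Y > u)] - u
    return gp_neg_loglik(excess, sigma, gamma).sum() / k_n
```

For instance, with `Y = np.array([1., 3., 5.])`, `in_leaf = np.array([True, True, False])`, `u = 2`, `sigma = gamma = 1` and `k_n = 1`, only the excess `1.0` contributes, giving the value \(2\log 2\).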
Theorem 6
Let
If \(k_n = O(n^{a_1})\) with \(a_1 >0\) (Assumption 1), then, for \(t \ge {\mathfrak c_1} (\log k_n)^{1/2} k_n^{-1/2}\),
Proof
From Proposition 4,
with \(v_{\mathfrak F} = \sup _{\varphi \in \mathfrak F} \textrm{var}\left( \vert \varphi (Y) \vert \right)\). From Lemma 12, \(v_{\mathfrak {F}}\le M_n^2k_nn^{-1},\) which shows that the first exponential term on the right-hand side of (12) is smaller than
We can now apply Proposition 5 (combined with Lemma 10) to this class of functions with \(v=M_n^2k_nn^{-1}\) and \(M_0=M_n.\) Hence,
where \({ A'_6}>0\) and \(\mathfrak {s}_n=\log (\sigma _n^{\alpha } K^{4(d+1)(d+2)}n/k_n)\) (\(\alpha >0\) being defined in Lemma 10). From Assumption 1, we see that \(\mathfrak {s}_n=O(\log (k_n))\) (let us recall that K is necessarily less than n). Whence, if \({ \mathfrak c_1}= 2A_1{A'_6}\), for \(t\ge {\mathfrak c_1} \left\{ \log \left( k_n\right) \right\} ^{1/2} k_n^{-1/2}\),
Equation (11) follows from (12) and (13) with \({{C}_1}=A_2A_1^{-2}/4\) and \({ C_2}=A_2A_1^{-1}/2.\)
Theorem 7
Let
If \(k_n = O(n^{a_1})\) with \(a_1>0\) (Assumption 1), then there exists \(\rho _0>0\) (Lemma 11) such that, for \(\beta a_1 \ge 10/\rho _0\) and \(t\ge {\mathfrak c_2} k_n^{-1/2}\),
Proof
Let \(\beta '=\beta a_1.\) \(\overline{\mathcal{Z}}(M_n)\) is upper-bounded by
A bound for \(E_{1,n}=\mathbb {E} \left[ \Phi (Y) \textbf{1}_{\Phi (Y) \ge M_n}\textbf{1}_{Y\ge u_{\textrm{min}}}\right]\) is obtained from Lemma 13, and \(nE_{1,n}/k_{n}\le \mathfrak {e}_1 k_n^{-1/2}\) if \(\beta ' \ge 2/\rho _0.\)
Next, from Markov's inequality,
From Lemma 13, we get
Each of these terms is bounded by \(\textrm{max}(\mathfrak {e}_3,\mathfrak {e}_2 \mathfrak {e}_1,\mathfrak {e}_1^3)k_n^{-5/2}\) for \(\beta ' \ge 10/\rho _0.\) Thus, for \(t\ge 2 \mathfrak {e}_1 k_n^{-1/2}\) and \(\beta ' \ge 10/\rho _0,\)
We now apply these results to deduce deviation bounds on the estimators \(\widehat{\varvec{\theta }}_{\ell }\) in the leaves of the tree.
Corollary 8
Under the assumptions of Theorems 6 and 7 and Assumption 2, for \(t\ge \mathfrak c_3 (\log k_n)^{1/2}k_n^{-1/2},\)
Proof
For \({ u_{\textrm{min}} \le u \le u_{\textrm{max}}}\) and \(\ell = 1,\ldots , K\), let \(\varvec{\theta }=(\sigma ,\gamma )^{t}\) and \(\varvec{\theta }^{*K}_\ell =(\sigma ^{*K}_\ell (u),\gamma ^{*K}_\ell (u))^{t},\) and let
From a Taylor expansion,
for some parameters \(\tilde{\sigma }_j\) (resp. \(\tilde{\gamma }_j\)) between \(\sigma\) and \(\sigma ^{*K}_\ell (u)\) (resp. \(\gamma\) and \(\gamma ^{*K}_\ell (u)\)). From Assumption 2, we get, for all \(\ell =1,\ldots , K\),
Hence, for all \(\ell =1,\ldots , K\),
Since for all \(\ell =1,\ldots , K\), \(\nabla _{\varvec{\theta }} L_n^\ell (\widehat{\varvec{\theta }}^K)=0,\) \(\mathcal{W}_1^\ell (\widehat{\varvec{\theta }}^K(u),u)=-\frac{n}{k_n}\nabla _{\varvec{\theta }}L^\ell (\widehat{\varvec{\theta }}^K,u).\) Hence,
and the right-hand side is bounded by
The result follows from Theorems 6 and 7.
1.3 Proof of Theorem 1
The proof of the first part of Theorem 1 then consists in gathering the results on the leaves obtained in Corollary 8. Let \({ u_{\textrm{min}} \le u \le u_{\textrm{max}}}\),
Hence
The result follows from Corollary 8 and from the assumption \(K\le K_{\textrm{max}}=O(k_n^3)\) (Assumption 1).
To prove the second part of Theorem 1, write
Let \(t_n=c_1 K (\log k_n)k_n^{-1}.\) Then
We now use Theorem 1 to bound the integral on the right-hand side. Since \(\int _0^{\infty }\exp (-a t)dt=\frac{1}{a},\) \(\int _0^{\infty } \exp (-a^{1/2} t^{1/2})dt=\frac{2}{a},\) and \(\int _1^{\infty }t^{-3/2}dt=2,\) we get
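The second of the three integral identities invoked here follows from the substitution \(s=t^{1/2}\) (so that \(dt=2s\,ds\)) and then \(x=a^{1/2}s\):

```latex
\int_0^{\infty} \exp(-a^{1/2} t^{1/2})\,dt
 = \int_0^{\infty} 2s\,\exp(-a^{1/2} s)\,ds
 = \frac{2}{a}\int_0^{\infty} x\,e^{-x}\,dx
 = \frac{2}{a}.
```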
1.4 Proof of Proposition 2
For all \(\textbf{x}\),
Now, from a Taylor expansion, for \(\ell =1,\ldots , K\), conditionally on \(\textbf{X}\in \mathcal{T}_\ell\),
for some parameters \(\tilde{\sigma }_j\) (resp. \(\tilde{\gamma }_j\)) between \(\sigma _0(\textbf{X})\) and \(\sigma ^{*K}_\ell (u)\) (resp. \(\gamma _0(\textbf{X})\) and \(\gamma ^{*K}_\ell (u)\)).
Thus, under Assumption 2,
where Z is a random variable distributed according to the distribution \(F_u\) defined in Section 2.1 with \(\sigma _0(\textbf{X}) =u\gamma _0(\textbf{X})\) and with
Under Assumption 3, we have
and then
Consequently,
and
Hence, conditionally on \(\textbf{X}\in \mathcal{T}_\ell\),
where \(\mathfrak C_2(u)=\frac{1}{\mathfrak C_1}\frac{1}{\gamma _{\textrm{min}}}\textrm{max}\left( 1+\frac{1}{u}+\frac{1}{u\gamma _{\textrm{min}}},1+\frac{1}{\gamma _{\textrm{min}}}+\frac{\gamma _{\textrm{max}}}{\gamma _{\textrm{min}}} \right)\).
Finally, for all \(\textbf{x}\),
1.5 Proof of Theorem 3
First, let us introduce some notations that are needed in the proof.
Define the log-likelihood \(L_n(T_K,u)\) associated with a tree \(T_K\) with K leaves \((\mathcal{T}_{\ell })_{\ell = 1,\ldots , K}\) and with parameters \(\varvec{\theta }(u)=\left( \varvec{\theta }^K_{\ell }(u)\right) _{\ell =1,\ldots ,K}\)
and \(L(T_K,u) = \mathbb {E}[L_n(T_K,u)]\). Finally, for two trees T and \(T'\), \(\Delta L_n(T, T') = L_n(T,u) - L_n(T',u)\) and, similarly, \(\Delta L(T, T') = L(T,u) - L(T',u)\).
The following lemma will be needed to prove Theorem 3.
Lemma 9
Let \(\mathfrak D = \inf _u\inf _{K < K^*} \Delta L(T^*,T^*_K)\) and let \(u \in [u_{\textrm{min}},u_{\textrm{max}}]\) be fixed. Suppose that there exists a constant \(c_2>0\) such that the penalization constant \(\lambda\) satisfies
then, under Assumptions 1 and 2, for \(K> K^*\),
and, for \(K<K^*,\)
Proof
Let \(u \in [u_{\textrm{min}},u_{\textrm{max}}]\) fixed. If \(\widehat{K}=K,\) this means that
Decompose
Since \(L_n(T^*,u)-L_n(T_{K^*},u)<0,\)
For \(K > K^*,\) \(T^*_K=T^*,\) hence,
For \(K>K^*\), a bound is then obtained from Theorems 6 and 7 if \(\lambda (K-K^*) \ge {c_1} \{\log (k_n)\}^{1/2} k_n^{-1/2}\), that is \(\lambda \ge {c_1} \{\log k_n\}^{1/2} k_n^{-1/2}\).
Now, for \(K<K^*,\)
where \(\mathfrak D = \inf _{K<K^*, u\in [u_{\textrm{min}}, u_{\textrm{max}}]} \mathfrak D(K^*,K).\) Hence,
These two probabilities can be bounded using Theorems 6 and 7 provided that, for all \(K<K^*,\)
that is,
We are now ready to prove Theorem 3. Let \(u \in [u_{\textrm{min}} , u_{\textrm{max}}]\) fixed.
Firstly, from Theorem 1,
Secondly, recall that
where \(\mu (\mathcal{T}_\ell ) = \mathbb {P}(\textbf{X}\in \mathcal{T}_\ell )\). Following the same idea as in the proof of Proposition 2, from a Taylor expansion, under Assumptions 2 and 3,
Hence,
Finally,
for some constant \(\mathcal{C}_5.\)
Appendix B: Covering numbers
Lemma 10
Following the notations of the proof of Theorem 6, the class of functions \(\mathfrak {F}\) satisfies
for some constants \(\mathfrak C_4>0\) and \(\alpha >0\) (not depending on n nor K).
Proof
Let
for \(z>0.\) For \(\varvec{\theta }\) and \(\varvec{\theta }'\) in \(\mathcal{S}\times \Gamma ,\) we have (from a straightforward Taylor expansion),
for some constants C and \(C'.\) More precisely, one can take
Next, observe that
where \(C''=4\gamma ^2_{\textrm{max}}/[\gamma _{\textrm{min}}\sigma ^3],\) which leads to
for some constant \(C_g>0.\) Similarly,
Next,
where \(C_7=5/(\gamma _{\textrm{min}}\sigma _{\textrm{min}}),\) leading to, for some \(C_h>0,\)
On the other hand,
and
Define \(\mathfrak {F}_1=\{g_{\varvec{\theta }}(\cdot -u):\varvec{\theta } \in \mathcal{S}\times \Gamma , u\in [u_{\textrm{min}},u_{\textrm{max}}]\},\) \(\mathfrak {F}_2=\{h_{\varvec{\theta }}(\cdot -u):\varvec{\theta } \in \mathcal{S}\times \Gamma , u\in [u_{\textrm{min}},u_{\textrm{max}}]\},\) and \(\mathfrak {F}_3=\{\phi (\cdot -u,\varvec{\theta }):\varvec{\theta } \in \mathcal{S}\times \Gamma , u\in [u_{\textrm{min}},u_{\textrm{max}}]\}.\) From ((van der Vaart 1998), Example 19.7), we get, for \(i=1,...,3,\)
for some \(\alpha >0\) and constants \(\varphi _i.\)
On the other hand, let
and
From Lemma 4 in (Lopez et al. 2016), we have \(N(\varepsilon ,\mathfrak {F}_4)\le m^k K^{\alpha _2}\varepsilon ^{-\alpha _2},\) where \(\alpha _2=4(d+1)(d+2),\) and where k is the number of discrete components taking at most m modalities. On the other hand, from Example 19.6 in (van der Vaart 1998), \(N(\varepsilon ,\mathfrak {F}_5)\le 2\varepsilon ^{-2}.\)
From ((Einmahl and Mason 2005), Lemma A.1), we get, for \(i=1,\ldots ,3,\)
Multiplying \(\mathfrak {F}_i\mathfrak {F}_4\mathfrak {F}_5\) by a single indicator function \(\textbf{1}_{\Phi (Y_i)\le M_n}\) does not change the covering number, and the result follows.
Appendix C: Technical Lemmas
Lemma 11
-
1.
The derivatives of the functions \(y\rightarrow \phi (y-u,\varvec{\theta })\) with respect to \(\varvec{\theta }\) are uniformly bounded by
$$\begin{aligned} \Phi (y)=C(1+\log (1+wy)), \end{aligned}$$where C is a constant (not depending on n), and \(w=\gamma _{\textrm{max}}/\sigma _{\textrm{min}}.\)
-
2.
There exists a certain \(\rho _0>0\) such that
$$\begin{aligned} m_{\rho _0} := \mathbb {E}\left[ \exp (\rho _0\Phi (Y))\right] <\infty . \end{aligned}$$
Proof
To prove point 1, it suffices to compute the derivatives of the GP log-likelihood and to check that they can be upper-bounded by \(\Phi\).
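Explicitly, writing \(\phi (y,\varvec{\theta })=\log \sigma +(1+1/\gamma )\log (1+\gamma y/\sigma )\) (our transcription of the standard GP negative log-density), the two derivatives are

```latex
\partial_{\sigma}\phi(y,\varvec{\theta})
  = \frac{1}{\sigma}-\Big(1+\frac{1}{\gamma}\Big)\frac{\gamma y}{\sigma(\sigma+\gamma y)},
\qquad
\partial_{\gamma}\phi(y,\varvec{\theta})
  = -\frac{1}{\gamma^{2}}\log\Big(1+\frac{\gamma y}{\sigma}\Big)
    +\Big(1+\frac{1}{\gamma}\Big)\frac{y}{\sigma+\gamma y},
```

and, since \(\gamma y/(\sigma +\gamma y)\le 1\) and \(y/(\sigma +\gamma y)\le 1/\gamma\), both are bounded in absolute value, uniformly over \(\varvec{\theta }\in \varvec{\Theta }\), by a constant multiple of \(1+\log (1+wy)\) with \(w=\gamma _{\textrm{max}}/\sigma _{\textrm{min}},\) that is, by \(\Phi\).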
For point 2, note that, since \(\gamma (\textbf{x}) \ge \gamma _{\textrm{min}} >0\) for all \(\textbf{x}\), Y is a heavy-tailed random variable. Then \(\log (Y)\), and thus \(\Phi (Y)\), is a light-tailed random variable, so that \(\Phi (Y)\) has finite exponential moments.
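This can be made quantitative. Assuming a Pareto-type tail \(\mathbb {P}(Y>y)\le C_Y\, y^{-1/\gamma _{\textrm{max}}}\) for y large (consistent with \(\gamma (\textbf{x})\le \gamma _{\textrm{max}}\)), and with \(\Phi (y)=C(1+\log (1+wy))\) as in point 1,

```latex
\mathbb{E}\left[e^{\rho\,\Phi(Y)}\right]
 = e^{\rho C}\,\mathbb{E}\left[(1+wY)^{\rho C}\right]<\infty
 \quad\text{as soon as}\quad \rho C<\frac{1}{\gamma_{\max}},
```

since moments of \(1+wY\) of order strictly less than \(1/\gamma _{\textrm{max}}\) are finite. Any \(\rho _0<1/(C\gamma _{\textrm{max}})\) therefore works.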
Lemma 12
With \(v_{\mathfrak {F}}\) defined in Proposition 4,
Proof
We have
Lemma 13
Define, for \(j = 1, 2, 3\),
Under the assumptions of Theorem 7,
Proof
Applying the Cauchy-Schwarz inequality twice leads to
Next, from Chernoff's inequality,
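The Chernoff-type bound used at this step reads, with \(m_{\rho _0}\) defined in Lemma 11 and \(M_n=\beta \log k_n\) (our reconstruction from these definitions):

```latex
\mathbb{P}(\Phi(Y)\ge M_n)
 \le e^{-\rho_0 M_n}\,\mathbb{E}\left[e^{\rho_0\Phi(Y)}\right]
 = m_{\rho_0}\, e^{-\rho_0\beta\log k_n}
 = m_{\rho_0}\, k_n^{-\rho_0\beta},
```

which is how the polynomial-in-\(k_n\) decay rates of Theorem 7 arise from the choice of \(\beta\).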
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Farkas, S., Heranval, A., Lopez, O. et al.: Generalized Pareto regression trees for extreme event analysis. Extremes (2024). https://doi.org/10.1007/s10687-024-00485-1