
Generalized Pareto regression trees for extreme event analysis

Published in: Extremes

Abstract

This paper derives finite sample results to assess the consistency of Generalized Pareto regression trees introduced by Farkas et al. (Insur. Math. Econ. 98:92–105, 2021) as tools to perform extreme value regression for heavy-tailed distributions. This procedure allows the constitution of classes of observations with similar tail behaviors depending on the value of the covariates, based on a recursive partition of the sample and simple model selection rules. The results we provide are obtained from concentration inequalities, and are valid for a finite sample size. A misspecification bias that arises from the use of a “Peaks over Threshold” approach is also taken into account. Moreover, the derived properties legitimate the pruning strategies, that is the model selection rules, used to select a proper tree that achieves a compromise between simplicity and goodness-of-fit. The methodology is illustrated through a simulation study, and a real data application in insurance for natural disasters.



Availability of supporting data

Because the data were provided through a private partnership with the Mission Risques Naturels, they are not publicly available.


Acknowledgements

The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR) under reference ANR-20-CE40-0025-01 (T-REX project).

Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

All the authors wrote the main manuscript text and the supplementary material. All the authors prepared the figures and tables. All the authors reviewed the manuscript.

Corresponding author

Correspondence to Maud Thomas.

Ethics declarations

Ethical approval and Consent to participate

All the authors approve and consent to participate.

Human and animal ethics

Not applicable.

Consent for publication

All the authors consent to publication.

R codes

The R codes are publicly available at https://github.com/antoine-heranval/Generalized-Pareto-Regression-Trees-for-extreme-event-analysis.

Competing interests

The authors have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 1.28 MB)

Supplementary file2 (zip 1.91 MB)

Appendices

Appendix A: Proofs

In this section, we present in detail the proofs of the results stated throughout the paper. The concentration inequalities required to obtain the results are presented in Section “Concentration inequalities”. These inequalities are used to obtain deviation bounds in Section “Deviation results”, which are the key ingredients of the proofs of Theorem 1, Corollary 8, and Theorem 3. Section “Covering numbers” gathers some results on covering numbers that are required to control the complexity of some of the classes of functions considered in the proofs. Some additional technical lemmas are gathered at the end of the appendix.

1.1 Concentration inequalities

The proofs of the main results are mostly based on concentration inequalities. The following inequality was initially proved by Talagrand (1994); see also Einmahl and Mason (2005).

Proposition 4

Let \((\textbf{V}_i)_{1\le i \le n}\) denote i.i.d. replications of a random vector \(\textbf{V},\) and let \((\varepsilon _i)_{1\le i \le n}\) denote a vector of i.i.d. Rademacher variables (that is, \(\mathbb {P}(\varepsilon _i=-1)=\mathbb {P}(\varepsilon _i=1)=1/2)\) independent of \((\textbf{V}_i)_{1\le i \le n}.\) Let \(\mathfrak {F}\) be a pointwise measurable class of functions bounded by a finite constant \(M_0.\) Then, for all \(t>0,\)

$$\begin{aligned} \begin{aligned} \mathbb {P}&\left( \sup _{\varphi \in \mathfrak {F}}\left\| \sum _{i=1}^n \{\varphi (\textbf{V}_i)-\mathbb {E}[\varphi (\textbf{V})]\}\right\| _{\infty }>A_1\left\{ \mathbb {E}\left[ \sup _{\varphi \in \mathfrak {F}}\left\| \sum _{i=1}^n \varphi (\textbf{V}_i)\varepsilon _i\right\| _{\infty }\right] +t\right\} \right) \\&\le 2\left\{ \exp \left( -\frac{A_2t^2}{nv_{\mathfrak {F}}}\right) +\exp \left( -\frac{A_2t}{M_0}\right) \right\} , \end{aligned} \end{aligned}$$

with \(v_{\mathfrak {F}}=\sup _{\varphi \in \mathfrak {F}}\textrm{Var}(\Vert \varphi (\textbf{V})\Vert _{\infty }),\) and where \(A_1\) and \(A_2\) are universal constants.
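As a numerical illustration of this kind of concentration (not part of the paper's argument), the Python sketch below estimates the expected supremum deviation over a toy class of indicator functions \(\varphi _t(v)=\textbf{1}_{v\le t}\) with \(\textbf{V}\sim \textrm{Uniform}(0,1)\); the class, the grid, and the sample sizes are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_sup_deviation(n, grid_size=50, reps=200):
    """Monte Carlo estimate of E[ sup_phi | (1/n) * sum_i (phi(V_i) - E[phi(V)]) | ]
    over the illustrative class phi_t(v) = 1{v <= t}, with V ~ Uniform(0, 1)."""
    t = np.linspace(0.0, 1.0, grid_size)
    devs = np.empty(reps)
    for r in range(reps):
        v = rng.uniform(size=n)
        emp = (v[:, None] <= t).mean(axis=0)  # (1/n) sum_i 1{V_i <= t}
        devs[r] = np.abs(emp - t).max()       # E[1{V <= t}] = t for uniform V
    return devs.mean()

small, large = mean_sup_deviation(200), mean_sup_deviation(5000)
print(small, large)
```

The ratio between the two estimates should be close to \(\sqrt{5000/200}=5\), consistent with the \(n^{-1/2}\) scale of the deviations controlled by the proposition.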

The difficulty in using Proposition 4 comes from the need to control the symmetrized quantity \(\mathbb {E}\left[ \sup _{\varphi \in \mathfrak {F}}\left\| \sum _{i=1}^n \varphi (\textbf{V}_i)\varepsilon _i\right\| \right] .\) Proposition 5, due to Einmahl and Mason (2005), allows this control under some assumptions on the considered class of functions \(\mathfrak {F}\).

We first need to introduce some notations regarding covering numbers of a class of functions. More details can be found, for example, in van der Vaart (1998), Chapter 2.6. Let us consider a class of functions \(\mathfrak {F}\) with envelope \(\Phi\) (which means that, for (almost) all v and all \(\varphi \in \mathfrak {F},\) \(|\varphi (v)|\le \Phi (v)\)). Then, for any probability measure \(\mathbb {Q},\) let \(N(\varepsilon ,\mathfrak {F},\mathbb {Q})\) denote the minimum number of \(L^2(\mathbb {Q})\) balls of radius \(\varepsilon\) needed to cover the class \(\mathfrak {F}.\) Then, define

$$\mathcal{N}_{\Phi }(\varepsilon ,\mathfrak {F})=\sup _{\mathbb {Q}:\mathbb {Q}(\Phi ^2)<\infty } N(\varepsilon (\mathbb {Q}(\Phi ^2)^{1/2}),\mathfrak {F},\mathbb {Q}).$$
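To make the definition concrete, the following Python sketch computes a greedy upper bound on \(N(\varepsilon ,\mathfrak {F},\mathbb {Q})\) for a toy class of half-line indicators under an empirical measure \(\mathbb {Q}\); the class, the measure, and the greedy covering strategy are all illustrative assumptions, not objects from the paper:

```python
import numpy as np

def covering_number(funcs, points, eps):
    """Greedy upper bound on N(eps, F, Q): the number of L2(Q)-balls of radius eps
    needed to cover the class, Q being the empirical measure on `points`."""
    vals = np.array([[f(v) for v in points] for f in funcs])  # one row per function
    uncovered = list(range(len(funcs)))
    centers = 0
    while uncovered:
        c = uncovered[0]  # pick an uncovered function as a new ball center
        d = np.sqrt(((vals[uncovered] - vals[c]) ** 2).mean(axis=1))  # L2(Q) distances
        uncovered = [i for i, di in zip(uncovered, d) if di > eps]
        centers += 1
    return centers

points = np.linspace(0, 1, 200)                   # support of the empirical measure Q
funcs = [(lambda t: (lambda v: float(v <= t)))(t)  # toy class {v -> 1{v <= t}}
         for t in np.linspace(0, 1, 40)]
for eps in (0.5, 0.25, 0.1):
    print(eps, covering_number(funcs, points, eps))
```

Shrinking \(\varepsilon\) increases the bound polynomially, in the spirit of the polynomial covering condition (i) of Proposition 5.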

Proposition 5

Let \(\mathfrak {F}\) be a point-wise measurable class of functions bounded by \(M_0\) with envelope \(\Phi\) such that, for some constants \(A_3, \alpha \ge 1,\) and \(0\le \sqrt{v} \le M_0,\) we have

  1. (i)

    \(\mathcal{N}_{\Phi }(\varepsilon ,\mathfrak {F})\le A_3 \varepsilon ^{-\alpha },\) for \(0<\varepsilon <1,\)

  2. (ii)

    \(\sup _{\varphi \in \mathfrak {F}}\mathbb {E}\left[ \varphi (\textbf{V})^2\right] \le v,\)

  3. (iii)

    \(M_0\le \frac{1}{4\alpha ^{1/2}}\sqrt{nv/\log (A_4M_0/\sqrt{v}) },\) with \(A_4=\textrm{max}(e,A_3^{1/\alpha }).\)

Then, for some absolute constant \(A_5,\)

$$\begin{aligned} \mathbb {E}\left[ \sup _{\varphi \in \mathfrak {F}}\left\| \sum _{i=1}^n \varphi (\textbf{V}_i)\varepsilon _i\right\| \right] \le A_5\sqrt{\alpha n v \log (A_4M_0/\sqrt{v})}. \end{aligned}$$

1.2 Deviation results

We first introduce some notations that will be used throughout Sections “Deviation results” to “Covering numbers”. In the following, \(\varphi _{\varvec{\theta }}\) is a function indexed by \(\varvec{\theta }=(\sigma ,\gamma )^{t}\) denoting either \(\phi (\cdot ,\varvec{\theta })\), \(\partial _{\sigma } \phi (\cdot ,\varvec{\theta }),\) or \(\partial _{\gamma }\phi (\cdot ,\varvec{\theta }).\)

We consider in the following the class of functions \(\mathfrak {F}\) defined as

$$\begin{aligned} \mathfrak F =\left\{ y \mapsto \varphi _{ \varvec{\theta }}(y-u)\textbf{1}_{y\ge u}\textbf{1}_{\textbf{x}\in \mathcal{T}_\ell }, \; \varvec{\theta } \in \varvec{\Theta },\; u\in [u_{\textrm{min}};u_{\textrm{max}}],\ell =1,...,K\right\} . \end{aligned}$$
(10)

By Lemma 11, the functions \(y \mapsto \partial _{\sigma } \phi (y-u,\varvec{\theta })\) and \(y \mapsto \partial _{\gamma } \phi (y-u,\varvec{\theta })\) are uniformly bounded (possibly up to multiplication by a constant) by \(\Phi (y)=\log (1+wy),\) where \(w=\gamma _{\textrm{max}}/\sigma _{\textrm{min}}\). On the other hand, \(y \mapsto \phi (y-u,\varvec{\theta })\) is bounded by \(\log \sigma _n+\Phi (y)=O(\log (k_n))+\Phi (y).\)

Next, for \(\ell =1,\ldots ,K\), and \(\varvec{\theta }=(\sigma ,\gamma )^t \in \varvec{\Theta }\), let

$$\begin{aligned} L_n^{\ell }(\varvec{\theta },u)=\frac{1}{k_n}\sum _{i=1}^n \phi (Y_i-u,\varvec{\theta })\textbf{1}_{Y_i>u}\textbf{1}_{\textbf{X}_i\in \mathcal{T}_{\ell }}, \end{aligned}$$

be the (normalized) negative GP log-likelihood associated with the leaf \(\ell\) of a tree \(T_K\) with set of K leaves \((\mathcal{T}_\ell )_{\ell =1,\ldots ,K}\). Let \(L^{\ell }(\varvec{\theta },u) = \mathbb {E}[L_n^{\ell }(\varvec{\theta },u)]\). The key results behind Theorems 1 and 3 rely on studying the deviations of the processes, indexed by \(\varvec{\theta },\; u\) and \(\ell\),

$$\begin{aligned} \mathcal{W}_0^{\ell }(\varvec{\theta },u) = L_n^{\ell }(\varvec{\theta },u)-L^{\ell }(\varvec{\theta },u), \end{aligned}$$
$$\begin{aligned} \mathcal{W}_1^\ell (\varvec{\theta },u) = \nabla _{\varvec{\theta }} L_n^\ell (\varvec{\theta },u)-\nabla _{\varvec{\theta }} L^\ell (\varvec{\theta },u). \end{aligned}$$
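To fix ideas, the criterion \(L_n^{\ell }(\varvec{\theta },u)\) can be computed directly. The Python sketch below is a minimal illustration, assuming the standard negative GP log-density \(\phi (z,(\sigma ,\gamma )^t)=\log \sigma +(1+1/\gamma )\log (1+\gamma z/\sigma )\) for \(\gamma >0\); the simulated data, the single-leaf tree, and the parameter grid are illustrative choices, not the paper's setup:

```python
import numpy as np

def phi(z, sigma, gamma):
    """Assumed negative GP log-density at an excess z >= 0 (case gamma > 0)."""
    return np.log(sigma) + (1.0 + 1.0 / gamma) * np.log1p(gamma * z / sigma)

def leaf_criterion(y, in_leaf, u, sigma, gamma, k_n):
    """Normalized negative GP log-likelihood L_n^ell(theta, u) of one leaf."""
    mask = (y > u) & in_leaf
    return phi(y[mask] - u, sigma, gamma).sum() / k_n

rng = np.random.default_rng(0)
n, sigma0, gamma0, u = 5000, 1.0, 0.5, 0.0
# Exact GP(sigma0, gamma0) sample by inversion of the survival function.
y = sigma0 / gamma0 * (rng.uniform(size=n) ** (-gamma0) - 1.0)
in_leaf = np.ones(n, dtype=bool)          # a single-leaf tree for illustration
k_n = int(((y > u) & in_leaf).sum())      # number of exceedances in the leaf

vals = {(s, g): leaf_criterion(y, in_leaf, u, s, g, k_n)
        for s in (0.5, 1.0, 2.0) for g in (0.25, 0.5, 1.0)}
best = min(vals, key=vals.get)
print(best)  # the grid minimizer should sit at (or near) (sigma0, gamma0)
```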

Let \(M_n = \beta \log k_n \le \beta a_1 \log (n)\) with \(\beta >0\) and \(a_1>0\) (with \(a_1\) defined in Assumption 1). We study the deviations of these processes by decomposing \(\mathcal{W}_i^\ell (\varvec{\theta },u),\) for \(i=0,1,\) (which is a sum of i.i.d. observations) into two sums.

  • the first one gathers the observations smaller than some bound (more precisely, such that \(\Phi (Y_i)\le M_n\)), which is considered in Theorem 6. Since these observations are bounded (although this bound depends on n and may tend to infinity as n grows), we can apply a concentration inequality such as the one of Section “Concentration inequalities”. Let us stress that \(\sup _{\varphi _{\varvec{\theta }} \in \mathfrak F} \Vert \varphi _{\varvec{\theta }}(y)\textbf{1}_{\Phi (y)\le M_n}\Vert _{\infty } \le M_n\);

  • in the second one (Theorem 7), we consider the observations larger than this bound, and control them through the fact that the function \(\Phi\) has finite exponential moments (see Lemma 11).

Corollary 8, which provides deviation bounds for estimation errors in the leaves of the tree, is then a direct consequence.
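A minimal numerical sketch of this decomposition (the Pareto toy model, the choice \(\beta =1/2\), and the sample sizes are assumptions made for illustration, not quantities from the paper) shows that the part above \(M_n=\beta \log k_n\) involves only a small fraction of the sample, which is why it can be controlled through the exponential moments of \(\Phi\):

```python
import numpy as np

rng = np.random.default_rng(2)

n, k_n, beta = 10_000, 400, 0.5            # illustrative sizes; beta > 0 as in the text
w = 1.0                                    # stand-in for gamma_max / sigma_min
y = rng.pareto(2.0, size=n)                # heavy-tailed toy observations
Phi = np.log1p(w * y)                      # envelope Phi(y) = log(1 + w y)
M_n = beta * np.log(k_n)                   # truncation level M_n = beta * log(k_n)

bounded = Phi <= M_n
# Bounded part: handled by the concentration inequality (Theorem 6).
bounded_part = Phi[bounded].sum() / k_n
# Tail part: handled through the exponential moments of Phi (Theorem 7).
tail_part = Phi[~bounded].sum() / k_n

print(f"fraction above M_n: {(~bounded).mean():.4f}, tail contribution: {tail_part:.3f}")
```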

Theorem 6

Let

$$\begin{aligned} \underline{\mathcal{Z}}(M_n) = \sup _{\varphi _{\varvec{\theta }}\in \mathfrak {F}} \left|\frac{1}{k_n} \sum _{i=1}^n \left( \varphi _{\varvec{\theta }}(Y_i) \textbf{1}_{\Phi (Y_i)\le M_n} - \mathbb {E}\left[ \varphi _{\varvec{\theta }}(Y_i) \textbf{1}_{\Phi (Y_i) \le M_n}\right] \right) \right|. \end{aligned}$$

If \(k_n = O(n^{a_1})\) with \(a_1 >0\) (Assumption 1), then, for \(t \ge {\mathfrak c_1} (\log k_n)^{1/2} k_n^{-1/2}\),

$$\begin{aligned} \mathbb {P}\left( \underline{\mathcal{Z}}(M_n) \ge t\right) \le 2\left( \exp \left( - \frac{{ C_1} k_n t^2}{\beta ^2 (\log k_n)^2} \right) + \exp \left( -\frac{{ C_2} k_n t}{\beta \log k_n} \right) \right) . \end{aligned}$$
(11)

Proof

From Proposition 4,

$$\begin{aligned} \begin{aligned}&\mathbb {P} \left( \underline{\mathcal{Z}}(M_n) \ge A_1 \left\{ \mathbb {E} \left[ \sup _{\varphi _{\varvec{\theta }}\in \mathfrak {F}}\frac{1}{k_n}\left|\sum _{i=1}^n \varphi _{\varvec{\theta }}(Y_i)\textbf{1}_{\Phi (Y_i)\le M_n}\varepsilon _i\right|\right] +t\right\} \right) \\&\le 2 \left( \exp \left( - \frac{A_2 k_n^2 t^2}{nv_{\mathfrak {F}}} \right) + \exp \left( -\frac{A_2 k_n t}{M_n} \right) \right) \, , \end{aligned} \end{aligned}$$
(12)

with \(v_{\mathfrak F} = \sup _{\varphi \in \mathfrak F} \textrm{var}\left( \vert \varphi (Y) \vert \right)\). From Lemma 12, \(v_{\mathfrak {F}}\le M_n^2k_nn^{-1},\) which shows that the first exponential term on the right-hand side of (12) is smaller than

$$\begin{aligned} \exp \left( - \frac{A_2 k_n t^2}{M_n^2} \right) . \end{aligned}$$
(13)

We can now apply Proposition 5 (combined with Lemma 10) to this class of functions with \(v=M_n^2k_nn^{-1}\) and \(M_0=M_n.\) Hence,

$$\begin{aligned} \mathbb {E} \left[ \sup _{\varphi _{\varvec{\theta }}\in \mathfrak {F}}\frac{1}{k_n}\left|\sum _{i=1}^n \varphi _{\varvec{\theta }}(Y_i)\textbf{1}_{\Phi (Y_i)\le M_n} \varepsilon _i\right|\right] \le \frac{{ A_6}}{k_n}\sqrt{nv \mathfrak {s}_n}={ A_6} \frac{ \mathfrak {s}^{1/2}_n}{k_n^{1/2}} \; , \end{aligned}$$

where \(A_6>0\) and \(\mathfrak {s}_n=\log (\sigma _n^{\alpha } K^{4(d+1)(d+2)}n/k_n)\) (\(\alpha >0\) being defined in Lemma 10). From Assumption 1, we see that \(\mathfrak {s}_n=O(\log (k_n))\) (let us recall that K is necessarily less than n). Hence, if \({ \mathfrak c_1}= 2A_1A_6\), for \(t\ge {\mathfrak c_1} \left\{ \log \left( k_n\right) \right\} ^{1/2} k_n^{-1/2}\),

$$\begin{aligned} \mathbb {P}\left( \underline{\mathcal{Z}}(M_n) \ge t\right) \le \mathbb {P} \left( \underline{\mathcal{Z}}(M_n) \ge A_1 \left\{ \mathbb {E} \left[ \sup _{\varphi _{\varvec{\theta }}\in \mathfrak {F}}\frac{1}{k_n}\left|\sum _{i=1}^n \varphi _{\varvec{\theta }}(Y_i)\textbf{1}_{\Phi (Y_i)\le M_n}\varepsilon _i\right|\right] + \frac{t}{2A_1}\right\} \right) \; . \end{aligned}$$

Equation (11) follows from (12) and (13) with \({{C}_1}=A_2A_1^{-2}/4\) and \({ C_2}=A_2A_1^{-1}/2.\)

Theorem 7

Let

$$\begin{aligned} \overline{\mathcal{Z}}(M_n)=\sup _{\varphi _{\varvec{\theta }}\in \mathfrak {F}}\left|\frac{1}{k_n} \sum _{i=1}^n \left( \varphi _{\varvec{\theta }}(Y_i )\textbf{1}_{\Phi (Y_i)> M_n} - \mathbb {E}\left[ \varphi _{\varvec{\theta }}(Y_i)\textbf{1}_{\Phi (Y_i) > M_n}\right] \right) \right|. \end{aligned}$$

If \(k_n = O(n^{a_1})\) with \(a_1>0\) (Assumption 1), then there exists \(\rho _0>0\) (Lemma 11) such that for \(\beta a_1 \ge 10/\rho _0,\) and \(t\ge {\mathfrak c_2} k_n^{-1/2}\),

$$\begin{aligned} \mathbb {P}\left( \overline{\mathcal{Z}}(M_n) \ge t\right) \le \frac{{ C_{3}}}{k_n^{5/2} t^3}. \end{aligned}$$
(14)

Proof

Let \(\beta '=\beta a_1.\) The quantity \(\overline{\mathcal{Z}}(M_n)\) is upper-bounded by

$$\begin{aligned} \frac{1}{k_n} \sum _{i=1}^n \left\{ \Phi (Y_i) \textbf{1}_{\Phi (Y_i) \ge M_n}\textbf{1}_{Y_i\ge u_{\textrm{min}}} + \mathbb {E} \left[ \Phi (Y) \textbf{1}_{\Phi (Y) \ge M_n}\textbf{1}_{Y\ge u_{\textrm{min}}}\right] \right\} \, . \end{aligned}$$

A bound for \(E_{1,n}=\mathbb {E} \left[ \Phi (Y) \textbf{1}_{\Phi (Y) \ge M_n}\textbf{1}_{Y\ge u_{\textrm{min}}}\right]\) is obtained from Lemma 13, and \(nE_{1,n}/k_{n}\le \mathfrak {e}_1 k_n^{-1/2}\) if \(\beta ' \ge 2/\rho _0.\)

Next, from Markov's inequality,

$$\begin{aligned} \begin{aligned} t^3\mathbb {P}\left( \frac{1}{k_n} \sum _{i=1}^n \Phi (Y_i) \textbf{1}_{\Phi (Y_i) \ge M_n}\textbf{1}_{Y_i\ge u_{\textrm{min}}}\ge t\right)&\le \frac{n E_{3,n}}{k_n^3}+\frac{n(n-1)E_{2,n}E_{1,n}}{k_n^3}\\&+\frac{n(n-1)(n-2)E_{1,n}^3}{k_n^3}. \end{aligned} \end{aligned}$$

From Lemma 13, we get

$$\begin{aligned} \begin{aligned} \frac{n E_{3,n}}{k_n^3}&\le \frac{\mathfrak {e}_3 n^{-(\rho _0\beta '/4-1/2)}}{k_n^{5/2}}, \\ \frac{n(n-1)E_{2,n}E_{1,n}}{k_n^3}&\le \frac{\mathfrak {e}_2\mathfrak {e}_1n^{-(\rho _0\beta '/2-3/2)}}{k_n^{5/2}}, \\ \frac{n(n-1)(n-2)E_{1,n}^3}{k_n^3}&\le \frac{\mathfrak {e}_1^3n^{-(\rho _0\beta '/4-5/2)}}{k_n^{5/2}}. \end{aligned} \end{aligned}$$

Each of these terms is bounded by \(\textrm{max}(\mathfrak {e}_3,\mathfrak {e}_2 \mathfrak {e}_1,\mathfrak {e}_1^3)k_n^{-5/2}\) for \(\beta ' \ge 10/\rho _0.\) Thus, for \(t\ge 2 \mathfrak {e}_1 k_n^{-1/2}\) and \(\beta ' \ge 10/\rho _0,\)

$$\begin{aligned} \begin{aligned}&\mathbb {P}\left( \overline{\mathcal{Z}}(M_n) \ge t \right) \\&\le \mathbb {P} \left( \frac{1}{k_n} \sum _{i=1}^n \Phi (Y_i) \textbf{1}_{\Phi (Y_i) \ge M_n}\textbf{1}_{Y_i\ge u_{\textrm{min}}} \ge \frac{t}{2}\right) + \mathbb {P}\left( \mathbb {E} \left[ \Phi (Y) \textbf{1}_{\Phi (Y) \ge M_n}\textbf{1}_{Y\ge u_{\textrm{min}}}\right] \ge \frac{t}{2}\right) \\&\le \frac{8\textrm{max}(\mathfrak {e}_3,\mathfrak {e}_2 \mathfrak {e}_1,\mathfrak {e}_1^3)}{t^3k_n^{5/2}}. \end{aligned} \end{aligned}$$

We now apply these results to deduce deviation bounds on the estimators \(\widehat{\varvec{\theta }}_{\ell }\) in the leaves of the tree.

Corollary 8

Under the assumptions of Theorems 6 and 7 and Assumption 2, for \(t\ge \mathfrak c_3 (\log k_n)^{1/2}k_n^{-1/2},\)

$$\begin{aligned} \begin{aligned} \mathbb {P}\left( \sup _{\begin{array}{c} \ell =1,\ldots , K,\\ { u_{\textrm{min}} \le u \le u_{\textrm{max}}} \end{array}} \Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_\ell \Vert _{\infty }\ge t\right)&\le 2\left( \exp \left( - \frac{{ C_4} k_n t^2}{\beta ^2 (\log k_n)^{2}} \right) + \exp \left( -\frac{{ C_5} k_n t}{\beta \log k_n} \right) \right) \\&+\frac{{C_6}}{k_n^{5/2} t^3}. \end{aligned} \end{aligned}$$

Proof

For \({ u_{\textrm{min}} \le u \le u_{\textrm{max}}}\), let \(\varvec{\theta }=(\sigma ,\gamma )^{t}\) and, for \(\ell = 1,\ldots , K\), \(\varvec{\theta }^{*K}_\ell =(\sigma ^{*K}_\ell (u),\gamma ^{*K}_\ell (u))^{t},\) and let

$$\begin{aligned} \nabla _{\varvec{\theta }}L^\ell (\varvec{\theta },u) = \mathbb {E}\left[ \left( \begin{array}{c} \partial _\sigma \phi (Y-u,\varvec{\theta }) \\ \partial _\gamma \phi (Y-u,\varvec{\theta })\end{array}\right) \textbf{1}_{Y\ge u} \textbf{1}_{\textbf{X}\in \mathcal{T}_\ell } \right] . \end{aligned}$$

From a Taylor expansion,

$$\begin{aligned} \nabla _{\varvec{\theta }}L^\ell (\varvec{\theta },u)=\mathbb {E}\left[ H^{\ell }_{(\tilde{\sigma }_1,\gamma _1),(\sigma _1,\tilde{\gamma }_1),(\tilde{\sigma }_2,\gamma _2),(\sigma _2,\tilde{\gamma }_2)}(Y-u)\textbf{1}_{\textbf{X}\in \mathcal{T}_\ell }\right] (\varvec{\theta }-\varvec{\theta }^{*K}_\ell )^t, \end{aligned}$$

for some parameters \(\tilde{\sigma }_j\) (resp. \(\tilde{\gamma }_j\)) between \(\sigma\) and \(\sigma ^{*K}_\ell (u)\) (resp. \(\gamma\) and \(\gamma ^{*K}_\ell (u)\)). From Assumption 2, we get, for all \(\ell =1,\ldots , K\),

$$\begin{aligned} \frac{n}{k_n}\Vert \nabla _{\varvec{\theta }}L^\ell (\varvec{\theta },u)\Vert _{\infty }\ge \mathfrak C_1\Vert \varvec{\theta }-\varvec{\theta }^{*K}_{\ell }(u)\Vert _{\infty }. \end{aligned}$$

Hence, for all \(\ell =1,\ldots , K\),

$$\begin{aligned} \mathbb {P}\left( \Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_\ell \Vert _{\infty }\ge t\right) \le \mathbb {P}\left( \frac{n}{k_n}\Vert \nabla _{\varvec{\theta }}L^\ell (\widehat{\varvec{\theta }}^K,u)\Vert _{\infty }\ge \mathfrak C_1t\right) . \end{aligned}$$

Since \(\nabla _{\varvec{\theta }} L_n^\ell (\widehat{\varvec{\theta }}^K)=0\) for all \(\ell =1,\ldots , K\), we have \(\mathcal{W}_1^\ell (\widehat{\varvec{\theta }}^K(u),u)=-\frac{n}{k_n}\nabla _{\varvec{\theta }}L^\ell (\widehat{\varvec{\theta }}^K,u).\) Hence,

$$\begin{aligned} \mathbb {P}\left( \sup _{\begin{array}{c} \ell =1,\ldots , K, \\ { u_{\textrm{min}} \le u \le u_{\textrm{max}}} \end{array}} \Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_l(u)\Vert _{\infty }\ge t\right) \le \mathbb {P}\left( \sup _{\begin{array}{c} \ell =1,\ldots ,K,\\ { u_{\textrm{min}} \le u \le u_{\textrm{max}}} \end{array}}\Vert \mathcal{W}_1^\ell (\widehat{\varvec{\theta }}^K(u),u)\Vert _{\infty }\ge \mathfrak C_1t\right) , \end{aligned}$$

and the right-hand side is bounded by

$$\begin{aligned} \mathbb {P}\left( \overline{\mathcal{Z}}(M_n)\ge \frac{\mathfrak C_1t}{2}\right) +\mathbb {P}\left( \underline{\mathcal{Z}}(M_n)\ge \frac{\mathfrak C_1t}{2}\right) . \end{aligned}$$

The result follows from Theorems 6 and 7.

1.3 Proof of Theorem 1

The proof of the first part of Theorem 1 then consists of gathering the results on the leaves obtained in Corollary 8. For \({ u_{\textrm{min}} \le u \le u_{\textrm{max}}}\),

$$\begin{aligned} \Vert \widehat{T}_K-T^*_K\Vert _2^2\le \sum _{\ell =1}^K \Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_\ell \Vert _{\infty }^2\le K \sup _{\ell =1,...,K}\Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_\ell \Vert _{\infty }^2. \end{aligned}$$

Hence

$$\begin{aligned} \begin{aligned}&{\mathbb {P}\left( \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}\Vert \widehat{T}_K-T^*_K\Vert _2^2\ge t\right) } \\&\le \mathbb {P}\left( \sup _{\begin{array}{c} \ell =1,\ldots ,K,\\ { u_{\textrm{min}} \le u \le u_{\textrm{max}}} \end{array}}\Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_\ell \Vert _{\infty }\ge t^{1/2}K^{-1/2}\right) . \end{aligned} \end{aligned}$$

The result follows from Corollary 8, and from the assumption \(K\le K_{\textrm{max}}=O(k_n^3)\) (Assumption 1).

To prove the second part of Theorem 1, write

$$\begin{aligned} \mathbb {E}\left[ { \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}}\Vert \widehat{T}_K-T^*_K\Vert _2^2\right] =\int _0^{\infty } \mathbb {P}({ \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}} \Vert \widehat{T}_K-T^*_K\Vert _2^2\ge t)dt. \end{aligned}$$

Let \(t_n=c_1 K (\log k_n)k_n^{-1}.\) Then

$$\begin{aligned} \begin{aligned}&{\int _0^{\infty } \mathbb {P}({ \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}}\Vert \widehat{T}_K-T^*_K\Vert _2^2\ge t)dt}\\&\le t_n+\int _{t_n}^{\infty } \mathbb {P}({ \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}}\Vert \widehat{T}_K-T^*_K\Vert _2^2\ge t)dt. \end{aligned} \end{aligned}$$

We now use the first part of Theorem 1 to bound the integral on the right-hand side. Since \(\int _0^{\infty }\exp (-a t)dt=\frac{1}{a},\) \(\int _0^{\infty } \exp (-a^{1/2} t^{1/2})dt=\frac{2}{a},\) and \(\int _1^{\infty }t^{-3/2}dt=2,\) we get

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ { \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}}\Vert \widehat{T}_K-T^*_K\Vert _2^2\right]&\le { t_n+ \frac{2K\beta ^2 (\log k_n)^2}{\mathcal{C}_1 k_n }+\frac{4K \beta ^2 (\log k_n)^2}{\mathcal{C}_2^2 k_n } + \frac{2 \mathcal{C}_3K}{k_n^{5/2} }}\\&\le \frac{c_1 K \log k_n}{k_n}+ \frac{2 K \beta ^2 (\log k_n)^2}{\mathcal{C}_1 k_n }\\&+\frac{4K \beta ^2(\log k_n)^2}{\mathcal{C}_2^2 k_n } + \frac{2 \mathcal{C}_3K}{k_n^{5/2} }\\&\le \frac{\mathcal{C}_4 K (\log k_n)^2}{k_n}. \end{aligned} \end{aligned}$$
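The three elementary integrals invoked above can be verified symbolically; the short sympy snippet below is only a sanity check of those identities:

```python
import sympy as sp

# Positive symbols so the improper integrals evaluate in closed form.
a, t = sp.symbols('a t', positive=True)

I1 = sp.integrate(sp.exp(-a * t), (t, 0, sp.oo))                    # expect 1/a
I2 = sp.integrate(sp.exp(-sp.sqrt(a) * sp.sqrt(t)), (t, 0, sp.oo))  # expect 2/a
I3 = sp.integrate(t ** sp.Rational(-3, 2), (t, 1, sp.oo))           # expect 2

print(I1, I2, I3)
```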

1.4 Proof of Proposition 2

For all \(\textbf{x}\),

$$\begin{aligned} \Vert \varvec{\theta }^*(\textbf{x}) - \varvec{\theta }_0(\textbf{x})\Vert _\infty = \Vert \sum _{\ell =1}^{K_{\textrm{max}}}\left( \varvec{\theta }^{*}_\ell - \varvec{\theta }_{0}(\textbf{x}) \right) \textbf{1}_{\textbf{x}\in \mathcal{T}_\ell }\Vert _\infty \le \sum _{\ell =1}^{K_{\textrm{max}}}\Vert \varvec{\theta }^{*}_\ell - \varvec{\theta }_{0}(\textbf{x}) \Vert _\infty \textbf{1}_{\textbf{x}\in \mathcal{T}_\ell } \, . \end{aligned}$$

Now, from a Taylor expansion, for \(\ell =1,\ldots , K\), conditionally on \(\textbf{X}\in \mathcal{T}_\ell\),

$$\begin{aligned} \nabla _{\varvec{\theta }} L^\ell (\varvec{\theta }_{0}(\textbf{X}),u) = \mathbb {E}\left[ H^{\ell }_{(\tilde{\sigma }_1,\gamma _1),(\sigma _1,\tilde{\gamma }_1),(\tilde{\sigma }_2,\gamma _2),(\sigma _2,\tilde{\gamma }_2)}(Y-u) \mid \textbf{X}\in \mathcal{T}_\ell \right] (\varvec{\theta }_{0}(\textbf{X}) - \varvec{\theta }^{*}_\ell )^t , \end{aligned}$$

for some parameters \(\tilde{\sigma }_j\) (resp. \(\tilde{\gamma }_j\)) between \(\sigma _0(\textbf{X})\) and \(\sigma ^{*K}_\ell (u)\) (resp. \(\gamma _0(\textbf{X})\) and \(\gamma ^{*K}_\ell (u)\)).

Thus, under Assumption 2,

$$\begin{aligned} \begin{aligned}&{\Vert \varvec{\theta }_{0}(\textbf{X}) - \varvec{\theta }^{*}_\ell \Vert _\infty }\\&\le \frac{1}{\mathfrak C_1} \Vert \nabla _{\varvec{\theta }} L^\ell (\varvec{\theta }_{0}(\textbf{X}),u) \Vert _\infty \\&\le \frac{1}{\mathfrak C_1}\frac{k_n}{n} \textrm{max}\left( |\mathbb {E}\left[ \partial _\sigma \phi (Z,\varvec{\theta }_0(\textbf{X}) ) \mid \textbf{X}\in \mathcal{T}_\ell \right] |,|\mathbb {E}\left[ \partial _\gamma \phi (Z,\varvec{\theta }_0(\textbf{X}) )\mid \textbf{X}\in \mathcal{T}_\ell \right] |\right) \,, \end{aligned} \end{aligned}$$

where Z is a random variable distributed according to the distribution \(F_u\) defined in Section 2.1 with \(\sigma _0(\textbf{X}) =u\gamma _0(\textbf{X})\) and with

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \partial _\sigma \phi (Z,\varvec{\theta }_0(\textbf{X}) ) \mid \textbf{X}\in \mathcal{T}_\ell \right]&= -\frac{1}{u \gamma _{0}(\textbf{X})} + \frac{1}{u^2\gamma _{0}(\textbf{X})} \left( 1+\frac{1}{\gamma _{0}(\textbf{X})} \right) \mathbb {E}\left[ \frac{Z}{1+Z/u}\mid \textbf{X}\in \mathcal{T}_\ell \right] \\ \mathbb {E}\left[ \partial _\gamma \phi (Z,\varvec{\theta }_0(\textbf{X}) ) \mid \textbf{X}\in \mathcal{T}_\ell \right]&=-\frac{1}{\gamma _{0}(\textbf{X})^2}\mathbb {E}\left[ \log (1+Z/u) \mid \textbf{X}\in \mathcal{T}_\ell \right] \\&+ \frac{1}{u\gamma _{0}(\textbf{x})}\left( 1+\frac{1}{\gamma _{0}(\textbf{X})} \right) \mathbb {E}\left[ \frac{Z}{1+Z/u}\mid \textbf{X}\in \mathcal{T}_\ell \right] \,. \end{aligned} \end{aligned}$$

Under Assumption 3, we have

$$\begin{aligned} \overline{F}_{u}(z) = \left( 1+\frac{z}{u}\right) ^{-1/\gamma _{0}(\textbf{X})} \left\{ 1 + c\psi (u) \int _1^{1+z/u} v^{\rho -1} \textrm{d}v + o(\psi (u))\right\} \,. \end{aligned}$$
$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \frac{Z}{1+Z/u}\mid \textbf{X}\in \mathcal{T}_\ell \right]&= \int _0^u \overline{F}_u \left( \frac{t}{1-t/u} \right) \textrm{d}t\\&= \frac{u}{1+1/\gamma _0(\textbf{X})} \left( 1 + \frac{ c\psi (u)}{1+1/\gamma _{0}(\textbf{X})-\rho } + o(\psi (u)) \right) \\&\le u \left( 1 + c\gamma _{0}(\textbf{X})\psi (u) + o(\psi (u)) \right) \end{aligned} \end{aligned}$$

and then

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \log (1+Z/u) \mid \textbf{X}\in \mathcal{T}_\ell \right]&= \int _0^{\infty } \mathbb {P}\left[ Z \ge u(\textrm{e}^t-1) \mid \textbf{X}\in \mathcal{T}_\ell \right] \textrm{d}t \\&= \gamma _{0}(\textbf{X})\left( 1 + \frac{c\psi (u)}{1/\gamma _{0}(\textbf{X})-\rho } + o(\psi (u)) \right) \\&\le \gamma _{0}(\textbf{X}) \left( 1+c\gamma _{0}(\textbf{X})\psi (u) + o(\psi (u)) \right) \, . \end{aligned} \end{aligned}$$

Consequently,

$$\begin{aligned} |\mathbb {E}\left[ \partial _\sigma \phi (Z,\varvec{\theta }_0(\textbf{X}) ) \mid \textbf{X}\in \mathcal{T}_\ell \right] |\le \frac{1}{\gamma _{\textrm{min}}}\left( 1+\frac{1}{u}\left( 1+\frac{1}{\gamma _{\textrm{min}}}\right) \right) \left( 1 + c\gamma _{0}(\textbf{X}) \psi (u) + o(\psi (u)) \right) \end{aligned}$$

and

$$\begin{aligned} |\mathbb {E}\left[ \partial _\gamma \phi (Z,\varvec{\theta }_0(\textbf{X}) ) \mid \textbf{X}\in \mathcal{T}_\ell \right] |\le \frac{1}{\gamma _{\textrm{min}}}\left( 1 +\frac{1}{\gamma _{\textrm{min}}}+\frac{\gamma _{\textrm{max}}}{\gamma _{\textrm{min}}}\right) \left( 1 + c\gamma _0(\textbf{X}) \psi (u ) + o(\psi (u)) \right) \, . \end{aligned}$$

Hence, conditionally on \(\textbf{X}\in \mathcal{T}_\ell\),

$$\begin{aligned} \Vert \varvec{\theta }_{0}(\textbf{X}) - \varvec{\theta }^{*}_\ell \Vert _\infty \le \mathfrak C_2(u)\frac{k_n}{n} \left( 1 + c\gamma _{\textrm{max}} \psi (u) + o(\psi (u)) \right) \,, \end{aligned}$$

where \(\mathfrak C_2(u)=\frac{1}{\mathfrak C_1}\frac{1}{\gamma _{\textrm{min}}}\textrm{max}\left( 1+\frac{1}{u}+\frac{1}{u\gamma _{\textrm{min}}},1+\frac{1}{\gamma _{\textrm{min}}}+\frac{\gamma _{\textrm{max}}}{\gamma _{\textrm{min}}} \right)\).

Finally, for all \(\textbf{x}\),

$$\begin{aligned} \begin{aligned} \Vert \varvec{\theta }^{*}(\textbf{x}) - \varvec{\theta }_0(\textbf{x})\Vert _\infty&\le \sum _{\ell =1}^{K_{\textrm{max}}}\Vert \varvec{\theta }^{*}_\ell - \varvec{\theta }_{0}(\textbf{x}) \Vert _\infty \textbf{1}_{\textbf{x}\in \mathcal{T}_\ell }\\&\le \mathfrak C_2(u)\frac{k_n}{n} \left( 1 + c\gamma _{\textrm{max}} \psi (u) + o(\psi (u)) \right) \sum _{\ell =1}^{K_{\textrm{max}}} \textbf{1}_{\textbf{x}\in \mathcal{T}_\ell }\\&\le \mathfrak C_2(u)\frac{k_n}{n} \left( 1 + c\gamma _{\textrm{max}} \psi (u) + o(\psi (u)) \right) \, . \end{aligned} \end{aligned}$$

1.5 Proof of Theorem 3

First, let us introduce some notations that are needed in the proof.

Define the normalized negative log-likelihood \(L_n(T_K,u)\) associated with a tree \(T_K\) with K leaves \((\mathcal{T}_{\ell })_{\ell = 1,\ldots , K}\) and with parameters \(\varvec{\theta }(u)=\left( \varvec{\theta }^K_{\ell }(u)\right) _{\ell =1,\ldots ,K}\) as

$$\begin{aligned} L_n(T_K,u) = \sum _{\ell = 1}^K L_n^{\ell }(\varvec{\theta }^K_\ell ,u) = \frac{1}{k_n}\sum _{\ell = 1}^K \sum _{i=1}^n \phi (Y_i-u,\varvec{\theta }^K_{\ell })\textbf{1}_{Y_i>u} \textbf{1}_{\textbf{X}_i\in \mathcal{T}_{\ell }} \, , \end{aligned}$$

and \(L(T_K,u) = \mathbb {E}[L_n(T_K,u)]\). Finally, for two trees T and \(T'\), let \(\Delta L_n(T, T') = L_n(T,u) - L_n(T',u)\) and, similarly, \(\Delta L(T, T') = L(T,u) - L(T',u)\).

The following lemma will be needed to prove Theorem 3.

Lemma 9

Let \(\mathfrak D = \inf _u\inf _{K < K^*} \Delta L(T^*,T^*_K)\) and let \(u \in [u_{\textrm{min}},u_{\textrm{max}}]\) be fixed. Suppose that there exists a constant \(c_2>0\) such that the penalization constant \(\lambda\) satisfies

$$\begin{aligned} c_2 \{\log k_n\}^{1/2} k_n^{-1/2} \le \lambda \le (\mathfrak {D} - 2c_2 \{\log (k_n)\}^{1/2} k_n^{-1/2})k_n^{-1}, \end{aligned}$$

then, under Assumptions 1 and 2, for \(K> K^*\),

$$\begin{aligned} \begin{aligned} \mathbb {P}(\widehat{K}=K)&\le 2\left( \exp \left( - \frac{ { C_1} k_n \lambda ^2(K-K^*)^2}{\beta ^2 (\log k_n)^{2}} \right) + \exp \left( -\frac{{ C_2} k_n \lambda (K-K^*)}{\beta \log k_n} \right) \right) \\&+\frac{{ C_3}}{k_n^{5/2} \lambda ^3(K-K^*)^3}, \end{aligned} \end{aligned}$$

and, for \(K<K^*,\)

$$\begin{aligned} \begin{aligned} \mathbb {P}(\widehat{K}=K)&\le 4\exp \left( - \frac{ C_1 k_n \{\mathfrak {D}-\lambda (K^*-K)\}^2}{\beta ^2 (\log k_n)^{2}} \right) \\&+4 \exp \left( -\frac{C_2 k_n \{\mathfrak {D}-\lambda (K^*-K)\}}{\beta \log k_n} \right) \\&+\frac{2C_3}{k_n^{5/2} \{\mathfrak {D}-\lambda (K^*-K)\}^3}. \end{aligned} \end{aligned}$$

Proof

Let \(u \in [u_{\textrm{min}},u_{\textrm{max}}]\) be fixed. If \(\widehat{K}=K,\) then

$$\begin{aligned} \Delta L_n(T_K,T_{K^*}) := L_n(T_K,u)-L_n(T_{K^*},u)>\lambda (K-K^* ). \end{aligned}$$

Decompose

$$\begin{aligned} \begin{aligned} \Delta L_n(T_K,T_{K^*})&=\{L_n(T_K,u)-L_n(T^*_K,u)\}+\{L_n(T^*_K,u)-L_n(T^*,u)\}\\&+\{L_n(T^*,u)-L_n(T_{K^*},u)\}. \end{aligned} \end{aligned}$$

Since \(L_n(T^*,u)-L_n(T_{K^*},u)<0,\)

$$\begin{aligned} \Delta L_n(T_K,T_{K^*})\le \{L_n(T_K,u)-L_n(T^*_K,u)\}+\{L_n(T^*_K,u)-L_n(T^*,u)\}. \end{aligned}$$

For \(K > K^*,\) \(T^*_K=T^*,\) hence,

$$\begin{aligned} \begin{aligned} \mathbb {P}(\widehat{K}=K)&\le \mathbb {P}\left( \Delta L_n(T_K, T^*_K)>\lambda (K-K^*)\right) \\&\le \mathbb {P}\left( |\Delta L_n(T_K, T^*_K) - \Delta L( T_K, T^*_K)|>\lambda (K-K^*)\right) . \end{aligned} \end{aligned}$$

For \(K>K^*\), a bound is then obtained from Theorems 6 and 7 if \(\lambda (K-K^*) \ge {\mathfrak c_1} \{\log (k_n)\}^{1/2} k_n^{-1/2}\), that is, if \(\lambda \ge {\mathfrak c_1} \{\log k_n\}^{1/2} k_n^{-1/2}\).

Now, for \(K<K^*,\)

$$\begin{aligned} \begin{aligned} \Delta L_n(T^*_K,T^*)&\le |\Delta L_n(T^*_{K},T^*)-\Delta L(T^*_{K},T^*)|+\Delta L(T^*_{K},T^*)\\&\le |\Delta L_n(T^*,T^*_{K})-\Delta L(T^*,T^*_K)|- \mathfrak D(K^*,K). \end{aligned} \end{aligned}$$

where \(\mathfrak D = \inf _{K<K^*, u\in [u_{\textrm{min}}, u_{\textrm{max}}]} \mathfrak D(K^*,K).\) Hence,

$$\begin{aligned} \begin{aligned}&{\mathbb {P}(\widehat{K}=K)}\\&\le \mathbb {P}\left( \Delta L_n(T_K, T^*_K)\ge \frac{\mathfrak D- \lambda (K^*-K)}{2}\right) \\&+ \mathbb {P}\left( |\Delta L_n(T^*,T^*_K)-\Delta L(T^*,T^*_K)|\ge \frac{\mathfrak D - \lambda (K^*-K)}{2}\right) \\&\le \mathbb {P}\left( |\Delta L_n(T_K, T^*_K)-\Delta L( T_K, T^*_K)|\ge \frac{\mathfrak D - \lambda (K^*-K)}{2}\right) \\&+ \mathbb {P}\left( |\Delta L_n(T^*,T^*_K)-\Delta L(T^*,T^*_K)|\ge \frac{\mathfrak D - \lambda (K^*-K)}{2}\right) . \end{aligned} \end{aligned}$$

These two probabilities can be bounded using Theorems 6 and 7 provided that, for all \(K<K^*,\)

$$\begin{aligned} \frac{\mathfrak D - \lambda (K^*-K)}{2} \ge \mathfrak c_1 \{\log (k_n)\}^{1/2} k_n^{-1/2}, \end{aligned}$$

that is,

$$\begin{aligned} \lambda \le \mathfrak D - 2{\mathfrak c_1} \{\log (k_n)\}^{1/2} k_n^{-1/2} . \end{aligned}$$
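To make the role of the penalty concrete, here is a toy Python sketch of a selection rule of this type (the criterion values are invented, and the exact form "negative log-likelihood plus \(\lambda K\)" is an assumption made for illustration, not the paper's implementation):

```python
def select_tree_size(neg_logliks, lam):
    """Penalized choice K-hat = argmin_K { L_n(T_K, u) + lam * K }."""
    return min(neg_logliks, key=lambda K: neg_logliks[K] + lam * K)

# Toy criterion values: the fit improves with K but flattens after K* = 3.
L = {1: 10.0, 2: 8.0, 3: 5.0, 4: 4.9, 5: 4.85}

print(select_tree_size(L, lam=0.5))  # moderate penalty -> 3
print(select_tree_size(L, lam=0.0))  # no penalty -> the largest tree, 5
```

Lemma 9 quantifies exactly this trade-off: \(\lambda\) must be large enough to rule out \(K>K^*\), yet small compared with the gap \(\mathfrak D\) so that \(K<K^*\) remains unlikely.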

We are now ready to prove Theorem 3. Let \(u \in [u_{\textrm{min}} , u_{\textrm{max}}]\) be fixed.

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \Vert \widehat{T}-T^*\Vert _{2}^2\right]&=\sum _{K=1}^{K_{\textrm{max}}}\mathbb {E}\left[ \Vert T_K-T^*\Vert _2^2\textbf{1}_{\widehat{K}=K}\right] \\&\le \mathbb {E}\left[ \Vert T_{K^*}-T^*\Vert _2^2\right] + \sum _{K=1, K\ne K^*}^{K_{\textrm{max}}}K \mathbb {P}(\widehat{K}=K) \\&+\sum _{K=1, K\ne K^*}^{K_{\textrm{max}}}\mathbb {E}\left[ \Vert T_K-T^*\Vert _2^2 \textbf{1}_{\Vert T_K-T^*\Vert _2^2> K}\textbf{1}_{\widehat{K}=K}\right] \\&\le \mathbb {E}\left[ \Vert T_{K^*}-T^*\Vert _2^2\right] + \sum _{K=1}^{K^*-1}K \mathbb {P}(\widehat{K}=K)\\&+ \sum _{K=K^*+1}^{K_{\textrm{max}}}K \mathbb {P}(\widehat{K}=K)\\&+2\sum _{K=1, K\ne K^*}^{K_{\textrm{max}}}\mathbb {E}\left[ \Vert T_K-T^*_K\Vert _2^2\textbf{1}_{\Vert T_K-T^*\Vert _2^2> K}\right] \\&+2\sum _{K=1, K\ne K^*}^{K_{\textrm{max}}}\mathbb {P}(\widehat{K}=K) \Vert T^*-T_K^*\Vert _2^2 . \end{aligned} \end{aligned}$$

Firstly, from Theorem 1,

$$\begin{aligned} \begin{aligned}&{\mathbb {E}\left[ \Vert T_K-T_K^*\Vert _2^2 \textbf{1}_{\Vert T_K-T^*\Vert _2^2> K}\right] }\\&= K \mathbb {P}\left( \Vert T_K-T_K^*\Vert _2^2> K\right) + \int _K^{\infty } \mathbb {P}\left( \Vert T_K-T_K^*\Vert _2^2 > t\right) \textrm{d}t\\&\le 2 K\left( 1+\frac{\beta ^2 (\log k_n)^2}{\mathcal{C}_1 k_n}\right) \exp \left( -\frac{\mathcal{C}_1 k_n }{ \beta ^2 (\log k_n)^2}\right) \\&+2K\left( 1+ \frac{2\beta (\log k_n)}{\mathcal{C}_2 k_n}+\frac{2\beta ^2 (\log k_n)^2}{\mathcal{C}_2^2 k_n^2}\right) \exp \left( -\frac{\mathcal{C}_2 k_n}{\beta (\log k_n)}\right) + \frac{2\mathcal{C}_3 K^{1/2}}{k_n^{5/2} }\, . \end{aligned} \end{aligned}$$
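The first equality above is the standard tail-integral formula, which for completeness follows from Fubini's theorem applied to the nonnegative variable \(X=\Vert T_K-T_K^*\Vert _2^2\):

$$\begin{aligned} \mathbb {E}\left[ X \textbf{1}_{X> K}\right] = \int _0^{\infty } \mathbb {P}\left( X \textbf{1}_{X>K}> t\right) \textrm{d}t = K \mathbb {P}\left( X> K\right) + \int _K^{\infty } \mathbb {P}\left( X > t\right) \textrm{d}t, \end{aligned}$$

since, for \(t<K,\) the event \(\{X \textbf{1}_{X>K}> t\}\) coincides with \(\{X>K\}.\)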

Secondly, recall that

$$\begin{aligned} \Vert T^*_K-T^*\Vert ^2_2 = \int \Vert \varvec{\theta }^{*K}(\textbf{x})-\varvec{\theta }^{*}(\textbf{x})\Vert ^2_{\infty }\textrm{d}P_{\textbf{X}}(\textbf{x}) \le K_{\textrm{max}} \sum _{\ell =1}^{K_{\textrm{max}}}\mu (\mathcal{T}_\ell )\Vert \varvec{\theta }^{*K}_\ell -\varvec{\theta }^{*}_\ell \Vert _\infty ^2 \, , \end{aligned}$$

where \(\mu (\mathcal{T}_\ell ) = \mathbb {P}(\textbf{X}\in \mathcal{T}_\ell )\). Following the same idea as in the proof of Proposition 2, from a Taylor expansion, under Assumptions 2 and 3,

$$\begin{aligned} \Vert \varvec{\theta }^{*K}_{\ell } - \varvec{\theta }^*_\ell \Vert _\infty ^2 \le \mathfrak C^2_2(u)\frac{k_n^2}{n^2} \left( 1 + c\gamma _{\textrm{max}} \psi (u) + o(\psi (u)) \right) ^2 \,. \end{aligned}$$

Hence,

$$\begin{aligned} \begin{aligned} \Vert T^*_K-T^*\Vert _2^2&\le \mathfrak C^2_2(u)\frac{k_n^2}{n^2} (1 + c\gamma _{\textrm{max}} \psi (u) + o(\psi (u)))^2\sum _{\ell =1}^{K_{\textrm{max}}} \textbf{1}_{\textbf{X}\in \mathcal{T}_\ell }\\&\le \mathfrak C_3(u)\frac{k_n^2}{n^2} \,. \end{aligned} \end{aligned}$$

Finally,

$$\begin{aligned} \mathbb {E}\left[ \Vert \widehat{T}-T^*\Vert _{2}^2 \right] \le \frac{\mathcal{C}_5 K^* (\log k_n)^2 }{k_n}, \end{aligned}$$

for some constant \(\mathcal{C}_5.\)

Appendix B: Covering numbers

Lemma 10

Following the notations of the proof of Theorem 6, the class of functions \(\mathfrak {F}\) satisfies

$$\begin{aligned} \mathcal{N}_{\Phi }(\varepsilon ,\mathfrak {F})\le \frac{\mathfrak C_4K^{4(d+1)(d+2)}\Vert \Phi \Vert _2^{\alpha _1}\sigma _{n}^{\alpha }}{\varepsilon ^{\alpha }}, \end{aligned}$$

for some constants \(\mathfrak C_4>0\) and \(\alpha >0\) (depending neither on n nor on K).

Proof

Let

$$\begin{aligned} \begin{aligned} g_{\varvec{\theta }}(z)&= -\frac{1}{\sigma }+\left( \frac{1}{\gamma }+1\right) \frac{\gamma z}{\sigma ^2(1+\frac{z\gamma }{\sigma })}, \\ h_{\varvec{\theta }}(z)&= -\frac{1}{\gamma ^2}\log \left( 1+\frac{z\gamma }{\sigma } \right) +\frac{\left( \frac{1}{\gamma }+1\right) z}{\sigma +z\gamma }, \end{aligned} \end{aligned}$$

for \(z>0.\) For \(\varvec{\theta }\) and \(\varvec{\theta }'\) in \(\mathcal{S}\times \Gamma ,\) we have (from a straightforward Taylor expansion),

$$\begin{aligned} |g_{\varvec{\theta }}(y-u) - g_{\varvec{\theta }'}(y-u)|\le C |\gamma - \gamma '|+ C'|\sigma -\sigma '|, \end{aligned}$$

for some constants C and \(C'.\) More precisely, one can take

$$\begin{aligned} \begin{aligned} C&= \frac{6}{\gamma _{\textrm{min}}^2\sigma _{\textrm{min}}},\\ C'&= \frac{1}{\sigma _{\textrm{min}}^2}\left( 1+3\left\{ 1+\frac{1}{\gamma _{\textrm{min}}}\right\} \right) . \end{aligned} \end{aligned}$$

Next, observe that

$$\begin{aligned} |g_{\varvec{\theta }'}(y-u) - g_{\varvec{\theta }'}(y-u')|\le C''|u-u'|, \end{aligned}$$

where \(C''=4\gamma ^2_{\textrm{max}}/[\gamma _{\textrm{min}}\sigma ^3],\) which leads to

$$\begin{aligned} |g_{\varvec{\theta }}(y-u) - g_{\varvec{\theta }'}(y-u')|\le C_g \textrm{max}(\Vert \varvec{\theta }-\varvec{\theta }'\Vert _{\infty },|u-u'|), \end{aligned}$$

for some constant \(C_g>0.\) Similarly,

$$\begin{aligned} |h_{\varvec{\theta }}(y-u) - h_{\varvec{\theta }'}(y-u)|\le C_1(4+\log (1+wy))|\gamma - \gamma ' |+ C_2|\sigma -\sigma '|, \end{aligned}$$

for some constants \(C_1\) and \(C_2.\) Next,

$$\begin{aligned} |h_{\varvec{\theta }'}(y-u) - h_{\varvec{\theta }'}(y-u')|\le C_7 |u-u'|, \end{aligned}$$

where \(C_7=5/(\gamma _{\textrm{min}}\sigma _{\textrm{min}}),\) leading to, for some \(C_h>0,\)

$$\begin{aligned} |h_{\varvec{\theta }}(y-u) - h_{\varvec{\theta }'}(y-u')|\le C_h \textrm{max}(\Vert \varvec{\theta }-\varvec{\theta }'\Vert _{\infty },|u-u'|). \end{aligned}$$

On the other hand,

$$\begin{aligned} |\phi (y-u,\varvec{\theta })-\phi (y-u,\varvec{\theta }')|\le \frac{1}{\gamma _{\textrm{min}}^2}(2+\log (1+wy))|\gamma -\gamma '|+ \frac{3}{\gamma _{\textrm{min}}\sigma _{\textrm{min}}}|\sigma -\sigma '|, \end{aligned}$$

and

$$\begin{aligned} |\phi (y-u,\varvec{\theta }')-\phi (y-u',\varvec{\theta }')|\le \frac{1}{\sigma _{\textrm{min}}}|u-u'|. \end{aligned}$$

Define \(\mathfrak {F}_1=\{g_{\varvec{\theta }}(\cdot -u):\varvec{\theta } \in \mathcal{S}\times \Gamma , u\in [u_{\textrm{min}},u_{\textrm{max}}]\},\) \(\mathfrak {F}_2=\{h_{\varvec{\theta }}(\cdot -u):\varvec{\theta } \in \mathcal{S}\times \Gamma , u\in [u_{\textrm{min}},u_{\textrm{max}}]\},\) and \(\mathfrak {F}_3=\{\phi (\cdot -u,\varvec{\theta }):\varvec{\theta } \in \mathcal{S}\times \Gamma , u\in [u_{\textrm{min}},u_{\textrm{max}}]\}.\) From van der Vaart (1998, Example 19.7), we get, for \(i=1,\ldots ,3,\)

$$\begin{aligned} N(\varepsilon ,\mathfrak {F}_i)\le \varphi _i \Vert \Phi \Vert _2^{\alpha _1}\sigma _{n}^{\alpha _1}\varepsilon ^{-\alpha _1}, \end{aligned}$$

for some \(\alpha _1>0\) and constants \(\varphi _i.\)

On the other hand, let

$$\begin{aligned} \mathfrak {F}_4=\left\{ \textbf{x}\mapsto \textbf{1}_{\textbf{x}\in \mathcal{T}_\ell } :\ell =1,\ldots ,K \right\} , \end{aligned}$$

and

$$\begin{aligned} \mathfrak {F}_5=\left\{ y \mapsto \textbf{1}_{y>u} :u \in \mathcal{U} \right\} . \end{aligned}$$

From Lopez et al. (2016, Lemma 4), we have \(N(\varepsilon ,\mathfrak {F}_4)\le m^k K^{\alpha _2}\varepsilon ^{-\alpha _2},\) where \(\alpha _2=4(d+1)(d+2),\) and where k is the number of discrete components taking at most m modalities. On the other hand, from van der Vaart (1998, Example 19.6), \(N(\varepsilon ,\mathfrak {F}_5)\le 2\varepsilon ^{-2}.\)

From Einmahl and Mason (2005, Lemma A.1), we get, for \(i=1,\ldots ,3,\)

$$\begin{aligned} N(\varepsilon ,\mathfrak {F}_i\mathfrak {F}_4\mathfrak {F}_5)\le \frac{4 m^kK^{\alpha _2}\textrm{max}(C_g,C_h)\Vert \Phi \Vert _2^{\alpha _1}\sigma _{n}^{\alpha _1}}{\varepsilon ^{\alpha _1+\alpha _2+\alpha _3}}, \end{aligned}$$

where \(\alpha _3=2\) comes from the class \(\mathfrak {F}_5.\)

Multiplying \(\mathfrak {F}_i\mathfrak {F}_4\mathfrak {F}_5\) by a single indicator function \(\textbf{1}_{\Phi (Y_i)\le M_n}\) does not change the covering number, and the result follows.

Appendix C: Technical Lemmas

Lemma 11

  1.

    The derivatives of the functions \(y\rightarrow \phi (y-u,\varvec{\theta })\) with respect to \(\varvec{\theta }\) are uniformly bounded by

    $$\begin{aligned} \Phi (y)=C(1+\log (1+wy)), \end{aligned}$$

    where C is a constant (not depending on n), and \(w=\gamma _{\textrm{max}}/\sigma _{\textrm{min}}.\)

  2.

    There exists a certain \(\rho _0>0\) such that

    $$\begin{aligned} m_{\rho _0} := \mathbb {E}\left[ \exp (\rho _0\Phi (Y))\right] <\infty . \end{aligned}$$

Proof

To prove point 1, it suffices to differentiate the GP log-likelihood with respect to \(\varvec{\theta }\) and check that the derivatives are upper-bounded by \(\Phi\).

Now, for point 2, note that, since \(\gamma (\textbf{x}) \ge \gamma _{\textrm{min}} >0\) for all \(\textbf{x}\), Y is a heavy-tailed random variable; hence \(\log (Y)\), and thus \(\Phi (Y)\), is a light-tailed random variable, so that \(\Phi (Y)\) has finite exponential moments.
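To make the light-tail argument explicit, here is a sketch under a Pareto-type tail assumption, \(\mathbb {P}(Y>y)\le c\, y^{-1/\gamma _{\textrm{max}}}\) for y large (the constant c is purely illustrative): for t large,

$$\begin{aligned} \mathbb {P}\left( \Phi (Y)> C(1+t)\right) = \mathbb {P}\left( Y > \frac{e^{t}-1}{w}\right) \le c\, w^{1/\gamma _{\textrm{max}}}\, e^{-t/\gamma _{\textrm{max}}}(1+o(1)), \end{aligned}$$

so that \(\Phi (Y)\) has an exponentially decaying tail, and \(m_{\rho _0}<\infty\) for any \(0<\rho _0 < 1/(C\gamma _{\textrm{max}}).\)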

Lemma 12

With \(v_{\mathfrak {F}}\) defined in Proposition 4,

$$\begin{aligned} v_{\mathfrak {F}}\le \frac{M_n^2k_n}{n}. \end{aligned}$$

Proof

We have

$$\begin{aligned} \begin{aligned} v_{\mathfrak {F}}&\le \mathbb {E}\left[ \Phi (Y)^2 \textbf{1}_{Y\ge u_{\textrm{min}}}\textbf{1}_{\Phi (Y)\le M_n}\right] \\&\le M_n^2 \mathbb {P}(Y\ge u_{\textrm{min}})=\frac{M_n^2k_n}{n}. \end{aligned} \end{aligned}$$

Lemma 13

Define, for \(j = 1, 2, 3\),

$$\begin{aligned} E_{j,n}=\mathbb {E}\left[ \Phi (Y)^j \textbf{1}_{\Phi (Y)\ge M_n}\textbf{1}_{Y\ge u_{\textrm{min}}}\right] . \end{aligned}$$

Under the assumptions of Theorem 7,

$$\begin{aligned} E_{j,n}\le \frac{\mathfrak {e}_j k_n^{1/2}}{n^{1/2}n^{\rho _0\beta a_2/4}}. \end{aligned}$$

Proof

Applying the Cauchy-Schwarz inequality twice leads to

$$\begin{aligned} E_{j,n}\le \mathbb {P}(Y\ge u_{\textrm{min}})^{1/2}\mathbb {E}[\Phi (Y)^{2j}\textbf{1}_{\Phi (Y)\ge M_n}]^{1/2}\le \frac{k_n^{1/2}}{n^{1/2}}\mathbb {E}[\Phi (Y)^{4j}]^{1/4}\mathbb {P}(\Phi (Y)\ge M_n)^{1/4}. \end{aligned}$$

Next, from the Chernoff inequality,

$$\begin{aligned} \mathbb {P}(\Phi (Y)\ge M_n)\le \exp (-\rho _0 M_n)\mathbb {E}[\exp (\rho _0 \Phi (Y))]\le \frac{m_{\rho _0}}{n^{\rho _0\beta a_2}}. \end{aligned}$$
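Combining the two displays (with \(M_n=\beta a_2 \log n\), as the exponent in the statement suggests) yields the claimed bound, with a constant that is finite by point 2 of Lemma 11:

$$\begin{aligned} E_{j,n}\le \frac{k_n^{1/2}}{n^{1/2}}\, \mathbb {E}[\Phi (Y)^{4j}]^{1/4}\left( \frac{m_{\rho _0}}{n^{\rho _0\beta a_2}}\right) ^{1/4} = \frac{\mathfrak {e}_j\, k_n^{1/2}}{n^{1/2}n^{\rho _0\beta a_2/4}}, \quad \text {with } \mathfrak {e}_j = \mathbb {E}[\Phi (Y)^{4j}]^{1/4} m_{\rho _0}^{1/4}. \end{aligned}$$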


Cite this article

Farkas, S., Heranval, A., Lopez, O. et al. Generalized pareto regression trees for extreme event analysis. Extremes (2024). https://doi.org/10.1007/s10687-024-00485-1
