
Generalized Pareto regression trees for extreme event analysis

Published in: Extremes

Abstract

This paper derives finite sample results to assess the consistency of Generalized Pareto regression trees introduced by Farkas et al. (Insur. Math. Econ. 98:92–105, 2021) as tools to perform extreme value regression for heavy-tailed distributions. This procedure allows the constitution of classes of observations with similar tail behaviors depending on the value of the covariates, based on a recursive partition of the sample and simple model selection rules. The results we provide are obtained from concentration inequalities, and are valid for a finite sample size. A misspecification bias that arises from the use of a “Peaks over Threshold” approach is also taken into account. Moreover, the derived properties legitimate the pruning strategies, that is the model selection rules, used to select a proper tree that achieves a compromise between simplicity and goodness-of-fit. The methodology is illustrated through a simulation study, and a real data application in insurance for natural disasters.



Availability of supporting data

Because the data were provided through a private partnership with the Mission Risques Naturels, they are not publicly available.


Acknowledgements

The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR) under reference ANR-20-CE40-0025-01 (T-REX project).

Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

All the authors wrote the main manuscript text and the supplementary material. All the authors prepared the figures and tables. All the authors reviewed the manuscript.

Corresponding author

Correspondence to Maud Thomas.

Ethics declarations

Ethical approval and Consent to participate

All the authors approve and consent to participate.

Human and animal ethics

Not applicable.

Consent for publication

All the authors consent to publication.

R codes

The R codes are publicly available at https://github.com/antoine-heranval/Generalized-Pareto-Regression-Trees-for-extreme-event-analysis.

Competing interests

The authors have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 1.28 MB)

Supplementary file2 (zip 1.91 MB)

Appendices

Appendix A: Proofs

In this section, we present in detail the proofs of the results stated throughout the paper. The concentration inequalities required to obtain the results are presented in Section “Concentration inequalities”. These inequalities are used to obtain deviation bounds in Section “Deviation results”, which are the key ingredients of the proofs of Theorem 1, Corollary 8, and Theorem 3. Section “Covering numbers” gathers some results on covering numbers that are required to control the complexity of some of the classes of functions considered in the proofs. Some additional technical lemmas are gathered at the end of the appendix.

1.1 Concentration inequalities

The proofs of the main results are mostly based on concentration inequalities. The following inequality was initially proved by Talagrand (1994); see also Einmahl and Mason (2005).

Proposition 4

Let \((\textbf{V}_i)_{1\le i \le n}\) denote i.i.d. replications of a random vector \(\textbf{V},\) and let \((\varepsilon _i)_{1\le i \le n}\) denote a vector of i.i.d. Rademacher variables (that is, \(\mathbb {P}(\varepsilon _i=-1)=\mathbb {P}(\varepsilon _i=1)=1/2)\) independent of \((\textbf{V}_i)_{1\le i \le n}.\) Let \(\mathfrak {F}\) be a pointwise measurable class of functions bounded by a finite constant \(M_0.\) Then, for all \(t>0,\)

$$\begin{aligned} \begin{aligned} \mathbb {P}&\left( \sup _{\varphi \in \mathfrak {F}}\left\| \sum _{i=1}^n \{\varphi (\textbf{V}_i)-\mathbb {E}[\varphi (\textbf{V})]\}\right\| _{\infty }>A_1\left\{ \mathbb {E}\left[ \sup _{\varphi \in \mathfrak {F}}\left\| \sum _{i=1}^n \varphi (\textbf{V}_i)\varepsilon _i\right\| _{\infty }\right] +t\right\} \right) \\&\le 2\left\{ \exp \left( -\frac{A_2t^2}{nv_{\mathfrak {F}}}\right) +\exp \left( -\frac{A_2t}{M_0}\right) \right\} , \end{aligned} \end{aligned}$$

with \(v_{\mathfrak {F}}=\sup _{\varphi \in \mathfrak {F}}\textrm{Var}(\Vert \varphi (\textbf{V})\Vert _{\infty }),\) and where \(A_1\) and \(A_2\) are universal constants.
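As a numerical illustration of this kind of concentration (not part of the paper's argument), the Python sketch below estimates the expected supremum deviation over a toy class of indicator functions \(\varphi _t(v)=\textbf{1}_{v\le t}\) with \(\textbf{V}\sim \textrm{Uniform}(0,1)\); the class, the grid, and the sample sizes are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_sup_deviation(n, grid_size=50, reps=200):
    """Monte Carlo estimate of E[ sup_phi | (1/n) * sum_i (phi(V_i) - E[phi(V)]) | ]
    over the illustrative class phi_t(v) = 1{v <= t}, with V ~ Uniform(0, 1)."""
    t = np.linspace(0.0, 1.0, grid_size)
    devs = np.empty(reps)
    for r in range(reps):
        v = rng.uniform(size=n)
        emp = (v[:, None] <= t).mean(axis=0)  # (1/n) sum_i 1{V_i <= t}
        devs[r] = np.abs(emp - t).max()       # E[1{V <= t}] = t for uniform V
    return devs.mean()

small, large = mean_sup_deviation(200), mean_sup_deviation(5000)
print(small, large)
```

The ratio between the two estimates should be close to \(\sqrt{5000/200}=5\), consistent with the \(n^{-1/2}\) scale of the deviations controlled by the proposition.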

The difficulty in using Proposition 4 comes from the need to control the symmetrized quantity \(\mathbb {E}\left[ \sup _{\varphi \in \mathfrak {F}}\left\| \sum _{i=1}^n \varphi (\textbf{V}_i)\varepsilon _i\right\| \right] .\) Proposition 5, due to Einmahl and Mason (2005), allows this control under some assumptions on the considered class of functions \(\mathfrak {F}\).

We first need to introduce some notations regarding covering numbers of a class of functions. More details can be found, for example, in van der Vaart (1998), Chapter 2.6. Let us consider a class of functions \(\mathfrak {F}\) with envelope \(\Phi\) (which means that, for (almost) all v and all \(\varphi \in \mathfrak {F},\) \(|\varphi (v)|\le \Phi (v)\)). Then, for any probability measure \(\mathbb {Q},\) let \(N(\varepsilon ,\mathfrak {F},\mathbb {Q})\) denote the minimum number of \(L^2(\mathbb {Q})\) balls of radius \(\varepsilon\) needed to cover the class \(\mathfrak {F}.\) Then, define

$$\mathcal{N}_{\Phi }(\varepsilon ,\mathfrak {F})=\sup _{\mathbb {Q}:\mathbb {Q}(\Phi ^2)<\infty } N(\varepsilon (\mathbb {Q}(\Phi ^2)^{1/2}),\mathfrak {F},\mathbb {Q}).$$
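To make the definition concrete, the following Python sketch computes a greedy upper bound on \(N(\varepsilon ,\mathfrak {F},\mathbb {Q})\) for a toy class of half-line indicators under an empirical measure \(\mathbb {Q}\); the class, the measure, and the greedy covering strategy are all illustrative assumptions, not objects from the paper:

```python
import numpy as np

def covering_number(funcs, points, eps):
    """Greedy upper bound on N(eps, F, Q): the number of L2(Q)-balls of radius eps
    needed to cover the class, Q being the empirical measure on `points`."""
    vals = np.array([[f(v) for v in points] for f in funcs])  # one row per function
    uncovered = list(range(len(funcs)))
    centers = 0
    while uncovered:
        c = uncovered[0]  # pick an uncovered function as a new ball center
        d = np.sqrt(((vals[uncovered] - vals[c]) ** 2).mean(axis=1))  # L2(Q) distances
        uncovered = [i for i, di in zip(uncovered, d) if di > eps]
        centers += 1
    return centers

points = np.linspace(0, 1, 200)                   # support of the empirical measure Q
funcs = [(lambda t: (lambda v: float(v <= t)))(t)  # toy class {v -> 1{v <= t}}
         for t in np.linspace(0, 1, 40)]
for eps in (0.5, 0.25, 0.1):
    print(eps, covering_number(funcs, points, eps))
```

Shrinking \(\varepsilon\) increases the bound polynomially, in the spirit of the polynomial covering condition (i) of Proposition 5.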

Proposition 5

Let \(\mathfrak {F}\) be a point-wise measurable class of functions bounded by \(M_0\) with envelope \(\Phi\) such that, for some constants \(A_3, \alpha \ge 1,\) and \(0\le \sqrt{v} \le M_0,\) we have

  1. (i)

    \(\mathcal{N}_{\Phi }(\varepsilon ,\mathfrak {F})\le A_3 \varepsilon ^{-\alpha },\) for \(0<\varepsilon <1,\)

  2. (ii)

    \(\sup _{\varphi \in \mathfrak {F}}\mathbb {E}\left[ \varphi (\textbf{V})^2\right] \le v,\)

  3. (iii)

    \(M_0\le \frac{1}{4\alpha ^{1/2}}\sqrt{nv/\log (A_4M_0/\sqrt{v}) },\) with \(A_4=\textrm{max}(e,A_3^{1/\alpha }).\)

Then, for some absolute constant \(A_5,\)

$$\begin{aligned} \mathbb {E}\left[ \sup _{\varphi \in \mathfrak {F}}\left\| \sum _{i=1}^n \varphi (\textbf{V}_i)\varepsilon _i\right\| \right] \le A_5\sqrt{\alpha n v \log (A_4M_0/\sqrt{v})}. \end{aligned}$$

1.2 Deviation results

We first introduce some notations that will be used throughout Sections “Deviation results” to “Covering numbers”. In the following, \(\varphi _{\varvec{\theta }}\) is a function indexed by \(\varvec{\theta }=(\sigma ,\gamma )^{t}\) denoting either \(\phi (\cdot ,\varvec{\theta })\), \(\partial _{\sigma } \phi (\cdot ,\varvec{\theta }),\) or \(\partial _{\gamma }\phi (\cdot ,\varvec{\theta }).\)

We consider in the following the class of functions \(\mathfrak {F}\) defined as

$$\begin{aligned} \mathfrak F =\left\{ y \mapsto \varphi _{ \varvec{\theta }}(y-u)\textbf{1}_{y\ge u}\textbf{1}_{\textbf{x}\in \mathcal{T}_\ell }, \; \varvec{\theta } \in \varvec{\Theta },\; u\in [u_{\textrm{min}};u_{\textrm{max}}],\ell =1,...,K\right\} . \end{aligned}$$
(10)

By Lemma 11, the functions \(y \mapsto \partial _{\sigma } \phi (y-u,\varvec{\theta })\) and \(y \mapsto \partial _{\gamma } \phi (y-u,\varvec{\theta })\) are uniformly bounded (possibly up to multiplication by a constant) by \(\Phi (y)=\log (1+wy),\) where \(w=\gamma _{\textrm{max}}/\sigma _{\textrm{min}}\). On the other hand, \(y \mapsto \phi (y-u,\varvec{\theta })\) is bounded by \(\log \sigma _n+\Phi (y)=O(\log (k_n))+\Phi (y).\)

Next, for \(\ell =1,\ldots ,K\), and \(\varvec{\theta }=(\sigma ,\gamma )^t \in \varvec{\Theta }\), let

$$\begin{aligned} L_n^{\ell }(\varvec{\theta },u)=\frac{1}{k_n}\sum _{i=1}^n \phi (Y_i-u,\varvec{\theta })\textbf{1}_{Y_i>u}\textbf{1}_{\textbf{X}_i\in \mathcal{T}_{\ell }}, \end{aligned}$$

be the (normalized) negative GP log-likelihood associated with the leaf \(\ell\) of a tree \(T_K\) with set of K leaves \((\mathcal{T}_\ell )_{\ell =1,\ldots ,K}\). Let \(L^{\ell }(\varvec{\theta },u) = \mathbb {E}[L_n^{\ell }(\varvec{\theta },u)]\). The key results behind Theorems 1 and 3 rely on studying the deviations of the processes, indexed by \(\varvec{\theta },\; u\) and \(\ell\),

$$\begin{aligned} \mathcal{W}_0^{\ell }(\varvec{\theta },u) = L_n^{\ell }(\varvec{\theta },u)-L^{\ell }(\varvec{\theta },u), \end{aligned}$$
$$\begin{aligned} \mathcal{W}_1^\ell (\varvec{\theta },u) = \nabla _{\varvec{\theta }} L_n^\ell (\varvec{\theta },u)-\nabla _{\varvec{\theta }} L^\ell (\varvec{\theta },u). \end{aligned}$$
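To fix ideas, the criterion \(L_n^{\ell }(\varvec{\theta },u)\) can be computed directly. The Python sketch below is a minimal illustration, assuming the standard negative GP log-density \(\phi (z,(\sigma ,\gamma )^t)=\log \sigma +(1+1/\gamma )\log (1+\gamma z/\sigma )\) for \(\gamma >0\); the simulated data, the single-leaf tree, and the parameter grid are illustrative choices, not the paper's setup:

```python
import numpy as np

def phi(z, sigma, gamma):
    """Assumed negative GP log-density at an excess z >= 0 (case gamma > 0)."""
    return np.log(sigma) + (1.0 + 1.0 / gamma) * np.log1p(gamma * z / sigma)

def leaf_criterion(y, in_leaf, u, sigma, gamma, k_n):
    """Normalized negative GP log-likelihood L_n^ell(theta, u) of one leaf."""
    mask = (y > u) & in_leaf
    return phi(y[mask] - u, sigma, gamma).sum() / k_n

rng = np.random.default_rng(0)
n, sigma0, gamma0, u = 5000, 1.0, 0.5, 0.0
# Exact GP(sigma0, gamma0) sample by inversion of the survival function.
y = sigma0 / gamma0 * (rng.uniform(size=n) ** (-gamma0) - 1.0)
in_leaf = np.ones(n, dtype=bool)          # a single-leaf tree for illustration
k_n = int(((y > u) & in_leaf).sum())      # number of exceedances in the leaf

vals = {(s, g): leaf_criterion(y, in_leaf, u, s, g, k_n)
        for s in (0.5, 1.0, 2.0) for g in (0.25, 0.5, 1.0)}
best = min(vals, key=vals.get)
print(best)  # the grid minimizer should sit at (or near) (sigma0, gamma0)
```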

Let \(M_n = \beta \log k_n \le \beta a_1 \log (n)\) with \(\beta >0\) and \(a_1>0\) (with \(a_1\) defined in Assumption 1). We study the deviations of these processes by decomposing \(\mathcal{W}_i^\ell (\varvec{\theta },u),\) for \(i=0,1,\) (which is a sum of i.i.d. observations) into two sums.

  • the first one gathers the observations smaller than some bound (more precisely, such that \(\Phi (Y_i)\le M_n\)), which is considered in Theorem 6. Since these observations are bounded (although this bound depends on n and may tend to infinity as n grows), we can apply a concentration inequality such as the one of Section “Concentration inequalities”. Let us stress that \(\sup _{\varphi _{\varvec{\theta }} \in \mathfrak F} \Vert \varphi _{\varvec{\theta }}(y)\textbf{1}_{\Phi (y)\le M_n}\Vert _{\infty } \le M_n\);

  • in the second one (Theorem 7), we consider the observations larger than this bound, and control them through the fact that the function \(\Phi\) has finite exponential moments (see Lemma 11).

Corollary 8, which provides deviation bounds for estimation errors in the leaves of the tree, is then a direct consequence.
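A minimal numerical sketch of this decomposition (the Pareto toy model, the choice \(\beta =1/2\), and the sample sizes are assumptions made for illustration, not quantities from the paper) shows that the part above \(M_n=\beta \log k_n\) involves only a small fraction of the sample, which is why it can be controlled through the exponential moments of \(\Phi\):

```python
import numpy as np

rng = np.random.default_rng(2)

n, k_n, beta = 10_000, 400, 0.5            # illustrative sizes; beta > 0 as in the text
w = 1.0                                    # stand-in for gamma_max / sigma_min
y = rng.pareto(2.0, size=n)                # heavy-tailed toy observations
Phi = np.log1p(w * y)                      # envelope Phi(y) = log(1 + w y)
M_n = beta * np.log(k_n)                   # truncation level M_n = beta * log(k_n)

bounded = Phi <= M_n
# Bounded part: handled by the concentration inequality (Theorem 6).
bounded_part = Phi[bounded].sum() / k_n
# Tail part: handled through the exponential moments of Phi (Theorem 7).
tail_part = Phi[~bounded].sum() / k_n

print(f"fraction above M_n: {(~bounded).mean():.4f}, tail contribution: {tail_part:.3f}")
```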

Theorem 6

Let

$$\begin{aligned} \underline{\mathcal{Z}}(M_n) = \sup _{\varphi _{\varvec{\theta }}\in \mathfrak {F}} \left|\frac{1}{k_n} \sum _{i=1}^n \left( \varphi _{\varvec{\theta }}(Y_i) \textbf{1}_{\Phi (Y_i)\le M_n} - \mathbb {E}\left[ \varphi _{\varvec{\theta }}(Y_i) \textbf{1}_{\Phi (Y_i) \le M_n}\right] \right) \right|. \end{aligned}$$

If \(k_n = O(n^{a_1})\) with \(a_1 >0\) (Assumption 1), then, for \(t \ge {\mathfrak c_1} (\log k_n)^{1/2} k_n^{-1/2}\),

$$\begin{aligned} \mathbb {P}\left( \underline{\mathcal{Z}}(M_n) \ge t\right) \le 2\left( \exp \left( - \frac{{ C_1} k_n t^2}{\beta ^2 (\log k_n)^2} \right) + \exp \left( -\frac{{ C_2} k_n t}{\beta \log k_n} \right) \right) . \end{aligned}$$
(11)

Proof

From Proposition 4,

$$\begin{aligned} \begin{aligned}&\mathbb {P} \left( \underline{\mathcal{Z}}(M_n) \ge A_1 \left\{ \mathbb {E} \left[ \sup _{\varphi _{\varvec{\theta }}\in \mathfrak {F}}\frac{1}{k_n}\left|\sum _{i=1}^n \varphi _{\varvec{\theta }}(Y_i)\textbf{1}_{\Phi (Y_i)\le M_n}\varepsilon _i\right|\right] +t\right\} \right) \\&\le 2 \left( \exp \left( - \frac{A_2 k_n^2 t^2}{nv_{\mathfrak {F}}} \right) + \exp \left( -\frac{A_2 k_n t}{M_n} \right) \right) \, , \end{aligned} \end{aligned}$$
(12)

with \(v_{\mathfrak F} = \sup _{\varphi \in \mathfrak F} \textrm{var}\left( \vert \varphi (Y) \vert \right)\). From Lemma 12, \(v_{\mathfrak {F}}\le M_n^2k_nn^{-1},\) which shows that the first exponential term on the right-hand side of (12) is smaller than

$$\begin{aligned} \exp \left( - \frac{A_2 k_n t^2}{M_n^2} \right) . \end{aligned}$$
(13)

We can now apply Proposition 5 (combined with Lemma 10) to this class of functions with \(v=M_n^2k_nn^{-1}\) and \(M_0=M_n.\) Hence,

$$\begin{aligned} \mathbb {E} \left[ \sup _{\varphi _{\varvec{\theta }}\in \mathfrak {F}}\frac{1}{k_n}\left|\sum _{i=1}^n \varphi _{\varvec{\theta }}(Y_i)\textbf{1}_{\Phi (Y_i)\le M_n} \varepsilon _i\right|\right] \le \frac{{ A_6}}{k_n}\sqrt{nv \mathfrak {s}_n}={ A_6} \frac{ \mathfrak {s}^{1/2}_n}{k_n^{1/2}} \; , \end{aligned}$$

where \(A_6>0\) and \(\mathfrak {s}_n=\log (\sigma _n^{\alpha } K^{4(d+1)(d+2)}n/k_n)\) (\(\alpha >0\) being defined in Lemma 10). From Assumption 1, we see that \(\mathfrak {s}_n=O(\log (k_n))\) (let us recall that K is necessarily less than n). Hence, if \({ \mathfrak c_1}= 2A_1A_6\), for \(t\ge {\mathfrak c_1} \left\{ \log \left( k_n\right) \right\} ^{1/2} k_n^{-1/2}\),

$$\begin{aligned} \mathbb {P}\left( \underline{\mathcal{Z}}(M_n) \ge t\right) \le \mathbb {P} \left( \underline{\mathcal{Z}}(M_n) \ge A_1 \left\{ \mathbb {E} \left[ \sup _{\varphi _{\varvec{\theta }}\in \mathfrak {F}}\frac{1}{k_n}\left|\sum _{i=1}^n \varphi _{\varvec{\theta }}(Y_i)\textbf{1}_{\Phi (Y_i)\le M_n}\varepsilon _i\right|\right] + \frac{t}{2A_1}\right\} \right) \; . \end{aligned}$$

Equation (11) follows from (12) and (13) with \({{C}_1}=A_2A_1^{-2}/4\) and \({ C_2}=A_2A_1^{-1}/2.\)

Theorem 7

Let

$$\begin{aligned} \overline{\mathcal{Z}}(M_n)=\sup _{\varphi _{\varvec{\theta }}\in \mathfrak {F}}\left|\frac{1}{k_n} \sum _{i=1}^n \left( \varphi _{\varvec{\theta }}(Y_i )\textbf{1}_{\Phi (Y_i)> M_n} - \mathbb {E}\left[ \varphi _{\varvec{\theta }}(Y_i)\textbf{1}_{\Phi (Y_i) > M_n}\right] \right) \right|. \end{aligned}$$

If \(k_n = O(n^{a_1})\) with \(a_1>0\) (Assumption 1), then there exists \(\rho _0>0\) (Lemma 11) such that for \(\beta a_1 \ge 10/\rho _0,\) and \(t\ge {\mathfrak c_2} k_n^{-1/2}\),

$$\begin{aligned} \mathbb {P}\left( \overline{\mathcal{Z}}(M_n) \ge t\right) \le \frac{{ C_{3}}}{k_n^{5/2} t^3}. \end{aligned}$$
(14)

Proof

Let \(\beta '=\beta a_1.\) The quantity \(\overline{\mathcal{Z}}(M_n)\) is upper-bounded by

$$\begin{aligned} \frac{1}{k_n} \sum _{i=1}^n \left\{ \Phi (Y_i) \textbf{1}_{\Phi (Y_i) \ge M_n}\textbf{1}_{Y_i\ge u_{\textrm{min}}} + \mathbb {E} \left[ \Phi (Y) \textbf{1}_{\Phi (Y) \ge M_n}\textbf{1}_{Y\ge u_{\textrm{min}}}\right] \right\} \, . \end{aligned}$$

A bound for \(E_{1,n}=\mathbb {E} \left[ \Phi (Y) \textbf{1}_{\Phi (Y) \ge M_n}\textbf{1}_{Y\ge u_{\textrm{min}}}\right]\) is obtained from Lemma 13, and \(nE_{1,n}/k_{n}\le \mathfrak {e}_1 k_n^{-1/2}\) if \(\beta ' \ge 2/\rho _0.\)

Next, from Markov's inequality,

$$\begin{aligned} \begin{aligned} t^3\mathbb {P}\left( \frac{1}{k_n} \sum _{i=1}^n \Phi (Y_i) \textbf{1}_{\Phi (Y_i) \ge M_n}\textbf{1}_{Y_i\ge u_{\textrm{min}}}\ge t\right)&\le \frac{n E_{3,n}}{k_n^3}+\frac{n(n-1)E_{2,n}E_{1,n}}{k_n^3}\\&+\frac{n(n-1)(n-2)E_{1,n}^3}{k_n^3}. \end{aligned} \end{aligned}$$

From Lemma 13, we get

$$\begin{aligned} \begin{aligned} \frac{n E_{3,n}}{k_n^3}&\le \frac{\mathfrak {e}_3 n^{-(\rho _0\beta '/4-1/2)}}{k_n^{5/2}}, \\ \frac{n(n-1)E_{2,n}E_{1,n}}{k_n^3}&\le \frac{\mathfrak {e}_2\mathfrak {e}_1n^{-(\rho _0\beta '/2-3/2)}}{k_n^{5/2}}, \\ \frac{n(n-1)(n-2)E_{1,n}^3}{k_n^3}&\le \frac{\mathfrak {e}_1^3n^{-(\rho _0\beta '/4-5/2)}}{k_n^{5/2}}. \end{aligned} \end{aligned}$$

Each of these terms is bounded by \(\textrm{max}(\mathfrak {e}_3,\mathfrak {e}_2 \mathfrak {e}_1,\mathfrak {e}_1^3)k_n^{-5/2}\) for \(\beta ' \ge 10/\rho _0.\) Thus, for \(t\ge 2 \mathfrak {e}_1 k_n^{-1/2}\) and \(\beta ' \ge 10/\rho _0,\)

$$\begin{aligned} \begin{aligned}&\mathbb {P}\left( \overline{\mathcal{Z}}(M_n) \ge t \right) \\&\le \mathbb {P} \left( \frac{1}{k_n} \sum _{i=1}^n \Phi (Y_i) \textbf{1}_{\Phi (Y_i) \ge M_n}\textbf{1}_{Y_i\ge u_{\textrm{min}}} \ge \frac{t}{2}\right) + \mathbb {P}\left( \mathbb {E} \left[ \Phi (Y) \textbf{1}_{\Phi (Y) \ge M_n}\textbf{1}_{Y\ge u_{\textrm{min}}}\right] \ge \frac{t}{2}\right) \\&\le \frac{8\textrm{max}(\mathfrak {e}_3,\mathfrak {e}_2 \mathfrak {e}_1,\mathfrak {e}_1^3)}{t^3k_n^{5/2}}. \end{aligned} \end{aligned}$$

We now apply these results to deduce deviation bounds on the estimators \(\widehat{\varvec{\theta }}_{\ell }\) in the leaves of the tree.

Corollary 8

Under the assumptions of Theorems 6 and 7 and Assumption 2, for \(t\ge \mathfrak c_3 (\log k_n)^{1/2}k_n^{-1/2},\)

$$\begin{aligned} \begin{aligned} \mathbb {P}\left( \sup _{\begin{array}{c} \ell =1,\ldots , K,\\ { u_{\textrm{min}} \le u \le u_{\textrm{max}}} \end{array}} \Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_\ell \Vert _{\infty }\ge t\right)&\le 2\left( \exp \left( - \frac{{ C_4} k_n t^2}{\beta ^2 (\log k_n)^{2}} \right) + \exp \left( -\frac{{ C_5} k_n t}{\beta \log k_n} \right) \right) \\&+\frac{{C_6}}{k_n^{5/2} t^3}. \end{aligned} \end{aligned}$$

Proof

For \({ u_{\textrm{min}} \le u \le u_{\textrm{max}}}\), let \(\varvec{\theta }=(\sigma ,\gamma )^{t}\) and, for \(\ell = 1,\ldots , K\), \(\varvec{\theta }^{*K}_\ell =(\sigma ^{*K}_\ell (u),\gamma ^{*K}_\ell (u))^{t},\) and let

$$\begin{aligned} \nabla _{\varvec{\theta }}L^\ell (\varvec{\theta },u) = \mathbb {E}\left[ \left( \begin{array}{c} \partial _\sigma \phi (Y-u,\varvec{\theta }) \\ \partial _\gamma \phi (Y-u,\varvec{\theta })\end{array}\right) \textbf{1}_{Y\ge u} \textbf{1}_{\textbf{X}\in \mathcal{T}_\ell } \right] . \end{aligned}$$

From a Taylor expansion,

$$\begin{aligned} \nabla _{\varvec{\theta }}L^\ell (\varvec{\theta },u)=\mathbb {E}\left[ H^{\ell }_{(\tilde{\sigma }_1,\gamma _1),(\sigma _1,\tilde{\gamma }_1),(\tilde{\sigma }_2,\gamma _2),(\sigma _2,\tilde{\gamma }_2)}(Y-u)\textbf{1}_{\textbf{X}\in \mathcal{T}_\ell }\right] (\varvec{\theta }-\varvec{\theta }^{*K}_\ell )^t, \end{aligned}$$

for some parameters \(\tilde{\sigma }_j\) (resp. \(\tilde{\gamma }_j\)) between \(\sigma\) and \(\sigma ^{*K}_\ell (u)\) (resp. \(\gamma\) and \(\gamma ^{*K}_\ell (u)\)). From Assumption 2, we get, for all \(\ell =1,\ldots , K\),

$$\begin{aligned} \frac{n}{k_n}\Vert \nabla _{\varvec{\theta }}L^\ell (\varvec{\theta },u)\Vert _{\infty }\ge \mathfrak C_1\Vert \varvec{\theta }-\varvec{\theta }^{*K}_{\ell }(u)\Vert _{\infty }. \end{aligned}$$

Hence, for all \(\ell =1,\ldots , K\),

$$\begin{aligned} \mathbb {P}\left( \Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_\ell \Vert _{\infty }\ge t\right) \le \mathbb {P}\left( \frac{n}{k_n}\Vert \nabla _{\varvec{\theta }}L^\ell (\widehat{\varvec{\theta }}^K,u)\Vert _{\infty }\ge \mathfrak C_1t\right) . \end{aligned}$$

Since \(\nabla _{\varvec{\theta }} L_n^\ell (\widehat{\varvec{\theta }}^K)=0\) for all \(\ell =1,\ldots , K\), we have \(\mathcal{W}_1^\ell (\widehat{\varvec{\theta }}^K(u),u)=-\frac{n}{k_n}\nabla _{\varvec{\theta }}L^\ell (\widehat{\varvec{\theta }}^K,u).\) Hence,

$$\begin{aligned} \mathbb {P}\left( \sup _{\begin{array}{c} \ell =1,\ldots , K, \\ { u_{\textrm{min}} \le u \le u_{\textrm{max}}} \end{array}} \Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_l(u)\Vert _{\infty }\ge t\right) \le \mathbb {P}\left( \sup _{\begin{array}{c} \ell =1,\ldots ,K,\\ { u_{\textrm{min}} \le u \le u_{\textrm{max}}} \end{array}}\Vert \mathcal{W}_1^\ell (\widehat{\varvec{\theta }}^K(u),u)\Vert _{\infty }\ge \mathfrak C_1t\right) , \end{aligned}$$

and the right-hand side is bounded by

$$\begin{aligned} \mathbb {P}\left( \overline{\mathcal{Z}}(M_n)\ge \frac{\mathfrak C_1t}{2}\right) +\mathbb {P}\left( \underline{\mathcal{Z}}(M_n)\ge \frac{\mathfrak C_1t}{2}\right) . \end{aligned}$$

The result follows from Theorems 6 and 7.

1.3 Proof of Theorem 1

The proof of the first part of Theorem 1 then consists of gathering the results on the leaves obtained in Corollary 8. For \({ u_{\textrm{min}} \le u \le u_{\textrm{max}}}\),

$$\begin{aligned} \Vert \widehat{T}_K-T^*_K\Vert _2^2\le \sum _{\ell =1}^K \Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_\ell \Vert _{\infty }^2\le K \sup _{\ell =1,...,K}\Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_\ell \Vert _{\infty }^2. \end{aligned}$$

Hence

$$\begin{aligned} \begin{aligned}&{\mathbb {P}\left( \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}\Vert \widehat{T}_K-T^*_K\Vert _2^2\ge t\right) } \\&\le \mathbb {P}\left( \sup _{\begin{array}{c} \ell =1,\ldots ,K,\\ { u_{\textrm{min}} \le u \le u_{\textrm{max}}} \end{array}}\Vert \widehat{\varvec{\theta }}^K_\ell -\varvec{\theta }^{*K}_\ell \Vert _{\infty }\ge t^{1/2}K^{-1/2}\right) . \end{aligned} \end{aligned}$$

The result follows from Corollary 8, and from the assumption \(K\le K_{\textrm{max}}=O(k_n^3)\) (Assumption 1).

To prove the second part of Theorem 1, write

$$\begin{aligned} \mathbb {E}\left[ { \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}}\Vert \widehat{T}_K-T^*_K\Vert _2^2\right] =\int _0^{\infty } \mathbb {P}({ \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}} \Vert \widehat{T}_K-T^*_K\Vert _2^2\ge t)dt. \end{aligned}$$

Let \(t_n=c_1 K (\log k_n)k_n^{-1}.\) Then

$$\begin{aligned} \begin{aligned}&{\int _0^{\infty } \mathbb {P}({ \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}}\Vert \widehat{T}_K-T^*_K\Vert _2^2\ge t)dt}\\&\le t_n+\int _{t_n}^{\infty } \mathbb {P}({ \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}}\Vert \widehat{T}_K-T^*_K\Vert _2^2\ge t)dt. \end{aligned} \end{aligned}$$

We now use the first part of Theorem 1 to bound the integral on the right-hand side. Since \(\int _0^{\infty }\exp (-a t)dt=\frac{1}{a},\) \(\int _0^{\infty } \exp (-a^{1/2} t^{1/2})dt=\frac{2}{a},\) and \(\int _1^{\infty }t^{-3/2}dt=2,\) we get

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ { \sup _{u_{\textrm{min}} \le u \le u_{\textrm{max}}}}\Vert \widehat{T}_K-T^*_K\Vert _2^2\right]&\le { t_n+ \frac{2K\beta ^2 (\log k_n)^2}{\mathcal{C}_1 k_n }+\frac{4K \beta ^2 (\log k_n)^2}{\mathcal{C}_2^2 k_n } + \frac{2 \mathcal{C}_3K}{k_n^{5/2} }}\\&\le \frac{c_1 K \log k_n}{k_n}+ \frac{2 K \beta ^2 (\log k_n)^2}{\mathcal{C}_1 k_n }\\&+\frac{4K \beta ^2(\log k_n)^2}{\mathcal{C}_2^2 k_n } + \frac{2 \mathcal{C}_3K}{k_n^{5/2} }\\&\le \frac{\mathcal{C}_4 K (\log k_n)^2}{k_n}. \end{aligned} \end{aligned}$$
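The three elementary integrals invoked above can be verified symbolically; the short sympy snippet below is only a sanity check of those identities:

```python
import sympy as sp

# Positive symbols so the improper integrals evaluate in closed form.
a, t = sp.symbols('a t', positive=True)

I1 = sp.integrate(sp.exp(-a * t), (t, 0, sp.oo))                    # expect 1/a
I2 = sp.integrate(sp.exp(-sp.sqrt(a) * sp.sqrt(t)), (t, 0, sp.oo))  # expect 2/a
I3 = sp.integrate(t ** sp.Rational(-3, 2), (t, 1, sp.oo))           # expect 2

print(I1, I2, I3)
```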

1.4 Proof of Proposition 2

For all \(\textbf{x}\),

$$\begin{aligned} \Vert \varvec{\theta }^*(\textbf{x}) - \varvec{\theta }_0(\textbf{x})\Vert _\infty = \Vert \sum _{\ell =1}^{K_{\textrm{max}}}\left( \varvec{\theta }^{*}_\ell - \varvec{\theta }_{0}(\textbf{x}) \right) \textbf{1}_{\textbf{x}\in \mathcal{T}_\ell }\Vert _\infty \le \sum _{\ell =1}^{K_{\textrm{max}}}\Vert \varvec{\theta }^{*}_\ell - \varvec{\theta }_{0}(\textbf{x}) \Vert _\infty \textbf{1}_{\textbf{x}\in \mathcal{T}_\ell } \, . \end{aligned}$$

Now, from a Taylor expansion, for \(\ell =1,\ldots , K\), conditionally on \(\textbf{X}\in \mathcal{T}_\ell\),

$$\begin{aligned} \nabla _{\varvec{\theta }} L^\ell (\varvec{\theta }_{0}(\textbf{X}),u) = \mathbb {E}\left[ H^{\ell }_{(\tilde{\sigma }_1,\gamma _1),(\sigma _1,\tilde{\gamma }_1),(\tilde{\sigma }_2,\gamma _2),(\sigma _2,\tilde{\gamma }_2)}(Y-u) \mid \textbf{X}\in \mathcal{T}_\ell \right] (\varvec{\theta }_{0}(\textbf{X}) - \varvec{\theta }^{*}_\ell )^t , \end{aligned}$$

for some parameters \(\tilde{\sigma }_j\) (resp. \(\tilde{\gamma }_j\)) between \(\sigma _0(\textbf{X})\) and \(\sigma ^{*K}_\ell (u)\) (resp. \(\gamma _0(\textbf{X})\) and \(\gamma ^{*K}_\ell (u)\)).

Thus, under Assumption 2,

$$\begin{aligned} \begin{aligned}&{\Vert \varvec{\theta }_{0}(\textbf{X}) - \varvec{\theta }^{*}_\ell \Vert _\infty }\\&\le \frac{1}{\mathfrak C_1} \Vert \nabla _{\varvec{\theta }} L^\ell (\varvec{\theta }_{0}(\textbf{X}),u) \Vert _\infty \\&\le \frac{1}{\mathfrak C_1}\frac{k_n}{n} \textrm{max}\left( |\mathbb {E}\left[ \partial _\sigma \phi (Z,\varvec{\theta }_0(\textbf{X}) ) \mid \textbf{X}\in \mathcal{T}_\ell \right] |,|\mathbb {E}\left[ \partial _\gamma \phi (Z,\varvec{\theta }_0(\textbf{X}) )\mid \textbf{X}\in \mathcal{T}_\ell \right] |\right) \,, \end{aligned} \end{aligned}$$

where Z is a random variable distributed according to the distribution \(F_u\) defined in Section 2.1 with \(\sigma _0(\textbf{X}) =u\gamma _0(\textbf{X})\) and with

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \partial _\sigma \phi (Z,\varvec{\theta }_0(\textbf{X}) ) \mid \textbf{X}\in \mathcal{T}_\ell \right]&= -\frac{1}{u \gamma _{0}(\textbf{X})} + \frac{1}{u^2\gamma _{0}(\textbf{X})} \left( 1+\frac{1}{\gamma _{0}(\textbf{X})} \right) \mathbb {E}\left[ \frac{Z}{1+Z/u}\mid \textbf{X}\in \mathcal{T}_\ell \right] \\ \mathbb {E}\left[ \partial _\gamma \phi (Z,\varvec{\theta }_0(\textbf{X}) ) \mid \textbf{X}\in \mathcal{T}_\ell \right]&=-\frac{1}{\gamma _{0}(\textbf{X})^2}\mathbb {E}\left[ \log (1+Z/u) \mid \textbf{X}\in \mathcal{T}_\ell \right] \\&+ \frac{1}{u\gamma _{0}(\textbf{x})}\left( 1+\frac{1}{\gamma _{0}(\textbf{X})} \right) \mathbb {E}\left[ \frac{Z}{1+Z/u}\mid \textbf{X}\in \mathcal{T}_\ell \right] \,. \end{aligned} \end{aligned}$$

Under Assumption 3, we have

$$\begin{aligned} \overline{F}_{u}(z) = \left( 1+\frac{z}{u}\right) ^{-1/\gamma _{0}(\textbf{X})} \left\{ 1 + c\psi (u) \int _1^{1+z/u} v^{\rho -1} \textrm{d}v + o(\psi (u))\right\} \,. \end{aligned}$$
$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \frac{Z}{1+Z/u}\mid \textbf{X}\in \mathcal{T}_\ell \right]&= \int _0^u \overline{F}_u \left( \frac{t}{1-t/u} \right) \textrm{d}t\\&= \frac{u}{1+1/\gamma _0(\textbf{X})} \left( 1 + \frac{ c\psi (u)}{1+1/\gamma _{0}(\textbf{X})-\rho } + o(\psi (u)) \right) \\&\le u \left( 1 + c\gamma _{0}(\textbf{X})\psi (u) + o(\psi (u)) \right) \end{aligned} \end{aligned}$$

and then

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \log (1+Z/u) \mid \textbf{X}\in \mathcal{T}_\ell \right]&= \int _0^{\infty } \mathbb {P}\left[ Z \ge u(\textrm{e}^t-1) \mid \textbf{X}\in \mathcal{T}_\ell \right] \textrm{d}t \\&= \gamma _{0}(\textbf{X})\left( 1 + \frac{c\psi (u)}{1/\gamma _{0}(\textbf{X})-\rho } + o(\psi (u)) \right) \\&\le \gamma _{0}(\textbf{X}) \left( 1+c\gamma _{0}(\textbf{X})\psi (u) + o(\psi (u)) \right) \, . \end{aligned} \end{aligned}$$

Consequently,

$$\begin{aligned} |\mathbb {E}\left[ \partial _\sigma \phi (Z,\varvec{\theta }_0(\textbf{X}) ) \mid \textbf{X}\in \mathcal{T}_\ell \right] |\le \frac{1}{\gamma _{\textrm{min}}}\left( 1+\frac{1}{u}\left( 1+\frac{1}{\gamma _{\textrm{min}}}\right) \right) \left( 1 + c\gamma _{0}(\textbf{X}) \psi (u) + o(\psi (u)) \right) \end{aligned}$$

and

$$\begin{aligned} |\mathbb {E}\left[ \partial _\gamma \phi (Z,\varvec{\theta }_0(\textbf{X}) ) \mid \textbf{X}\in \mathcal{T}_\ell \right] |\le \frac{1}{\gamma _{\textrm{min}}}\left( 1 +\frac{1}{\gamma _{\textrm{min}}}+\frac{\gamma _{\textrm{max}}}{\gamma _{\textrm{min}}}\right) \left( 1 + c\gamma _0(\textbf{X}) \psi (u ) + o(\psi (u)) \right) \, . \end{aligned}$$

Hence, conditionally on \(\textbf{X}\in \mathcal{T}_\ell\),

$$\begin{aligned} \Vert \varvec{\theta }_{0}(\textbf{X}) - \varvec{\theta }^{*}_\ell \Vert _\infty \le \mathfrak C_2(u)\frac{k_n}{n} \left( 1 + c\gamma _{\textrm{max}} \psi (u) + o(\psi (u)) \right) \,, \end{aligned}$$

where \(\mathfrak C_2(u)=\frac{1}{\mathfrak C_1}\frac{1}{\gamma _{\textrm{min}}}\textrm{max}\left( 1+\frac{1}{u}+\frac{1}{u\gamma _{\textrm{min}}},1+\frac{1}{\gamma _{\textrm{min}}}+\frac{\gamma _{\textrm{max}}}{\gamma _{\textrm{min}}} \right)\).

Finally, for all \(\textbf{x}\),

$$\begin{aligned} \begin{aligned} \Vert \varvec{\theta }^{*}(\textbf{x}) - \varvec{\theta }_0(\textbf{x})\Vert _\infty&\le \sum _{\ell =1}^{K_{\textrm{max}}}\Vert \varvec{\theta }^{*}_\ell - \varvec{\theta }_{0}(\textbf{x}) \Vert _\infty \textbf{1}_{\textbf{x}\in \mathcal{T}_\ell }\\&\le \mathfrak C_2(u)\frac{k_n}{n} \left( 1 + c\gamma _{\textrm{max}} \psi (u) + o(\psi (u)) \right) \sum _{\ell =1}^{K_{\textrm{max}}} \textbf{1}_{\textbf{x}\in \mathcal{T}_\ell }\\&\le \mathfrak C_2(u)\frac{k_n}{n} \left( 1 + c\gamma _{\textrm{max}} \psi (u) + o(\psi (u)) \right) \, . \end{aligned} \end{aligned}$$

1.5 Proof of Theorem 3

First, let us introduce some notations that are needed in the proof.

Define the normalized negative log-likelihood \(L_n(T_K,u)\) associated with a tree \(T_K\) with K leaves \((\mathcal{T}_{\ell })_{\ell = 1,\ldots , K}\) and with parameters \(\varvec{\theta }(u)=\left( \varvec{\theta }^K_{\ell }(u)\right) _{\ell =1,\ldots ,K}\) as

$$\begin{aligned} L_n(T_K,u) = \sum _{\ell = 1}^K L_n^{\ell }(\varvec{\theta }^K_\ell ,u) = \frac{1}{k_n}\sum _{\ell = 1}^K \sum _{i=1}^n \phi (Y_i-u,\varvec{\theta }^K_{\ell })\textbf{1}_{Y_i>u} \textbf{1}_{\textbf{X}_i\in \mathcal{T}_{\ell }} \, , \end{aligned}$$

and \(L(T_K,u) = \mathbb {E}[L_n(T_K,u)]\). Finally, for two trees T and \(T'\), let \(\Delta L_n(T, T') = L_n(T,u) - L_n(T',u)\) and, similarly, \(\Delta L(T, T') = L(T,u) - L(T',u)\).

The following lemma will be needed to prove Theorem 3.

Lemma 9

Let \(\mathfrak D = \inf _u\inf _{K < K^*} \Delta L(T^*,T^*_K)\) and let \(u \in [u_{\textrm{min}},u_{\textrm{max}}]\) be fixed. Suppose that there exists a constant \(c_2>0\) such that the penalization constant \(\lambda\) satisfies

$$\begin{aligned} c_2 \{\log k_n\}^{1/2} k_n^{-1/2} \le \lambda \le (\mathfrak {D} - 2c_2 \{\log (k_n)\}^{1/2} k_n^{-1/2})k_n^{-1}, \end{aligned}$$

then, under Assumptions 1 and 2, for \(K> K^*\),

$$\begin{aligned} \begin{aligned} \mathbb {P}(\widehat{K}=K)&\le 2\left( \exp \left( - \frac{ { C_1} k_n \lambda ^2(K-K^*)^2}{\beta ^2 (\log k_n)^{2}} \right) + \exp \left( -\frac{{ C_2} k_n \lambda (K-K^*)}{\beta \log k_n} \right) \right) \\&+\frac{{ C_3}}{k_n^{5/2} \lambda ^3(K-K^*)^3}, \end{aligned} \end{aligned}$$

and, for \(K<K^*,\)

$$\begin{aligned} \begin{aligned} \mathbb {P}(\widehat{K}=K)&\le 4\exp \left( - \frac{ C_1 k_n \{\mathfrak {D}-\lambda (K^*-K)\}^2}{\beta ^2 (\log k_n)^{2}} \right) \\&+4 \exp \left( -\frac{C_2 k_n \{\mathfrak {D}-\lambda (K^*-K)\}}{\beta \log k_n} \right) \\&+\frac{2C_3}{k_n^{5/2} \{\mathfrak {D}-\lambda (K^*-K)\}^3}. \end{aligned} \end{aligned}$$

Proof

Let \(u \in [u_{\textrm{min}},u_{\textrm{max}}]\) be fixed. If \(\widehat{K}=K,\) then

$$\begin{aligned} \Delta L_n(T_K,T_{K^*}) := L_n(T_K,u)-L_n(T_{K^*},u)>\lambda (K-K^* ). \end{aligned}$$

Decompose

$$\begin{aligned} \begin{aligned} \Delta L_n(T_K,T_{K^*})&=\{L_n(T_K,u)-L_n(T^*_K,u)\}+\{L_n(T^*_K,u)-L_n(T^*,u)\}\\&+\{L_n(T^*,u)-L_n(T_{K^*},u)\}. \end{aligned} \end{aligned}$$

Since \(L_n(T^*,u)-L_n(T_{K^*},u)<0,\)

$$\begin{aligned} \Delta L_n(T_K,T_{K^*})\le \{L_n(T_K,u)-L_n(T^*_K,u)\}+\{L_n(T^*_K,u)-L_n(T^*,u)\}. \end{aligned}$$

For \(K > K^*,\) \(T^*_K=T^*,\) hence,

$$\begin{aligned} \begin{aligned} \mathbb {P}(\widehat{K}=K)&\le \mathbb {P}\left( \Delta L_n(T_K, T^*_K)>\lambda (K-K^*)\right) \\&\le \mathbb {P}\left( |\Delta L_n(T_K, T^*_K) - \Delta L( T_K, T^*_K)|>\lambda (K-K^*)\right) . \end{aligned} \end{aligned}$$

For \(K>K^*\), a bound is then obtained from Theorems 6 and 7 if \(\lambda (K-K^*) \ge {\mathfrak c_1} \{\log (k_n)\}^{1/2} k_n^{-1/2}\), that is, if \(\lambda \ge {\mathfrak c_1} \{\log k_n\}^{1/2} k_n^{-1/2}\).

Now, for \(K<K^*,\)

$$\begin{aligned} \begin{aligned} \Delta L_n(T^*_K,T^*)&\le |\Delta L_n(T^*_{K},T^*)-\Delta L(T^*_{K},T^*)|+\Delta L(T^*_{K},T^*)\\&\le |\Delta L_n(T^*,T^*_{K})-\Delta L(T^*,T^*_K)|- \mathfrak D(K^*,K). \end{aligned} \end{aligned}$$

where \(\mathfrak D = \inf _{K<K^*, u\in [u_{\textrm{min}}, u_{\textrm{max}}]} \mathfrak D(K^*,K).\) Hence,

$$\begin{aligned} \begin{aligned}&{\mathbb {P}(\widehat{K}=K)}\\&\le \mathbb {P}\left( \Delta L_n(T_K, T^*_K)\ge \frac{\mathfrak D- \lambda (K^*-K)}{2}\right) \\&+ \mathbb {P}\left( |\Delta L_n(T^*,T^*_K)-\Delta L(T^*,T^*_K)|\ge \frac{\mathfrak D - \lambda (K^*-K)}{2}\right) \\&\le \mathbb {P}\left( |\Delta L_n(T_K, T^*_K)-\Delta L( T_K, T^*_K)|\ge \frac{\mathfrak D - \lambda (K^*-K)}{2}\right) \\&+ \mathbb {P}\left( |\Delta L_n(T^*,T^*_K)-\Delta L(T^*,T^*_K)|\ge \frac{\mathfrak D - \lambda (K^*-K)}{2}\right) . \end{aligned} \end{aligned}$$

These two probabilities can be bounded using Theorems 6 and 7 provided that, for all \(K<K^*,\)

$$\begin{aligned} \frac{\mathfrak D - \lambda (K^*-K)}{2} \ge \mathfrak c_1 \{\log (k_n)\}^{1/2} k_n^{-1/2}, \end{aligned}$$

that is,

$$\begin{aligned} \lambda \le \mathfrak D - 2{\mathfrak c_1} \{\log (k_n)\}^{1/2} k_n^{-1/2} . \end{aligned}$$
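To make the role of the penalty concrete, here is a toy Python sketch of a selection rule of this type (the criterion values are invented, and the exact form "negative log-likelihood plus \(\lambda K\)" is an assumption made for illustration, not the paper's implementation):

```python
def select_tree_size(neg_logliks, lam):
    """Penalized choice K-hat = argmin_K { L_n(T_K, u) + lam * K }."""
    return min(neg_logliks, key=lambda K: neg_logliks[K] + lam * K)

# Toy criterion values: the fit improves with K but flattens after K* = 3.
L = {1: 10.0, 2: 8.0, 3: 5.0, 4: 4.9, 5: 4.85}

print(select_tree_size(L, lam=0.5))  # moderate penalty -> 3
print(select_tree_size(L, lam=0.0))  # no penalty -> the largest tree, 5
```

Lemma 9 quantifies exactly this trade-off: \(\lambda\) must be large enough to rule out \(K>K^*\), yet small compared with the gap \(\mathfrak D\) so that \(K<K^*\) remains unlikely.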

We are now ready to prove Theorem 3. Let \(u \in [u_{\textrm{min}} , u_{\textrm{max}}]\) be fixed.

$$\begin{aligned} \begin{aligned} \mathbb {E}\left[ \Vert \widehat{T}-T^*\Vert _{2}^2\right]&=\sum _{K=1}^{K_{\textrm{max}}}\mathbb {E}\left[ \Vert T_K-T^*\Vert _2^2\textbf{1}_{\widehat{K}=K}\right] \\&\le \mathbb {E}\left[ \Vert T_{K^*}-T^*\Vert _2^2\right] + \sum _{K=1, K\ne K^*}^{K_{\textrm{max}}}K \mathbb {P}(\widehat{K}=K) \\&+\sum _{K=1, K\ne K^*}^{K_{\textrm{max}}}\mathbb {E}\left[ \Vert T_K-T^*\Vert _2^2 \textbf{1}_{\Vert T_K-T^*\Vert _2^2> K}\textbf{1}_{\widehat{K}=K}\right] \\&\le \mathbb {E}\left[ \Vert T_{K^*}-T^*\Vert _2^2\right] + \sum _{K=1}^{K^*-1}K \mathbb {P}(\widehat{K}=K)\\&+ \sum _{K=K^*+1}^{K_{\textrm{max}}}K \mathbb {P}(\widehat{K}=K)\\&+2\sum _{K=1, K\ne K^*}^{K_{\textrm{max}}}\mathbb {E}\left[ \Vert T_K-T^*_K\Vert _2^2\textbf{1}_{\Vert T_K-T^*\Vert _2^2> K}\right] \\&+2\sum _{K=1, K\ne K^*}^{K_{\textrm{max}}}\mathbb {P}(\widehat{K}=K) \Vert T^*-T_K^*\Vert _2^2 . \end{aligned} \end{aligned}$$

Firstly, from Theorem 1,

$$\begin{aligned} \begin{aligned}&{\mathbb {E}\left[ \Vert T_K-T_K^*\Vert _2^2 \textbf{1}_{\Vert T_K-T^*\Vert _2^2> K}\right] }\\&= K \mathbb {P}\left( \Vert T_K-T_K^*\Vert _2^2> K\right) + \int _K^{\infty } \mathbb {P}\left( \Vert T_K-T_K^*\Vert _2^2 > t\right) \textrm{d}t\\&\le 2 K\left( 1+\frac{\beta ^2 (\log k_n)^2}{\mathcal{C}_1 k_n}\right) \exp \left( -\frac{\mathcal{C}_1 k_n }{ \beta ^2 (\log k_n)^2}\right) \\&+2K\left( 1+ \frac{2\beta (\log k_n)}{\mathcal{C}_2 k_n}+\frac{2\beta ^2 (\log k_n)^2}{\mathcal{C}_2^2 k_n^2}\right) \exp \left( -\frac{\mathcal{C}_2 k_n}{\beta (\log k_n)}\right) + \frac{2\mathcal{C}_3 K^{1/2}}{k_n^{5/2} }\, . \end{aligned} \end{aligned}$$
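The first equality above is the standard tail-integral formula, which for completeness follows from Fubini's theorem applied to the nonnegative variable \(X=\Vert T_K-T_K^*\Vert _2^2\):

$$\begin{aligned} \mathbb {E}\left[ X \textbf{1}_{X> K}\right] = \int _0^{\infty } \mathbb {P}\left( X \textbf{1}_{X>K}> t\right) \textrm{d}t = K \mathbb {P}\left( X> K\right) + \int _K^{\infty } \mathbb {P}\left( X > t\right) \textrm{d}t, \end{aligned}$$

since, for \(t<K,\) the event \(\{X \textbf{1}_{X>K}> t\}\) coincides with \(\{X>K\}.\)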

Secondly, recall that

$$\begin{aligned} \Vert T^*_K-T^*\Vert ^2_2 = \int \Vert \varvec{\theta }^{*K}(\textbf{x})-\varvec{\theta }^{*}(\textbf{x})\Vert ^2_{\infty }\textrm{d}P_{\textbf{X}}(\textbf{x}) \le K_{\textrm{max}} \sum _{\ell =1}^{K_{\textrm{max}}}\mu (\mathcal{T}_\ell )\Vert \varvec{\theta }^{*K}_\ell -\varvec{\theta }^{*}_\ell \Vert _\infty ^2 \, , \end{aligned}$$

where \(\mu (\mathcal{T}_\ell ) = \mathbb {P}(\textbf{X}\in \mathcal{T}_\ell )\). Following the same idea as in the proof of Proposition 2, from a Taylor expansion, under Assumptions 2 and 3,

$$\begin{aligned} \Vert \varvec{\theta }^{*K}_{\ell } - \varvec{\theta }^*_\ell \Vert _\infty ^2 \le \mathfrak C^2_2(u)\frac{k_n^2}{n^2} \left( 1 + c\gamma _{\textrm{max}} \psi (u) + o(\psi (u)) \right) ^2 \,. \end{aligned}$$

Hence,

$$\begin{aligned} \begin{aligned} \Vert T^*_K-T^*\Vert _2^2&\le \mathfrak C^2_2(u)\frac{k_n^2}{n^2} (1 + c\gamma _{\textrm{max}} \psi (u) + o(\psi (u)))^2\sum _{\ell =1}^{K_{\textrm{max}}} \textbf{1}_{\textbf{X}\in \mathcal{T}_\ell }\\&\le \mathfrak C_3(u)\frac{k_n^2}{n^2} \,. \end{aligned} \end{aligned}$$

Finally,

$$\begin{aligned} \mathbb {E}\left[ \Vert \widehat{T}-T^*\Vert _{2}^2 \right] \le \frac{\mathcal{C}_5 K^* (\log k_n)^2 }{k_n}, \end{aligned}$$

for some constant \(\mathcal{C}_5.\)

Appendix B: Covering numbers

Lemma 10

Following the notations of the proof of Theorem 6, the class of functions \(\mathfrak {F}\) satisfies

$$\begin{aligned} \mathcal{N}_{\Phi }(\varepsilon ,\mathfrak {F})\le \frac{\mathfrak C_4K^{4(d+1)(d+2)}\Vert \Phi \Vert _2^{\alpha _1}\sigma _{n}^{\alpha }}{\varepsilon ^{\alpha }}, \end{aligned}$$

for some constants \(\mathfrak C_4>0\) and \(\alpha >0\) (depending neither on n nor on K).

Proof

Let

$$\begin{aligned} \begin{aligned} g_{\varvec{\theta }}(z)&= -\frac{1}{\sigma }+\left( \frac{1}{\gamma }+1\right) \frac{\gamma z}{\sigma ^2(1+\frac{z\gamma }{\sigma })}, \\ h_{\varvec{\theta }}(z)&= -\frac{1}{\gamma ^2}\log \left( 1+\frac{z\gamma }{\sigma } \right) +\frac{\left( \frac{1}{\gamma }+1\right) z}{\sigma +z\gamma }, \end{aligned} \end{aligned}$$

for \(z>0.\) For \(\varvec{\theta }\) and \(\varvec{\theta }'\) in \(\mathcal{S}\times \Gamma ,\) we have (from a straightforward Taylor expansion),

$$\begin{aligned} |g_{\varvec{\theta }}(y-u) - g_{\varvec{\theta }'}(y-u)|\le C |\gamma - \gamma '|+ C'|\sigma -\sigma '|, \end{aligned}$$

for some constants C and \(C'.\) More precisely, one can take

$$\begin{aligned} \begin{aligned} C&= \frac{6}{\gamma _{\textrm{min}}^2\sigma _{\textrm{min}}},\\ C'&= \frac{1}{\sigma _{\textrm{min}}^2}\left( 1+3\left\{ 1+\frac{1}{\gamma _{\textrm{min}}}\right\} \right) . \end{aligned} \end{aligned}$$

Next, observe that

$$\begin{aligned} |g_{\varvec{\theta }'}(y-u) - g_{\varvec{\theta }'}(y-u')|\le C''|u-u'|, \end{aligned}$$

where \(C''=4\gamma ^2_{\textrm{max}}/[\gamma _{\textrm{min}}\sigma ^3],\) which leads to

$$\begin{aligned} |g_{\varvec{\theta }}(y-u) - g_{\varvec{\theta }'}(y-u')|\le C_g \textrm{max}(\Vert \varvec{\theta }-\varvec{\theta }'\Vert _{\infty },|u-u'|), \end{aligned}$$

for some constant \(C_g>0.\) Similarly,

$$\begin{aligned} |h_{\varvec{\theta }}(y-u) - h_{\varvec{\theta }'}(y-u)|\le C_1(4+\log (1+wy))|\gamma - \gamma ' |+ C_2|\sigma -\sigma '|, \end{aligned}$$

for some constants \(C_1\) and \(C_2.\) Next,

$$\begin{aligned} |h_{\varvec{\theta }'}(y-u) - h_{\varvec{\theta }'}(y-u')|\le C_7 |u-u'|, \end{aligned}$$

where \(C_7=5/(\gamma _{\textrm{min}}\sigma _{\textrm{min}}),\) leading to, for some \(C_h>0,\)

$$\begin{aligned} |h_{\varvec{\theta }}(y-u) - h_{\varvec{\theta }'}(y-u')|\le C_h \textrm{max}(\Vert \varvec{\theta }-\varvec{\theta }'\Vert _{\infty },|u-u'|). \end{aligned}$$

On the other hand,

$$\begin{aligned} |\phi (y-u,\varvec{\theta })-\phi (y-u,\varvec{\theta }')|\le \frac{1}{\gamma _{\textrm{min}}^2}(2+\log (1+wy))|\gamma -\gamma '|+ \frac{3}{\gamma _{\textrm{min}}\sigma _{\textrm{min}}}|\sigma -\sigma '|, \end{aligned}$$

and

$$\begin{aligned} |\phi (y-u,\varvec{\theta }')-\phi (y-u',\varvec{\theta }')|\le \frac{1}{\sigma _{\textrm{min}}}|u-u'|. \end{aligned}$$

Define \(\mathfrak {F}_1=\{g_{\varvec{\theta }}(\cdot -u):\varvec{\theta } \in \mathcal{S}\times \Gamma , u\in [u_{\textrm{min}},u_{\textrm{max}}]\},\) \(\mathfrak {F}_2=\{h_{\varvec{\theta }}(\cdot -u):\varvec{\theta } \in \mathcal{S}\times \Gamma , u\in [u_{\textrm{min}},u_{\textrm{max}}]\},\) and \(\mathfrak {F}_3=\{\phi (\cdot -u,\varvec{\theta }):\varvec{\theta } \in \mathcal{S}\times \Gamma , u\in [u_{\textrm{min}},u_{\textrm{max}}]\}.\) From van der Vaart (1998, Example 19.7), we get, for \(i=1,\ldots ,3,\)

$$\begin{aligned} N(\varepsilon ,\mathfrak {F}_i)\le \varphi _i \Vert \Phi \Vert _2^{\alpha _1}\sigma _{n}^{\alpha _1}\varepsilon ^{-\alpha _1}, \end{aligned}$$

for some \(\alpha _1>0\) and constants \(\varphi _i.\)

On the other hand, let

$$\begin{aligned} \mathfrak {F}_4=\left\{ \textbf{x}\mapsto \textbf{1}_{\textbf{x}\in \mathcal{T}_\ell } :\ell =1,\ldots ,K \right\} , \end{aligned}$$

and

$$\begin{aligned} \mathfrak {F}_5=\left\{ y \mapsto \textbf{1}_{y>u} :u \in \mathcal{U} \right\} . \end{aligned}$$

From Lopez et al. (2016, Lemma 4), we have \(N(\varepsilon ,\mathfrak {F}_4)\le m^k K^{\alpha _2}\varepsilon ^{-\alpha _2},\) where \(\alpha _2=4(d+1)(d+2),\) and where k is the number of discrete components taking at most m modalities. On the other hand, from van der Vaart (1998, Example 19.6), \(N(\varepsilon ,\mathfrak {F}_5)\le 2\varepsilon ^{-2}.\)

From Einmahl and Mason (2005, Lemma A.1), we get, for \(i=1,\ldots ,3,\)

$$\begin{aligned} N(\varepsilon ,\mathfrak {F}_i\mathfrak {F}_4\mathfrak {F}_5)\le \frac{4 m^kK^{\alpha _2}\textrm{max}(C_g,C_h)\Vert \Phi \Vert _2^{\alpha _1}\sigma _{n}^{\alpha _1}}{\varepsilon ^{\alpha _1+\alpha _2+\alpha _3}}, \end{aligned}$$

where \(\alpha _3=2\) comes from the class \(\mathfrak {F}_5.\)

Multiplying \(\mathfrak {F}_i\mathfrak {F}_4\mathfrak {F}_5\) by a single indicator function \(\textbf{1}_{\Phi (Y_i)\le M_n}\) does not change the covering number, and the result follows.

Appendix C: Technical Lemmas

Lemma 11

  1.

    The derivatives of the functions \(y\rightarrow \phi (y-u,\varvec{\theta })\) with respect to \(\varvec{\theta }\) are uniformly bounded by

    $$\begin{aligned} \Phi (y)=C(1+\log (1+wy)), \end{aligned}$$

    where C is a constant (not depending on n), and \(w=\gamma _{\textrm{max}}/\sigma _{\textrm{min}}.\)

  2.

    There exists a certain \(\rho _0>0\) such that

    $$\begin{aligned} m_{\rho _0} := \mathbb {E}\left[ \exp (\rho _0\Phi (Y))\right] <\infty . \end{aligned}$$

Proof

To prove point 1, it suffices to differentiate the GP log-likelihood with respect to \(\varvec{\theta }\) and check that the derivatives are upper-bounded by \(\Phi\).

Now, for point 2, note that, since \(\gamma (\textbf{x}) \ge \gamma _{\textrm{min}} >0\) for all \(\textbf{x}\), Y is a heavy-tailed random variable; hence \(\log (Y)\), and thus \(\Phi (Y)\), is a light-tailed random variable, so that \(\Phi (Y)\) has finite exponential moments.
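To make the light-tail argument explicit, here is a sketch under a Pareto-type tail assumption, \(\mathbb {P}(Y>y)\le c\, y^{-1/\gamma _{\textrm{max}}}\) for y large (the constant c is purely illustrative): for t large,

$$\begin{aligned} \mathbb {P}\left( \Phi (Y)> C(1+t)\right) = \mathbb {P}\left( Y > \frac{e^{t}-1}{w}\right) \le c\, w^{1/\gamma _{\textrm{max}}}\, e^{-t/\gamma _{\textrm{max}}}(1+o(1)), \end{aligned}$$

so that \(\Phi (Y)\) has an exponentially decaying tail, and \(m_{\rho _0}<\infty\) for any \(0<\rho _0 < 1/(C\gamma _{\textrm{max}}).\)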

Lemma 12

With \(v_{\mathfrak {F}}\) defined in Proposition 4,

$$\begin{aligned} v_{\mathfrak {F}}\le \frac{M_n^2k_n}{n}. \end{aligned}$$

Proof

We have

$$\begin{aligned} \begin{aligned} v_{\mathfrak {F}}&\le \mathbb {E}\left[ \Phi (Y)^2 \textbf{1}_{Y\ge u_{\textrm{min}}}\textbf{1}_{\Phi (Y)\le M_n}\right] \\&\le M_n^2 \mathbb {P}(Y\ge u_{\textrm{min}})=\frac{M_n^2k_n}{n}. \end{aligned} \end{aligned}$$

Lemma 13

Define, for \(j = 1, 2, 3\),

$$\begin{aligned} E_{j,n}=\mathbb {E}\left[ \Phi (Y)^j \textbf{1}_{\Phi (Y)\ge M_n}\textbf{1}_{Y\ge u_{\textrm{min}}}\right] . \end{aligned}$$

Under the assumptions of Theorem 7,

$$\begin{aligned} E_{j,n}\le \frac{\mathfrak {e}_j k_n^{1/2}}{n^{1/2}n^{\rho _0\beta a_2/4}}. \end{aligned}$$

Proof

Applying the Cauchy-Schwarz inequality twice leads to

$$\begin{aligned} E_{j,n}\le \mathbb {P}(Y\ge u_{\textrm{min}})^{1/2}\mathbb {E}[\Phi (Y)^{2j}\textbf{1}_{\Phi (Y)\ge M_n}]^{1/2}\le \frac{k_n^{1/2}}{n^{1/2}}\mathbb {E}[\Phi (Y)^{4j}]^{1/4}\mathbb {P}(\Phi (Y)\ge M_n)^{1/4}. \end{aligned}$$

Next, from the Chernoff inequality,

$$\begin{aligned} \mathbb {P}(\Phi (Y)\ge M_n)\le \exp (-\rho _0 M_n)\mathbb {E}[\exp (\rho _0 \Phi (Y))]\le \frac{m_{\rho _0}}{n^{\rho _0\beta a_2}}. \end{aligned}$$
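Combining the two displays (with \(M_n=\beta a_2 \log n\), as the exponent in the statement suggests) yields the claimed bound, with a constant that is finite by point 2 of Lemma 11:

$$\begin{aligned} E_{j,n}\le \frac{k_n^{1/2}}{n^{1/2}}\, \mathbb {E}[\Phi (Y)^{4j}]^{1/4}\left( \frac{m_{\rho _0}}{n^{\rho _0\beta a_2}}\right) ^{1/4} = \frac{\mathfrak {e}_j\, k_n^{1/2}}{n^{1/2}n^{\rho _0\beta a_2/4}}, \quad \text {with } \mathfrak {e}_j = \mathbb {E}[\Phi (Y)^{4j}]^{1/4} m_{\rho _0}^{1/4}. \end{aligned}$$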


Cite this article

Farkas, S., Heranval, A., Lopez, O. et al. Generalized pareto regression trees for extreme event analysis. Extremes (2024). https://doi.org/10.1007/s10687-024-00485-1
