1 Introduction

Statistical depth functions have gained importance in the last three decades. The best known and most studied among them include the half-space depth (Tukey 1975), the simplicial depth (Liu 1990, 1992), and the multivariate \(L^1\) depth.

Other well-known depths are the convex hull peeling depth (Barnett 1976), the Oja depth (Oja 1983), and the spherical depth (Elmore et al. 2006), among others. They have been extended to functional spaces, see (Fraiman and Muniz 2001; López-Pintado and Romo 2009; Claeskens et al. 2014; Cuevas and Fraiman 2009), to Riemannian manifolds, see (Fraiman et al. 2019), and also to general metric spaces, see (Cholaquidis et al. 2020). Several different applications of depth notions have been proposed, in particular for classification problems, by means of the depth-depth method (Vencálek 2017), or to functional data, see (Mosler and Mozharovskyi 2017). Most classical notions of depth, introduced initially on \({\mathbb {R}}^d\), cannot be directly extended to general metric spaces or even to functional spaces or manifolds. Some of them, such as Liu’s or Tukey’s depth, are computationally infeasible in high-dimensional spaces, because their computational complexity is exponential in the dimension.

This is not the case for the lens depth, introduced in (Liu and Modarres 2011), whose computational complexity is of order \(n^2\) and which, as we will see, can be easily extended to general metric spaces. This makes the lens depth particularly suitable for level set estimation by means of the level sets of its empirical version, based on an iid sample, which is one of the main goals of this manuscript.

The estimation of the level sets of a depth measure is of current interest, see (Brunel 2019). As indicated in Laketa and Nagy (2021), “a definition of depth-based central regions of the data, which are the regions where the depth exceeds given thresholds, ensues naturally”. Level set estimation of depths was initially studied in (Tukey 1975), as a key tool for the visualization and exploration of data. Other significant contributions can be found in (Koshevoy et al. 1997; Zuo and Serfling 2000; Serfling 2002; Dyckerhoff 2016). As pointed out in (Liu et al. 1999), “the shape and size of these levels, as well as the direction and speed at which they expand, provides insight into the dispersion, kurtosis, and asymmetry of the underlying distribution”. Level sets also allow one to extend the notion of quantiles to multivariate or functional data and can be used for outlier detection, see, for instance, (Febrero et al. 2008; Dai and Genton 2019), as well as for supervised classification, see (Ruts and Rousseeuw 1996; Hubert et al. 2017). In particular, the boundary of the level sets allows one to extend the notion of quantile, see (Agostinelli 2018).

As shown in the two real data sets we analyse, the shape of the level sets of the lens depth can help to detect outliers. This is of crucial importance when dealing with vectorcardiograms, because outliers correspond to data from children with possible cardiological problems. More precisely, during the electrical activity of the heart, myocardial cells depolarize and repolarize, producing current flows of different magnitudes and directions. The sum of all these current flows gives rise to a vector indicating the main direction of the flow. Thus, an atrial depolarization vector, three ventricular depolarization vectors and a ventricular repolarization vector are identified. In reality, these vectors do not occur independently of each other; rather, the current, once it begins to flow, originates a vector whose magnitude and direction are modified at each moment. Vectorcardiography is the recording of the electrical activity of the heart through the originating vectors, which have the form of a closed loop.

An extension of the lens depth to Riemannian manifolds (called weighted lens depth) was recently introduced in (Cholaquidis et al. 2020) to tackle the supervised classification problem, and a point-wise a.s. consistency result is obtained there (see Theorem 2 in (Cholaquidis et al. 2020)). Here, we address the more challenging problem of level set estimation of the lens depth on general metric spaces, whereas in (Cholaquidis et al. 2020) the weighted lens depth is introduced to deal with the density f together with the underlying Riemannian structure, in order to perform supervised classification by means of the depth-depth-G method introduced in Cuesta-Albertos et al. (2017).

In our setting, the lens depth is defined as follows: let \(X_1,X_2\) be two random elements defined on a (rich enough) probability space \((\Omega , {\mathcal {A}},{\mathbb {P}})\), taking values in a complete separable metric space \((M,d)\) endowed with the Borel \(\sigma \)-algebra. Assume that they are independent and identically distributed. In what follows, the distribution of a random element X of M will be denoted by \(P_X\).

Given \(x_1, x_2 \in M\) define their “associate lens” by

$$\begin{aligned} A(x_1, x_2):= B(x_1, d(x_1,x_2)) \cap B(x_2, d(x_1,x_2)), \end{aligned}$$

where \(B(p,r)\) is the closed ball centred at p with radius \(r>0\). The lens depth of a point \(x\in M\) is defined by \(\text {LD}(x)={\mathbb {P}}(x\in A(X_1,X_2))\). Given an iid sample \({\mathcal {X}}_n= \{X_1, \ldots , X_n\}\) from a distribution \(P_X\), the empirical version of \(\text {LD}\) is given by the U-statistic of order two,

$$\begin{aligned} \widehat{\text {LD}}_n(x)=\left( {\begin{array}{c}n\\ 2\end{array}}\right) ^{-1} \sum _{1 \le i_1 < i_2 \le n} \mathbbm {1}_{A(X_{i_1}, X_{i_2})}(x). \end{aligned}$$
(1)

To gain some insight into the shape of \(\{\widehat{\text {LD}}_n \ge \lambda \}\), see Fig. 1.

Fig. 1

Level sets of the lens depth based on a sample of 15 iid random vectors, distributed as a bivariate standard normal distribution. The intensity of blue represents the depth
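The empirical lens depth in (1) only requires pairwise distances, so it can be sketched in a few lines. The block below is an illustrative sketch, not the code used to produce Fig. 1: the function names, the fixed seed and the use of the Euclidean distance are assumptions made here, and any other metric d on M could be passed through the `dist` argument.

```python
import math
import random
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance; any metric d on M could be passed instead."""
    return math.dist(p, q)


def empirical_lens_depth(x, sample, dist=euclidean):
    """Fraction of pairs (X_i, X_j), i < j, whose lens A(X_i, X_j) contains x."""
    hits = 0
    pairs = 0
    for a, b in combinations(sample, 2):
        r = dist(a, b)
        # x lies in the lens iff it is within distance r of both endpoints
        if dist(x, a) <= r and dist(x, b) <= r:
            hits += 1
        pairs += 1
    return hits / pairs


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
# 15 iid bivariate standard normal vectors, as in the setting of Fig. 1
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(15)]
depths = [empirical_lens_depth(p, sample) for p in sample]
```

Each evaluation visits all pairs of sample points, which is the source of the order \(n^2\) cost. Note also that every sample point belongs to the \(n-1\) lenses it generates, so its empirical depth is at least \(2/n\).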

Level set estimation requires specifying a topology in which the level set estimators converge. In Molchanov (1998), almost sure consistency is obtained when the level sets are intersected with compact sets and the topology induced by the Hausdorff distance is considered. In Cuevas et al. (2006), the level sets are assumed to be compact.

Requiring compactness is not too restrictive for data in the classical finite-dimensional Euclidean space. However, it is restrictive in infinite-dimensional spaces, as in the functional data setting. To overcome this problem, other weaker topologies have been considered, for instance, the Mosco or Wijsman topologies, see Terán (2016).

Following this idea, we first prove that the empirical versions of the level sets of the lens depth are almost surely (a.s.) consistent estimators of their population counterparts in the Painlevé–Kuratowski topology.

The general approach we use to obtain a.s. consistency results for the plug-in estimator \(\{\widehat{\text {LD}}_n \ge \lambda \}\) to its population counterpart \(\{\text {LD}\ge \lambda \}\) is to prove the uniform convergence of \(\widehat{\text {LD}}_n\) to \(\text {LD}\) on the whole space for separable and complete metric spaces. This has interest in itself and extends the results in Liu and Modarres (2011) for the finite-dimensional Euclidean space.

Next, we prove that the empirical version of the level sets of the lens depth converges a.s. in Hausdorff distance, when they are intersected with any compact set.

This paper is organized as follows. Section 2 introduces the notation, some previous definitions, and the notion of Painlevé–Kuratowski convergence. In Sect. 3, we state the a.s. convergence of level sets in that topology. The a.s. uniform consistency of \( \widehat{\text {LD}}_n\) is stated in Sect. 4. Section 5 addresses the problem of a.s. convergence of level sets in the Hausdorff distance, as well as the a.s. convergence in measure, together with the a.s. consistency of their boundaries. All proofs are given in the Appendix.

Lastly, in Sect. 6 we tackle the study of two interesting real data sets, the vectorcardiogram data set (see Sect. 6.1), and the influenza data set (see Sect. 6.2).

2 Preliminaries

In this section, we will introduce the notation and necessary definitions used throughout this paper. Given a metric space \((M,d)\), which will be assumed to be separable and complete, we denote by \(B(x,r)\) the closed ball centred at x with radius \(r>0\). The boundary of a set \(A\subset M\) is denoted by \(\partial A\). Given \(x\in A^c\), we denote \(d(x,A)=\inf _{a\in A} d(x,a)\) and \(\text {diam}(A)=\sup _{x,y\in A}d(x,y)\). Given two non-empty closed sets \(A,B\subset M\), the Hausdorff distance between them is defined as

$$\begin{aligned} d_H(A,B) \mathrel {\mathop :}=\max \Big \{\sup _{a\in A}d(a,B), \ \sup _{b\in B}d(b,A)\Big \}, \end{aligned}$$
(2)

where \(d_H(\emptyset ,B)= + \infty \) with \(B \ne \emptyset \) and \(d_H(\emptyset ,\emptyset )= 0\).
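For finite point sets, such as the sample points falling in an estimated level set, the two one-sided terms in (2) reduce to finite maxima, and the definition can be evaluated directly. The following sketch is only illustrative: the function names are hypothetical, and the default Euclidean distance can be replaced by any metric d on M.

```python
import math


def point_to_set(x, A, dist=math.dist):
    """d(x, A) = inf over a in A of d(x, a), for a finite non-empty A."""
    return min(dist(x, a) for a in A)


def hausdorff(A, B, dist=math.dist):
    """Hausdorff distance between finite sets, with the empty-set conventions."""
    if not A and not B:
        return 0.0
    if not A or not B:
        return math.inf
    return max(
        max(point_to_set(a, B, dist) for a in A),
        max(point_to_set(b, A, dist) for b in B),
    )
```

For example, `hausdorff([(0, 0)], [(0, 0), (3, 4)])` evaluates to 5.0: the first set is contained in the second, but the point (3, 4) is at distance 5 from the first set.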

Given a finite Borel measure \(\nu \) and two measurable sets A and B, the distance in measure between them is defined as \(d_\nu (A,B)=\nu (A\setminus B) + \nu (B\setminus A)\).

Given a function \(f: M \rightarrow {\mathbb {R}}\) and \(\lambda \in {\mathbb {R}}\), we denote by \(\{f\ge \lambda \}\) the \(\lambda \)-level set \(\{x\in M: f(x)\ge \lambda \}\).

2.1 Painlevé–Kuratowski convergence

We will first prove the convergence of the estimated level sets to their population counterparts in the Painlevé–Kuratowski sense. Given a sequence of sets \(A_n\subset M\), the Painlevé–Kuratowski limit inferior of \(A_n\) as \(n\rightarrow \infty \) is

$$\begin{aligned} \underset{n\rightarrow \infty }{\text {Li}} A_n=\Big \{x\in M: \underset{n\rightarrow \infty }{\text {limsup}} \ \ d(x,A_n)=0\Big \}. \end{aligned}$$

The Painlevé–Kuratowski limit superior of \(A_n\) as \(n\rightarrow \infty \) is

$$\begin{aligned} \underset{n\rightarrow \infty }{\text {Ls}} A_n=\Big \{x\in M: \underset{n\rightarrow \infty }{\text {liminf}} \ \ d(x,A_n)=0\Big \}. \end{aligned}$$

In general \(\underset{n\rightarrow \infty }{\text {Li}} A_n\subset \underset{n\rightarrow \infty }{\text {Ls}} A_n\). When they agree, the common set is called the Painlevé–Kuratowski limit. It is immediate to prove that if \(d_H(A_n,A)\rightarrow 0\), then \(A_n\rightarrow A\) in the Painlevé–Kuratowski sense. The converse implication is not true in general; it does hold when M is totally bounded (recall that we are assuming that M is complete), see Corollary 5.1.11 and page 147 in Beer (1993).
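A simple example on the real line, added here only as an illustration, shows how Painlevé–Kuratowski convergence can hold while the Hausdorff distance diverges. Take \(M={\mathbb {R}}\) with the usual distance and \(A_n=\{0,n\}\). For every fixed \(x\in {\mathbb {R}}\),

$$\begin{aligned} d(x,A_n)=\min \{|x|,|x-n|\}\rightarrow |x| \quad \text { as } n\rightarrow \infty , \end{aligned}$$

so \(\underset{n\rightarrow \infty }{\text {Li}} A_n=\underset{n\rightarrow \infty }{\text {Ls}} A_n=\{0\}\), whereas

$$\begin{aligned} d_H(A_n,\{0\})=\max \{n,0\}=n\rightarrow \infty . \end{aligned}$$

Here \({\mathbb {R}}\) is complete but not totally bounded, which is consistent with the restriction on the converse implication.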

3 Painlevé–Kuratowski convergence of level set estimators

We will prove that the limit in the Painlevé–Kuratowski sense of the sequence of sets \(\{\widehat{\text {LD}}_n\ge \lambda \}\) is the set \(\{\text {LD}\ge \lambda \}\). This will follow from the uniform convergence of the empirical version of the lens depth given in the next section and the fact that \(\text {LD}(x)\) is a continuous function of x, which follows from Proposition 3 in (Cholaquidis et al. 2020).

Theorem 1

Let \((M,d)\) be a complete separable metric space and \(P_X\) a Borel measure on M. Assume that \(P_X(\partial B(x,\delta ))=0\) for all \(x\in M\) and \(\delta >0\). Assume also that \(\lambda \) is such that if \(\text {LD}(x)=\lambda \), there exist \(u_n\rightarrow x\) such that \(\text {LD}(u_n)>\lambda \). Then, with probability one, for n large enough,

$$\begin{aligned} \{\text {LD}\ge \lambda \} = \underset{n\rightarrow \infty }{\text {Li}} \{\widehat{\text {LD}}_n\ge \lambda \} = \underset{n\rightarrow \infty }{\text {Ls}} \{\widehat{\text {LD}}_n\ge \lambda \}. \end{aligned}$$

Remark 1

Observe that if \(P_X(\partial B(x,\delta ))\ne 0\), lens depth may be pathological in infinite-dimensional spaces. Indeed, suppose that the distribution \(P_X\) is a discrete distribution supported on an orthogonal basis of \(L^2\) on the unit sphere. Then, all points in the support of the distribution have lens depth one. Regarding the hypothesis on the \(\lambda \)-level set, the following proposition states that the set of \(\lambda \) for which this condition does not hold is at most countable.

Proposition 1

Let \((M,d)\) be a separable metric space and \(f:M\rightarrow {\mathbb {R}}\). Let us denote

$$\begin{aligned} {\mathcal {A}}=\{\lambda : f^{-1}(\lambda ) \text { contains at least a local maximum}\}. \end{aligned}$$

Then, \({\mathcal {A}}\) is countable.

4 Uniform consistency of \(\widehat{\text {LD}}_n\)

A key argument to prove the consistency of level sets in the Painlevé–Kuratowski topology (as well as in the Hausdorff distance) is to prove that \(\widehat{\text {LD}}_n\) converges uniformly to \(\text {LD}\) a.s., which is the main goal of this section.

For that purpose, we will use the following version of Theorem 1 in (Billingsley and Topsøe 1967).

Theorem 2

(Billingsley and Topsøe (1967)) Let \({\mathcal {B}}(M\times M)\) be the class of all real valued, bounded, measurable functions defined on the metric space \((M\times M,\rho )\), where \(\rho (z,y)=\max \{d(z_1,y_1),d(z_2,y_2)\}\). Suppose \({\mathcal {F}}\subset {\mathcal {B}}(M\times M)\) is a subclass of functions. Then,

$$\begin{aligned} \sup _{f\in {\mathcal {F}}} \Big | \int fdP_n-\int fdP\Big |\rightarrow 0, \end{aligned}$$

for every sequence \(P_n\) that converges weakly to P if, and only if,

$$\begin{aligned} \sup \{|f(z)-f(t)|:f\in {\mathcal {F}},z=(z_1,z_2),t=(t_1,t_2)\in M\times M\}< \infty , \end{aligned}$$
(3)

and for all \(\epsilon >0\),

$$\begin{aligned} \lim _{\delta \rightarrow 0} \sup _{f\in {\mathcal {F}}} P\Big [\{y=(y_1,y_2): \omega _f\{B_\rho (y,\delta )\}\ge \epsilon \}\Big ]=0, \end{aligned}$$
(4)

where \(\omega _f(A)=\sup \{|f(z)-f(t)|:z,t\in A\}\) and \(B_\rho (y,\delta )\) is the open ball in the metric space \((M\times M,\rho )\) of radius \(\delta >0\).

Theorem 3

Let \((M,d)\) be a complete separable metric space and \(P_X\) a Borel measure on M. Assume that \(P_X(\partial B(x,\delta ))=0\) for all \(x\in M\) and \(\delta >0\). Then,

$$\begin{aligned} \sup _{x} |\widehat{\text {LD}}_n(x)-\text {LD}(x)|\rightarrow 0\quad a.s., \text { as } n\rightarrow \infty . \end{aligned}$$

For \(M={\mathbb {R}}^d\) endowed with the Euclidean norm, the previous result is proven in (Liu and Modarres 2011).
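The uniform convergence can be checked numerically in a case where \(\text {LD}\) is available in closed form. On the real line the lens \(A(x_1,x_2)\) is the interval between \(x_1\) and \(x_2\), so for a continuous distribution function F one gets \(\text {LD}(x)=2F(x)(1-F(x))\), that is, \(\text {LD}(x)=2x(1-x)\) on [0, 1] for the uniform distribution. The sketch below is an illustration under these assumptions (uniform data, a finite grid in place of the supremum over all x, hypothetical function names and seed); it is not part of the proofs.

```python
import random
from itertools import combinations


def lens_depth_1d(x, sample):
    """Empirical lens depth on the real line: the lens of (a, b) is [min, max]."""
    pairs = list(combinations(sample, 2))
    hits = sum(1 for a, b in pairs if min(a, b) <= x <= max(a, b))
    return hits / len(pairs)


def sup_error_on_grid(n, grid, seed=0):
    """Largest |LD_n(x) - 2x(1-x)| over the grid, for n uniform(0,1) draws."""
    rng = random.Random(seed)
    sample = [rng.random() for _ in range(n)]
    return max(abs(lens_depth_1d(x, sample) - 2 * x * (1 - x)) for x in grid)


grid = [k / 20 for k in range(21)]
# grid-based approximation of the sup-norm error for increasing sample sizes
errors = {n: sup_error_on_grid(n, grid) for n in (50, 100, 200, 400)}
```

The dictionary `errors` holds one approximate sup-norm error per sample size; according to Theorem 3 these values should become small as n grows, although for a single simulated sample the decrease need not be monotone.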

5 Level set estimation in Hausdorff metric

To obtain the consistency in Hausdorff distance of the level set estimator, Theorem 2.1 in (Molchanov 1998) will play a key role, together with the uniform convergence of the previous section. We will make use of the following slightly restricted version.

Theorem 4

(Molchanov (1998)) Let \(f_n,f:M\rightarrow {\mathbb {R}}\) be continuous functions. Assume that for each compact set \(K_0\), \(\sup _{x\in K_0}|f_n(x)-f(x)|\rightarrow 0\). Assume that for all \(\lambda \in [c_1,c_2]\), \(\{f\ge \lambda \}\subset \overline{\{f>\lambda \}}\). Then,

$$\begin{aligned} \sup _{c_1\le \lambda \le c_2} d_H\big (\{f_n\ge \lambda \}\cap K_0,\{f\ge \lambda \}\cap K_0\big )\rightarrow 0. \end{aligned}$$

Given positive numbers \(c_1<c_2\), assume that for all \(\lambda \in [c_1,c_2]\), we have \(\{f\ge \lambda \}\subset \overline{\{f>\lambda \}}\). From Theorem 2.1 in (Molchanov 1998) together with Theorem 3, and the fact that \(\text {LD}\) is a continuous function, we get

$$\begin{aligned} \sup _{c_1\le \lambda \le c_2} d_H\big (\{\widehat{\text {LD}}_n\ge \lambda \}\cap K,\{\text {LD}\ge \lambda \}\cap K\big )\rightarrow 0\quad a.s., \text { as } \quad n\rightarrow \infty \end{aligned}$$

for any compact set K. If \(\nu (\{\text {LD}=\lambda \})=0\), it follows easily from Theorem 3 that for all compact K,

$$\begin{aligned} d_\nu \big (\{\widehat{\text {LD}}_n\ge \lambda \}\cap K,\{\text {LD}\ge \lambda \}\cap K\big )\rightarrow 0\quad a.s., \text { as } \quad n\rightarrow \infty , \end{aligned}$$

where \(\nu \) is any finite Borel measure.
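To see what the condition \(\{f\ge \lambda \}\subset \overline{\{f>\lambda \}}\) rules out, consider the following illustrative case, which is not one of the data sets analysed later: \(M={\mathbb {R}}\) and \(P_X\) uniform on [0, 1]. The lens \(A(x_1,x_2)\) is then the closed interval between \(x_1\) and \(x_2\), so \(\text {LD}(x)=2x(1-x)\) for \(x\in [0,1]\) and \(\text {LD}(x)=0\) otherwise. Solving \(2x(1-x)\ge \lambda \) gives, for \(0<\lambda \le 1/2\),

$$\begin{aligned} \{\text {LD}\ge \lambda \}=\Big [\frac{1-\sqrt{1-2\lambda }}{2}, \ \frac{1+\sqrt{1-2\lambda }}{2}\Big ]. \end{aligned}$$

For \(\lambda <1/2\), the set \(\{\text {LD}>\lambda \}\) is the corresponding open interval, whose closure is \(\{\text {LD}\ge \lambda \}\), so the condition holds on any \([c_1,c_2]\subset (0,1/2)\). At \(\lambda =1/2\), however, \(\{\text {LD}\ge 1/2\}=\{1/2\}\) while \(\{\text {LD}>1/2\}=\emptyset \), and the condition fails: this is the level attained at the maximum of \(\text {LD}\).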

The convergence of the boundaries of the level sets is in general more involved. To that aim, we will use the following result.

Theorem 5

(Cuevas et al. (2006)) Given a continuous function \(f:M\rightarrow {\mathbb {R}}\), let \(f_n=f_n(\omega ,\cdot )\), with \(\omega \in \Omega \), be a sequence of functions \(f_n:M \rightarrow {\mathbb {R}}\), \(n=1,2,\dots \). Assume that for each n, \(f_n\) is continuous with probability one. Assume also that the following assumptions are fulfilled.

\((h_1)\): M is locally connected.

\((h_2)\): For all \(x\in \partial \{f\ge \lambda \}\), there exist sequences \(u_n,l_n\rightarrow x\) such that \(f(u_n)>\lambda \) and \(f(l_n)< \lambda \).

\((f_1)\): \(\partial \{ f \ge \lambda \}\ne \emptyset \). Moreover, there exists \(\lambda ^{-}< \lambda \) such that the set \(\{ f \ge \lambda ^{-} \}\) is compact.

If \(\sup _{ x \in M} \big \vert f(x) - f_n(x) \big \vert \rightarrow 0\) a.s., then

$$\begin{aligned} d_H \big (\partial \{f\ge \lambda \}, \partial \{f_n\ge \lambda \} \big ) \rightarrow 0, \quad a.s., \text { as } n \rightarrow \infty . \end{aligned}$$

The following theorem states that \(\partial \{\widehat{\text {LD}}_n\ge \lambda \} \cap K\) is a consistent estimator of \(\partial \{\text {LD}\ge \lambda \}\cap K\). The proof follows the same lines used to prove Theorem 5. However, we cannot apply that theorem directly because \(\widehat{\text {LD}}_n(x)\) is not a continuous function, since the range of \(\widehat{\text {LD}}_n(x)\) is contained in the set \(\{k\left( {\begin{array}{c}n\\ 2\end{array}}\right) ^{-1}:k=0,\dots ,\left( {\begin{array}{c}n\\ 2\end{array}}\right) \}\).

Theorem 6

Let \(\lambda >0\) be such that \(\{ \text {LD}\ge \lambda \}\ne \emptyset \). Under the assumptions of Theorem 3, together with hypotheses \((h_1)\) and \((h_2)\) of Theorem 5, we have that

$$\begin{aligned} \lim _{n\rightarrow \infty } d_H\big (\partial \{\widehat{\text {LD}}_n\ge \lambda \} \cap K,\partial \{\text {LD} \ge \lambda \}\cap K\big )=0\quad a.s., \end{aligned}$$

for all compact sets K.

6 An application to the study of two sets of real data

In what follows, we analyse two interesting real data sets: the vectorcardiogram (VCG) and the influenza data sets.

6.1 The vectorcardiogram data set

Until the middle of the twentieth century, electrocardiograms were the non-invasive methodology used to detect heart diseases. In Frank (1956), an alternative and complementary methodology is introduced, called vectorcardiography (VCG). In many cases, electrocardiogram (ECG) records are not sufficient and complementary studies are necessary for the detection of cardiac failure, such as myocardial infarction, see Hafshejani et al. (2021). In this respect, the VCG can be of great help, but its interpretation is, in general, more difficult than that of an ECG. As mentioned in Pastore et al. (2019), “from the year 2000 until the present decade an increasing number of publications related to electrovectorcardiography has been observed. In this context, and mainly regarding the definition of arrhythmogenic substrates, it was observed that the association of the ECG and the vectorcardiogram (VCG) methods could provide much more information about the cardiac electrical phenomena, thus increasing its employment and allowing us to differentiate potentially fatal cases from benign ones”. Therefore, an automatic procedure that detects possible anomalous VCGs may be useful for the medical community.

Vectorcardiography is a method that produces a three-dimensional curve which comprises the records of the magnitude and direction of the electrical forces generated by the heart over time. These curves are called QRS loops, see Fig. 2. From the location on the body of the electrodes (I, E, C, A, M, H, F) (see Fig. 2), it is possible to obtain electric vectorial signals of the heart in an orthogonal system (x, y, z), called Frank’s derivatives:

$$\begin{aligned} \left\{ \begin{array}{l} x = 0.610 A + 0.171 C - 0.781 I\\ y = 0.655 F + 0.345 M - 1.000 H\\ z = 0.133 A + 0.736 M - 0.264 I- 0.374 E - 0.231 C. \end{array}\right. \end{aligned}$$

Fig. 2

(a) Location of the electrodes to obtain Frank’s derivatives, in an orthogonal coordinate system (x, y, z). (b) QRS curve loop. The plane in blue is the best two-dimensional approximation of the curve by least squares

Our aim is to detect outliers on a real data set of vectorcardiograms of children, corresponding to children with (possibly) cardiological problems, by means of the shape and size of the level sets of lens depth. We consider a real-life data set that consists of 98 vectorcardiograms from children with ages varying between 2 and 19, where the data belong to the Stiefel manifold SO(3, 2) of all orthonormal 2-frames in \({\mathbb {R}}^3\) considered as \(3\times 2\) orthogonal matrices (see (Hatcher 2002)).

In (Downs and Liebman 1969), each curve is associated with an element of SO(3, 2) that represents some of the information of the curve; see also (Downs 1972), where the phenomenon is described as “... given the matrix \(X=(X_1,X_2) \in \textrm{SO}(3,2)\), the unitary vector \(X_1 \in {\mathbb {R}}^3\) is the direction from the vertex to the apex of the so-called QRS loop, which is the most important loop among the three virtually planar loops contained in the vectorcardiogram (lies in the plane of the loop). And the vector unitary \(X_2 \in {\mathbb {R}}^3\) is orthogonal to \(X_1\) and to the plane of the loop, where the sense of \(X_2\) is determined by the direction of motion of the moving point generating the loop”.

This sample has been previously analysed in the literature, see, for instance, (Chikuse 2012; Chakraborty and Vemuri 2019; Pal et al. 2019).

Figure 3 represents each matrix in SO(3, 2) by two points in \(S^2\), one for each column. They are joined by an arc in \(S^2\). The arc joining the deepest pair of observations (w.r.t. lens depth) is represented in violet, while the outliers (for a level \(\lambda =0.10\)) are represented as red arcs.

Fig. 3

Data visualization of the data from SO(3, 2). The violet arc represents the deepest observation according to lens depth. The outliers represented by red arcs correspond to the data outside the \(\lambda =0.10\) level set of the depth
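The selection of the deepest observation and of the outliers at a given level can be sketched as follows. This is an illustration only: the text does not state which distance on SO(3, 2) was used, so the Frobenius distance between the \(3\times 2\) matrices is an assumption made here, and all function names are hypothetical.

```python
import math
from itertools import combinations


def frobenius(X, Y):
    """Frobenius distance between two matrices given as nested lists (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry)))


def lens_depths(sample, dist):
    """Empirical lens depth of every sample element, from pairwise distances."""
    n = len(sample)
    D = [[dist(a, b) for b in sample] for a in sample]
    n_pairs = n * (n - 1) // 2
    depths = []
    for k in range(n):
        # element k is in the lens of (i, j) iff it is within D[i][j] of both
        hits = sum(
            1 for i, j in combinations(range(n), 2)
            if D[k][i] <= D[i][j] and D[k][j] <= D[i][j]
        )
        depths.append(hits / n_pairs)
    return depths


def deepest_and_outliers(sample, dist, level=0.10):
    """Index of the deepest element and indices with depth below `level`."""
    depths = lens_depths(sample, dist)
    deepest = max(range(len(sample)), key=depths.__getitem__)
    outliers = [k for k, d in enumerate(depths) if d < level]
    return deepest, outliers
```

With the 98 matrices stored as nested lists, a call such as `deepest_and_outliers(sample, frobenius, level=0.10)` would return the index of the deepest observation and the indices of the observations outside the 0.10 level set under the assumed distance.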

6.2 Influenza data set

A problem of great importance, and of current interest in the scientific community, is the early detection of new variants of a virus in a given community. One method to address this problem is to analyse the temporal dynamics of the variability of the virus genetic chain. Roughly speaking, an increase in gene-chain diversity over a period of time may indicate diversification of the virus into new variants.

This data set consists of longitudinal data of the influenza virus, belonging to the family Orthomyxoviridae. An important problem analysed in (Cholaquidis et al. 2020) is to model the genomic evolution of the virus, see (Smith et al. 2004). In this paper, we focus on another important issue: modelling the temporal variability of the virus by means of the lens depth, and using this to predict and anticipate a possible pandemic. The influenza virus has an RNA genome, which is very common among viruses: RNA viruses produce diseases like yellow fever and hepatitis. Influenza annually causes half a million deaths worldwide. It is well known that these viruses change their genetic pattern over time, which is a vital consideration when developing a possible vaccine. We will study the H3N2 variant of the virus, in particular, the subtype haemagglutinin (HA), which produced the SARS pandemic in 2002. This variant is known to have a variability in its genetic arrangement over time, see (Altman 2006; Monod et al. 2018).

The data set can be found in the GISAID \(\text {EpiFlu}^{TM}\) database, providing 1089 genomic sequences of H3N2 from 1993 to 2017 in New York, aligned using MUSCLE, see (Edgar 2004). We used reduced trees of 5 leaves, as in (Monod et al. 2018), to capture the structure of the data. This set of trees can be endowed with a distance, see (Billera et al. 2001). The database used was obtained from the GitHub repository https://github.com/antheamonod/FluPCA.

For this kind of data, constructing measures of centrality and variability, as well as confidence regions, is a problem that has been previously addressed in the literature, see, for instance, (Barden et al. 2018; Brown and Owen 2020; Willis 2019) and references therein.

For each year we computed the empirical lens depth of the trees, considered on the manifold of phylogenetic trees. We estimated the diameter of the level sets from the sample points that belong to the level sets, see Fig. 4. As can be seen, there is a larger dispersion in the years prior to the pandemic of 2002 compared to later years. Early detection of this increased variability may have an impact on health policy decision-making, with the aim of preventing the proliferation of the virus variants.

Fig. 4

Diameters of the lens depth level sets from the sample of trees for the different years, with respect to \(\lambda \)
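The diameter estimate described above, namely the largest distance between sample points whose empirical depth is at least \(\lambda \), can be sketched as follows. The block assumes that the matrix of pairwise tree distances and the empirical depths for a given year have already been computed; the function names are hypothetical and the computation of the tree distance itself is not shown.

```python
def level_set_diameter(D, depths, level):
    """Max pairwise distance among sample points whose depth is at least `level`.

    D is the matrix of pairwise distances and depths[k] the empirical lens
    depth of the k-th sample element. Returns 0.0 if fewer than two points qualify.
    """
    inside = [k for k, d in enumerate(depths) if d >= level]
    return max((D[i][j] for i in inside for j in inside), default=0.0)


def diameter_curve(D, depths, levels):
    """Diameter estimate for each level, e.g. to plot one curve per year."""
    return [level_set_diameter(D, depths, lam) for lam in levels]
```

Since the estimated level sets are nested, the resulting curve is non-increasing in \(\lambda \); comparing such curves across years is one way to read the dispersion shown in Fig. 4.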