Statistical depth in abstract metric spaces

The concept of depth has proved very important for multivariate and functional data analysis, as it essentially acts as a surrogate for the notion of ranking of observations which is absent in more than one dimension. Motivated by the rapid development of technology, in particular the advent of ‘Big Data’, we extend here that concept to general metric spaces, propose a natural depth measure and explore its properties as a statistical depth function. Working in a general metric space allows the depth to be tailored to the data at hand and to the ultimate goal of the analysis, a very desirable property given the polymorphic nature of modern data sets. This flexibility is thoroughly illustrated by several real data analyses.


Introduction
Huge parts of statistical theory, especially its nonparametric side, heavily rely on the notion of ranks; see for instance Gibbons and Chakraborti (2010). However, ranks are not well defined in a multivariate framework as there exists no natural ordering in more than one dimension. This fact motivated Tukey (1975) to introduce the notion of statistical depth as a surrogate for 'multivariate ranks'. Concretely, a depth is a measure of how central (or how outlying) a given point is with respect to a multivariate probability distribution. Zuo and Serfling (2000), following some earlier considerations in Liu (1990), formulated the properties that a valid depth measure should satisfy. Since then, depth-based procedures have proved to be very important tools for robust multivariate statistical analyses; see, e.g., Liu et al (1999) and Li and Liu (2004, 2008). Serfling (2006) and Mosler (2013) offer excellent short reviews of the ideas surrounding the concept of depth, while Hallin et al (2021) recently shed new light on the problem of 'multivariate ranks'.
The early 21st century has also seen such technological progress in recording devices and memory capacity that any spatio-temporal phenomenon can now be recorded essentially in continuous time or space, giving rise to 'functional' random objects. As a result, a solid theory for Functional Data Analysis (FDA) has been developed as well, allowing the extension of most of the classical problems of statistical inference from the multivariate context to the inherently infinite-dimensional functional case. In particular, functional versions of statistical depth have been investigated (Fraiman and Muniz, 2001, Cuevas et al, 2007, López-Pintado and Romo, 2009, Dutta et al, 2011, López-Pintado and Romo, 2011, Sguera et al, 2013, Chakraborty and Chaudhuri, 2014, Hlubinka et al, 2015, Nieto-Reyes and Battey, 2021, Nieto-Reyes et al, 2021). It is worth noting that an infinite-dimensional environment implies specific theoretical and practical challenges, making the extension from 'multivariate' to 'functional' a non-trivial one (Nieto-Reyes and Battey, 2016).
In this paper, we carry on with this gradual extension process by defining a statistical depth for complex random objects living in abstract metric spaces. Again, this extension is motivated by the rapid development of technology. Indeed, this is the 'Big Data' era, in which digital data are recorded everywhere, all the time. The information that this huge amount of data contains may enable next-generation scientific breakthroughs, drive business forward or hold governments accountable. However, this is conditional on the existence of a statistical toolbox suitable for such Big Data, whose profusion and nature induce commensurate challenges. Indeed, those data consist of objects as varied as high-dimensional or infinite-dimensional vectors, matrices or functions representing images, shapes, movies, texts, handwriting or speech (to cite a few), and live streaming series thereof; this is often summarised as '3V' (Volume, Variety and Velocity).
Mainstream statistical techniques often fall short for analysing such complex mathematical objects. Yet, it remains true that any statistical analysis requires a sense of how close two instances of the object of interest are to one another. It is then only natural to assume that they live in a space where distances can be defined, that is, in a certain metric space (Snášel et al, 2017). This motivates the need for a statistical depth defined on an abstract metric space; hence our proposal of a 'metric depth'.
The idea that the concept of multivariate statistical depth could be extended to general non-Euclidean settings can be traced back to Carrizosa (1996, Section 3.1). Later, Li et al (2011) considered a depth-based procedure for analysing abundance data, which are typically high-dimensional discrete data with many observed 0's. Because of that particular structure, the classical Euclidean distance is not optimal for quantifying (dis)similarities between observations, and analysts in the field usually prefer more specific metrics such as the Bray-Curtis distance (Bray and Curtis, 1957). In consequence, inspired by earlier works by Maa et al (1996) and Bartoszynski et al (1997), Li et al (2011) devised a depth measure which allows the proximity between observations to be quantified by a specific, user-chosen distance/dissimilarity measure. This flexibility appears even more desirable when dealing with the polymorphous objects commonly found in modern data sets, as described above. For instance, functional objects are much richer than just infinite-dimensional vectors, and they can be compared on many different grounds: general appearance, short- or long-range variation, oscillating behaviour, etc.; which makes the choice of the 'proximity measure' between two such objects a crucial one (Ferraty and Vieu, 2006, Chapter 3). On a more theoretical basis, an appropriate choice of such a 'proximity measure' sometimes allows one to get around issues caused by the 'Curse of Dimensionality' (Geenens, 2011a).
Quantifying (dis)similarities between non-numeric objects is even more subject to discretionary choice. As an example, for comparing pieces of text, the literature in text mining, linguistics and natural language processing has proposed numerous metrics such as the Levenshtein distance, the Hamming distance, the Jaccard index or the Dice coefficient, each targeting different dimensions of words, sentences or texts, such as similarity in spelling or similarity in meaning (Wang and Dong, 2020). It is, therefore, paramount to have access to statistical procedures which allow a free choice of metric, and may be tailored to the kind of data at hand and to the ultimate purpose of the analysis. Indeed, our proposed 'metric depth' (µD), defined in Section 2, enables such flexible analyses. Its main properties are explored in Section 3 and an empirical version (computable from a sample) is described in Section 4. Section 5 illustrates its capabilities on several real data sets, including an application in 'text mining' (Section 5.5). Section 6 concludes.

Statistical depth in metric spaces: definition
Assume that the random object of interest, say X, lives in a certain space M which can be equipped with a distance d. To avoid dispensable technical complications, it will be assumed throughout that (M, d) is a complete and separable metric space. Let A be the σ-algebra on M generated by the open d-metric balls and P be the space of all probability measures defined on the Borel sets of A. This makes (M, A, P) a proper probability space for any P ∈ P. In particular, it will be assumed that the distribution of X belongs to P. Note that the cartesian product space (M × M, A × A, P × P) is then also a valid probability space (Parthasarathy, 1967, Theorem I.1.10). For any measurable statement S : M × M → {0, 1} (the statement returns the value 1 if it is true, and 0 otherwise), we denote by P(S(X_1, X_2)) the probability that S is true when X_1, X_2 are two independent replications of X, whose distribution is P.
Then we give the following definition.

Definition 2.1. The 'metric depth' ('µD') of the point χ in the metric space (M, d) with respect to the probability measure P ∈ P is defined as

µD(χ, P) = P( max{d(χ, X_1), d(χ, X_2)} < d(X_1, X_2) ).    (2.1)

The set {(χ_1, χ_2) ∈ M × M : max{d(χ, χ_1), d(χ, χ_2)} < d(χ_1, χ_2)} belongs to the σ-algebra A × A, with A defined above, making the probability statement P in (2.1) a well-defined one for any P ∈ P.
The interpretation of (2.1) in terms of depth is clear: a point χ ∈ M is deep with respect to the distribution P if it is likely to find it 'between' two objects X_1 and X_2 in M randomly generated from P. 'Between' here means that the side joining X_1 and X_2 is the longest in a 'triangle' of M with vertices X_1, X_2 and χ; or, in other words, that χ belongs to B_d(X_1, d(X_1, X_2)) ∩ B_d(X_2, d(X_1, X_2)), where B_d(X_1, r) is the open ball with centre X_1 and radius r. In this sense, (2.1) is an extension of the vectorial 'lens depth' (Liu and Modarres, 2011). If we define

L_d(X_1, X_2) = B_d(X_1, d(X_1, X_2)) ∩ B_d(X_2, d(X_1, X_2)),    (2.2)

the 'lens' defined by X_1 and X_2 in (M, d), then µD(χ, P) = P(L_d(X_1, X_2) ∋ χ). This is the probability that a random set contains a certain element χ, and interesting parallels can be drawn with the theory of random sets, in particular Choquet capacities and related ideas (Molchanov, 2005, Chapter 1). Note that, independently of this work, Cholaquidis et al (2020) recently explored the extension of the 'lens depth' to general metric spaces as well. Their focus and the content of their paper are, however, much different to what is investigated here.
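To make the definition concrete, the following Monte Carlo sketch approximates (2.1) in a deliberately simple setting: P is the uniform distribution on [0, 1] and d the absolute-value distance, in which case the lens L_d(X_1, X_2) reduces to the open interval between X_1 and X_2. The helper names are ours, purely for illustration.

```python
import random

def metric_depth_mc(chi, draw, d, n_pairs=20000, seed=0):
    """Monte Carlo approximation of muD(chi, P): the proportion of
    independent pairs (X1, X2) ~ P x P for which chi falls in the open
    lens, i.e. max(d(chi, X1), d(chi, X2)) < d(X1, X2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        x1, x2 = draw(rng), draw(rng)
        if max(d(chi, x1), d(chi, x2)) < d(x1, x2):
            hits += 1
    return hits / n_pairs

d_abs = lambda a, b: abs(a - b)
uniform = lambda rng: rng.random()              # P = U[0, 1]

deep = metric_depth_mc(0.5, uniform, d_abs)     # central point
shallow = metric_depth_mc(0.0, uniform, d_abs)  # boundary point
```

On the real line the lens is the open interval between the two draws, so µD(χ, P) = 2F(χ)(1 − F(χ)) for a continuous distribution F; the two estimates above should thus be close to 0.5 and 0 respectively.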

Main properties
The fact that the distance d is left free really makes the metric depth µD a very flexible tool, as any meaningful d equipping M can be used in (2.1) without altering the theoretical properties which we explore below.
In addition, we note that nowhere in the developments below do we explicitly use the fact that d(χ, ξ) = 0 ⟺ χ = ξ for any two χ, ξ ∈ M (identity of indiscernibles). A proximity measure which satisfies all the other properties of a distance (non-negativity, symmetry and triangle inequality) but not the identity of indiscernibles is called a pseudo-distance. Hence, the metric depth (2.1) can be used in conjunction with a pseudo-distance, while keeping its essential features. We can, for instance, assess the proximity between two objects by comparing the coefficients of their leading terms when expanded in certain bases, such as a spline basis in the case of functional data when smoothing the original data is necessary (Ramsay and Silverman, 2005, Chapter 3). Other examples are given in Section 5.

Elasticity invariance
(P_1) Let ϕ : M → M be an 'elastic' map, in the sense that, for any χ, ξ, χ′, ξ′ ∈ M, d(χ, ξ) < d(χ′, ξ′) if and only if d(ϕ(χ), ϕ(ξ)) < d(ϕ(χ′), ϕ(ξ′)). Then µD(ϕ(χ), P_ϕ) = µD(χ, P), where P_ϕ is the push-forward distribution of the image through ϕ of a random object of M having distribution P.

This follows from the fact that max{d(ϕ(χ), ϕ(X_1)), d(ϕ(χ), ϕ(X_2))} < d(ϕ(X_1), ϕ(X_2)) if and only if max{d(χ, X_1), d(χ, X_2)} < d(X_1, X_2) for such a map ϕ. These maps obviously include any isometry, for which d(χ, ξ) = d(ϕ(χ), ϕ(ξ)), or other dilation-type transformations for which d(χ, ξ) = a_ϕ d(ϕ(χ), ϕ(ξ)) for some positive scalar constant a_ϕ, but not only those. Clearly, (P_1) establishes µD as a purely topological concept. On another note, (P_1) may be thought of as an extension of property P1 in Zuo and Serfling (2000, p. 463): that a depth measure in R^d 'should not depend on the underlying coordinate system or, in particular, on the scales of the underlying measurements'.
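Property (P_1) rests on the fact that µD uses d only through comparisons between distances. A quick numerical check of that mechanism (a toy sketch of ours, on the real line, not the ϕ-map setting itself): replacing d by a strictly increasing transform of it, here the square root, which is again a metric, leaves every empirical depth value unchanged.

```python
import itertools, math

def depth(chi, sample, d):
    """Empirical metric depth of chi: proportion of sample pairs whose
    lens (for the metric d) contains chi."""
    pairs = list(itertools.combinations(sample, 2))
    return sum(max(d(chi, a), d(chi, b)) < d(a, b) for a, b in pairs) / len(pairs)

d1 = lambda a, b: abs(a - b)
d2 = lambda a, b: math.sqrt(abs(a - b))   # strictly increasing transform of d1

sample = [0.3, 1.1, 2.0, 2.7, 3.4, 4.8]
depths_d1 = [depth(c, sample, d1) for c in sample]
depths_d2 = [depth(c, sample, d2) for c in sample]
# The two distances order all pairwise comparisons identically,
# so they induce exactly the same depth values.
```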
This kind of continuity condition guarantees that, with probability 1, a given χ ∈ M will not lie exactly on the boundary of a random lens such as (2.2).Then, we can prove the following properties (P 3 ) and (P 4 ).

Further comments
Zuo and Serfling (2000) listed two more desirable properties for a depth measure on R^d: 'Maximality at centre' and 'Monotonicity relative to deepest point' (their properties P2 and P3). Similar features are difficult to investigate here for µD without endowing (M, d) with a stronger structure, such as some sort of convexity, or without requiring d to satisfy a parallelogram inequality, for example. As an illustration, Zuo and Serfling (2000)'s P2 'Maximality at centre' requires the depth to be maximum at a uniquely defined 'centre' with respect to some notion of symmetry. Without assuming a stronger structure on (M, d), even the very definition of symmetry in M is unclear. As our aim here is to stay as flexible as possible with the proposed metric depth, we do not investigate further in that direction. Those properties of µD may (or may not) be established in specific applications where M and d are precisely defined, though.
A last important point is the following. Suppose that the balls of (M, d) are convex sets. Then it can easily be checked that, for any non-degenerate distribution P ∈ P (i.e., not a unit point mass at some χ ∈ M), µD cannot be degenerate in the sense that µD(χ, P) ≡ 0 for all χ ∈ M. Indeed, by convexity, the intersection L_d(X_1, X_2) in (2.2) is non-empty as soon as X_1 ≠ X_2, so there always exists some χ ∈ M which gets a positive depth by (2.1). It is known that some instances of statistical depth admit such a degenerate behaviour. For instance, that is the case of López-Pintado and Romo (2009, 2011)'s band and half-region depths for a wide class of distributions on common functional spaces (Chakraborty and Chaudhuri, 2014, Theorems 3 and 4).

Empirical metric depth
Assume now that we have a random sample of realisations {χ_i; i = 1, ..., n} of the object X, whose distribution is P. Then the depth of some point χ ∈ M with respect to P must actually be estimated. The empirical analogue of (2.1) is naturally µD(χ, P_n), where P_n is the empirical measure of the sample, i.e., the collection of 1/n-weighted point masses at the observed χ_1, ..., χ_n. This yields

µD(χ, P_n) = (2/(n(n−1))) Σ_{1 ≤ i < j ≤ n} 1I{max{d(χ, χ_i), d(χ, χ_j)} < d(χ_i, χ_j)}.    (4.1)

Obviously, P_n → P P-a.s., which guarantees under (3.1) the strong pointwise consistency of the estimator µD(χ, P_n), that is,

µD(χ, P_n) → µD(χ, P) P-a.s., for all χ ∈ M.    (4.2)

This easily follows from Property (P_4).
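In practice, the empirical depth requires nothing beyond the pairwise distances between the observations. A vectorised sketch of the estimator (our own minimal implementation, with a toy one-dimensional sample standing in for objects of M):

```python
import numpy as np

def empirical_depths(D):
    """Empirical metric depth (4.1) of each sample point, computed from
    the n x n matrix of pairwise distances D[i, j] = d(chi_i, chi_j)."""
    n = D.shape[0]
    iu, ju = np.triu_indices(n, k=1)                    # all pairs i < j
    # chi_k lies in the open lens of (chi_i, chi_j) iff
    # max(d(chi_k, chi_i), d(chi_k, chi_j)) < d(chi_i, chi_j);
    # pairs containing k contribute 0 automatically (strict inequality)
    in_lens = np.maximum(D[:, iu], D[:, ju]) < D[iu, ju]
    return in_lens.mean(axis=1)

# Toy check on the real line: the middle point is the deepest.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
D = np.abs(x[:, None] - x[None, :])
depths = empirical_depths(D)
```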
Remark 4.1 (Universal uniform strong consistency of µD on every subset of M that is equicontinuous with respect to d). A desirable and much stronger result than (4.2) is the universal uniform strong consistency of the depth measure µD on a subset Φ of M, that is, sup_{χ ∈ Φ} |µD(χ, P_n) − µD(χ, P)| → 0 P-a.s., for any P ∈ P. Note that, from (2.2), (4.1) can also be written

µD(χ, P_n) = (2/(n(n−1))) Σ_{1 ≤ i < j ≤ n} w(d(χ, L_d(χ_i, χ_j))),    (4.3)

with w(z) = 1I{z=0}, where the distance between χ and a subset A of M is defined as inf_{ξ ∈ A} d(χ, ξ). The choice w(z) = 1I{z=0} reduces (4.3) down to the original expression (4.1), while a continuous w produces a smoothed version of it. Gijbels and Nagy (2015) showed that, for a continuous w as above, the 'adjusted' version of the band depth is universally strongly consistent on every equicontinuous subset Φ ⊂ M. We note that this result was derived in Gijbels and Nagy (2015) for the case of the band depth on the space of continuous functions with supremum norm (C, ∥·∥_∞) only; in particular, its proof involves the Arzelà-Ascoli Theorem, a result specific to (C, ∥·∥_∞). Although an attempt to extend this result to (4.3) could be pursued, Gijbels and Nagy (2015) admitted that their adjustment is primarily motivated by theoretical considerations and plays very little role in practice. Therefore, we will not consider this any further here.
Finally, the obvious U-statistic structure of (4.1) allows us to easily deduce, through an appropriate Central Limit Theorem, the asymptotic normality of µD(χ, P_n), a result that could be used for inference, for instance to build a confidence region for the 'true' median element, i.e., the deepest element with respect to the population distribution P (Serfling and Wijesuriya, 2017).

Data examples
In this section, we illustrate the usefulness of the proposed metric depth µD on five real data sets: two one-dimensional functional data sets (Sections 5.1 and 5.2), a bidimensional functional data set (Section 5.3), a symbolic data set (Section 5.4) and a non-numeric (text) data set (Section 5.5).

Canadian weather data
The Canadian weather data set consists of the daily average temperature curves recorded at 35 Canadian weather stations. Of course, the daily average temperature curves are particularly noisy, which could heavily affect the L_2-distances computed between pairs of curves, hence the whole calculation of the depths. One can deal with the roughness of those curves in different manners: first, one could use smoothed versions of the initial curves, for instance the monthly average temperatures as in Serfling and Wijesuriya (2017); second, one could use for d a distance less affected by such noise than the L_2 one, for instance the supremum (L_∞) distance; finally, one can expand the different curves in a certain basis and focus only on the first terms when assessing the proximity between them. We achieved that by expanding each curve in the empirical Principal Components basis (Hall, 2011) and keeping only the first two principal scores: the curves re-constructed from those two components only are indeed smooth approximations to the initial, rough curves. So, each curve is now represented by a point in the 2-dimensional space of the first two Principal Components, and the proximity between two curves is quantified by the L_2-distance between the corresponding two points.
In effect, this defines a pseudo-distance between the initial curves; see Ferraty and Vieu (2006, Section 3.4.1). The depths assigned to each station according to these 4 methods are shown in Table 5.1. The four depth measures are in very good agreement, essentially identifying the same central and outlying curves. This shows that the depth measure µD (2.1) and its empirical version (4.1) are quite robust to any reasonable choice of d.
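The principal-component pseudo-distance used above can be sketched as follows. The curves here are synthetic random curves standing in for the 35 stations' temperature records, and the helper functions are ours, not taken from an FDA package; the point is only how the score-space L_2 distance is obtained.

```python
import numpy as np

def pc_scores(curves, n_comp=2):
    """Scores of discretised curves (one per row) on their first
    n_comp empirical principal components."""
    centred = curves - curves.mean(axis=0)
    # SVD of the centred data matrix yields the PC basis directly
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_comp].T

rng = np.random.default_rng(0)
curves = rng.standard_normal((35, 365))   # 35 synthetic 'stations', 365 'days'
S = pc_scores(curves)

# Pseudo-distance between curves 0 and 1: L2 distance between score vectors.
# It ignores everything outside the leading components, so two distinct
# curves may well be at pseudo-distance 0 (no identity of indiscernibles).
pdist_01 = float(np.linalg.norm(S[0] - S[1]))
```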

Lip movement data
Malfait and Ramsay (2003) studied the relationship between lip movement and time of activation of different face muscles; see also Ramsay and Silverman (2002, Chapter 10) and Gervini (2008). The study involved a subject saying the word 'bob' 32 times, with the movement of the lower lip recorded each time. Those trajectories are shown in Figure 5.3, and all share the same pattern: a first peak corresponding to the first /b/, then a plateau corresponding to the /o/, and finally a second peak for the second /b/.
These functions being very smooth (actually, they are smoothed versions of raw data not publicly available), it seems natural to use again the classical L_2 distance for assessing their relative proximity. Hence, the respective depth of each curve with respect to the sample was obtained by (4.1) with d(χ, ξ) = (∫ (χ(t) − ξ(t))^2 dt)^{1/2}. The 5 deepest and 5 least deep curves are shown in the top row of Figure 5.4. In particular, this depth identifies as outliers the three curves showing a second peak at a much later time than for the rest of the curves, which were already hived off by Gervini (2008). The remaining two outlying curves show two peaks of lower amplitude than the others, with a second peak occurring earlier than the bunch. Now, Malfait and Ramsay (2003), in their original study, were more interested in the acceleration of the lip during the process than in the lip motion itself. The study aimed at explaining the time of activation of face muscles, and the acceleration reflects the force applied to tissue by muscle contraction. Hence, in this application, it may be worth contrasting the lip trajectories in terms of their corresponding accelerations, that is, comparing the second derivatives of the position curves. The L_2 distance between the second derivatives, d(χ, ξ) = ∥χ′′ − ξ′′∥_2, then defines a pseudo-distance between the original position curves; the corresponding depths are illustrated in the middle row of Figure 5.4.

Table 5.1: Canadian weather data - metric depth measures for 4 different (pseudo-)distances d: µD_2: L_2 distance; µD_∞: supremum (L_∞) distance; µD_m2: L_2 distance on the average monthly temperature curves; µD_PCA: L_2 distance in the plane of the first two principal components.
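The acceleration-based comparison can be sketched numerically as follows, assuming curves sampled on a common grid t and second derivatives obtained by finite differences (a construction of ours, for illustration only):

```python
import numpy as np

def accel_pseudo_dist(chi, xi, t):
    """L2 distance between the second derivatives of two curves sampled
    on the common grid t. This is only a pseudo-distance on the original
    curves: two curves differing by an affine function of t get distance 0."""
    acc_chi = np.gradient(np.gradient(chi, t), t)
    acc_xi = np.gradient(np.gradient(xi, t), t)
    y = (acc_chi - acc_xi) ** 2
    integral = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))  # trapezoid rule
    return integral ** 0.5

t = np.linspace(0.0, 1.0, 101)
flat = accel_pseudo_dist(2 * t + 1, -t + 3, t)            # both affine: ~0
curved = accel_pseudo_dist(t ** 2, np.zeros_like(t), t)   # accelerations 2 vs 0
```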

Handwriting data
The 'handwriting' data set consists of twenty replications of the printing of the three letters 'fda' by a single individual.The position of the tip of the pen has been sampled 200 times per second.The data, available in the R package fda, have already been pre-processed so that the printed characters are scaled and oriented appropriately, see Figure 5.5.
These data are essentially bivariate functional data. Indeed, each instance χ of the word 'fda' arises through the simultaneous realisation of two components (χ_X(t), χ_Y(t)), where χ_X(t) and χ_Y(t) give the position of the pen tip along the horizontal axis and the vertical axis, respectively (see Figure 5.6). A natural distance between two such bivariate curves χ = (χ_X, χ_Y) and ξ = (ξ_X, ξ_Y) is then d(χ, ξ) = (∫ [(χ_X(t) − ξ_X(t))^2 + (χ_Y(t) − ξ_Y(t))^2] dt)^{1/2}. This distance can be used directly in (4.1) to identify the 5 deepest and 5 least deep instances of 'fda'; see Figure 5.7. The bivariate nature of the data at hand does not cause any particular complication and the definition (2.1) need not be re-adapted to this case. Again, the so-defined depth only focuses on the 'drawings' fda themselves, and identifies the deepest instances. However, it was argued in the related literature that the tangential acceleration of the pen during the process is also a key element to analyse for understanding the writing dynamics, for instance for discriminating between genuine handwriting and forgeries (Geenens, 2011a,b). As in Subsection 5.2, one could therefore use (4.1) with d a pseudo-distance assessing the proximity between two instances of fda through their tangential acceleration curves only, if that was to be the focus of the analysis.

Age distribution in European countries
Symbolic Data Analysis (SDA) has recently grown into a popular research field in statistics (Billard and Diday, 2003, 2007). Indeed, the intractably large 'Big Data' sets often need to be summarised so that the resulting summary data sets are of a manageable size, and so-called 'symbolic data' typically arise from such a process. No longer formatted as single values like classical data, they are meant to be 'aggregated' variables, typically represented by lists, intervals, histograms, distributions and the like. In this section we take a closer look at a 'distribution-valued' symbolic data set. Specifically, we analyse the distribution of the age of the population in the 44 European countries (see Table 5.2).
The 2017 data were obtained from the US Census Bureau (www.census.gov/population/international/data/). Typically, the population distribution for a given country is presented in the form of a population pyramid (that is, a histogram), from which a proper distribution function for population age can easily be extracted (Kosmelj and Billard, 2011).
Hence, each country (here: 'individual', also called 'concept' in the SDA literature) is characterised by a distribution (see Figure 5.8). The data being here distribution functions of nonnegative variables, M can be identified with a space of distribution functions supported on R_+, i.e., a space of nondecreasing càdlàg functions F with F(0) = 0 and lim_{t→∞} F(t) = 1, equipped with an appropriate distance.
The Wasserstein distance has proved useful for a wide range of problems explicitly involving distribution functions (Rachev, 1984, Panaretos and Zemel, 2020), hence seems a natural choice in this setting as well. For some r ≥ 1, the Wasserstein distance between two distributions F and G whose rth moments exist is defined as

d_r(F, G) = ( inf E|X − Y|^r )^{1/r},

where the infimum is taken over the set of all joint bivariate distributions of pairs (X, Y) whose marginal distributions are F and G, respectively. Properties of this distance are described in Major (1978) and Bickel and Freedman (1981). In particular, it is known that d_r(F, G) is essentially the usual L_r-distance between the quantile functions F^{-1} and G^{-1} over [0, 1], that is, d_r(F, G) = (∫_0^1 |F^{-1}(u) − G^{-1}(u)|^r du)^{1/r}. Also, it is known that convergence in the Wasserstein distance is equivalent to convergence in distribution together with convergence of the first r moments. Hence, the distance d_r quantifies the proximity between two distributions through both their general appearance and the values of their moments. In what follows, we take r = 2, hence we consider functional data in (M_2, d_2), M_2 being the space of all probability distribution functions with finite second moment.
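Thanks to this quantile-function representation, the 2-Wasserstein distance between univariate distributions is easy to approximate numerically. A minimal sketch (the quantile functions here are toy uniforms, not the age distributions of this section):

```python
import numpy as np

def wasserstein2(q_f, q_g, m=1000):
    """2-Wasserstein distance between two distributions on R, computed as
    the L2 distance between their quantile functions over (0, 1)."""
    u = (np.arange(m) + 0.5) / m          # midpoint rule on (0, 1)
    return float(np.sqrt(np.mean((q_f(u) - q_g(u)) ** 2)))

# U[0, 1] versus U[0.7, 1.7]: a pure location shift, so d_2 = 0.7 exactly.
q_unif = lambda u: u
q_shifted = lambda u: u + 0.7
dist = wasserstein2(q_unif, q_shifted)
```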
The flexibility of (2.1) allows us to base µD on the Wasserstein distance, so as to define a depth measure specific to distribution functions without any difficulty. The 'Wasserstein depths' of the 44 countries are given in Table 5.2. The 5 deepest and least deep age distributions are shown in Figure 5.9. The deepest distribution, hence the most representative of the age distributions in Europe, appears to be that of Switzerland, a country located at the very heart of Europe: in-between the Western and the Eastern countries, in-between the Northern and the Southern countries, and at the meeting point between the 'Germanic' world (Germany, Austria) and the 'Latin' world (France, Italy). From that perspective, Switzerland can be regarded as really representative of a 'median' European country in many respects. On the other hand, the Wasserstein-metric depth is null for Kosovo and Monaco, and indeed, the distributions for those two countries clearly lie outside the bunch of the other distributions. Monaco is a micro-state with a mild climate (and, incidentally, a tax haven) which attracts a large number of rich retirees from all over the continent (if not the world); hence its population is globally much older than in other countries and its age distribution lies below the others. Monaco set aside, Germany and Italy show globally the oldest populations of Europe. Kosovo was until recently at the heart of an armed conflict in the Balkans, which explains the low proportion of older people in that country and the position of its age distribution above all the others. To some extent, this also explains the outlyingness of Albania's curve. In any case, this example illustrates that one can readily define a depth measure tailored for distribution curves, which paves the way for developing rank-like procedures in Symbolic Data Analysis as well.

Authorship attribution by intertextual distance
Author identification for an unknown or doubtful text is one of the oldest statistical problems applied to literature. Here the capability of the proposed metric depth is illustrated within that framework. William Shakespeare and Thomas Middleton were contemporaries (late 16th to early 17th centuries), and their oeuvres are often compared. To that aim, Merriam (2003) examined 9 Middleton plays and 37 Shakespeare texts, and computed between each pair of them the so-called 'inter-textual distance' proposed by Labbé and Labbé (2001). Although the entities of interest are here purely non-numerical (famous literary pieces), the obtained matrix of distances allows us to outline the relative position of each text; and this is essentially all that is needed for µD to come into play.
As an example, Table 5.3 (recovered from Appendix 2 in Merriam (2003)) reports the 'inter-textual' distances between the 9 essential plays of Middleton. Computing the empirical metric depth (4.3) for each entry of the 'Middleton sample' reveals that the two deepest observations are 'More Dissemblers Besides Women' and 'A Trick to Catch the Old One' (both get a depth of 0.4167). They may, therefore, be considered the most typical Middleton plays (as long as the 'inter-textual' distance is the relevant metric). Focusing now on the 37 Shakespeare texts only, 'Antony and Cleopatra' is identified as Shakespeare's most typical text, i.e., the deepest among the considered sample (depth: 0.5255); see Table 5.4 (left column). The next most representative Shakespeare plays are 'The Tempest' (0.5135), 'Othello' (0.5030) and 'Romeo and Juliet' (0.5015). The most outlying piece of work is the verse part of 'Henry V' (depth: 0), which tends to confirm a common conjecture held by many experts on Shakespeare's oeuvre: that the verse part of 'Henry V' was not written by Shakespeare himself, but by Christopher Marlowe (Viprey and Ledoux, 2006, Merriam, 2002). Here we use this conjecture for illustrative purposes only. Now, if we compute the metric depth of the 9 Middleton plays in Shakespeare's sample, all receive depth 0: all are 'outlying' in Shakespeare's oeuvre. This clearly indicates that Middleton's work cannot be confused with Shakespeare's, and it should be easy to assign a new piece of text to one or the other based on µD. Further, it is interesting to analyse the depth of each text in a combined sample made up of both the works of Middleton and Shakespeare. In particular, some of Shakespeare's texts which have a low depth in the 'Shakespeare only' sample see their depth increase substantially in the combined sample.
This indicates that these pieces may have a strong Middleton flavour, to some extent.This hypothesis is confirmed for at least one of those plays: 'Timon of Athens' sees its depth increase from 0.1141 to 0.3971 if one includes Middleton's works in the reference sample; and indeed, extensive research on the topic has provided ample evidence that Middleton wrote approximately one third of that play (Taylor, 1987).
Note that computing and comparing the depth of certain observations in two different samples is the spirit of the DD-plot and the DD-classifier proposed by Li et al (2012).
These procedures can naturally be used in conjunction with the metric depth µD, enabling similar powerful depth-based analyses in abstract metric spaces.
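A toy sketch of that depth-vs-depth idea with µD: only distances are needed, so the two 'authors' below are stand-ins simulated on the real line (our own illustrative construction, not Merriam's data); a new object is assigned to the group within which it is deeper.

```python
import numpy as np

def depth_wrt(d_point, D_sample):
    """Depth of an external point, given its distances to the sample
    (vector d_point) and the sample's own pairwise distance matrix."""
    n = len(d_point)
    iu, ju = np.triu_indices(n, k=1)
    return float((np.maximum(d_point[iu], d_point[ju]) < D_sample[iu, ju]).mean())

rng = np.random.default_rng(1)
group1 = rng.normal(0.0, 1.0, 30)    # stand-in for 'author 1'
group2 = rng.normal(5.0, 1.0, 30)    # stand-in for 'author 2'
new = 0.2                            # clearly produced by 'author 1'

d1 = depth_wrt(np.abs(new - group1), np.abs(group1[:, None] - group1[None, :]))
d2 = depth_wrt(np.abs(new - group2), np.abs(group2[:, None] - group2[None, :]))
label = 1 if d1 >= d2 else 2
```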

Conclusion
In this paper, we have proposed a new statistical depth function, called the 'metric depth' or just µD, defined on an abstract metric space. It is explicitly constructed on a certain distance d that must be chosen by the analyst, which allows them to tailor the depth to the data at hand and to the ultimate goal of the analysis. This offers unmatched flexibility as to the range of problems and applications that can be addressed using the said depth measure. The usefulness of µD has been illustrated on several real data sets, including one in the emergent field of Symbolic Data Analysis and an application in text mining (authorship attribution). Rejuvenating an old idea of Bartoszynski et al (1997), its definition is very intuitive: the depth of a point χ with respect to a distribution P is the probability of finding it 'between' two objects X_1 and X_2 randomly generated from P, 'between' meaning here that χ belongs to the intersection of the two open d-balls B_d(X_1, d(X_1, X_2)) and B_d(X_2, d(X_1, X_2)). This definition is natural and enjoys many pleasant properties.
Figure 5.2: Five deepest curves (left; the darkest curves are the deepest) and five least deep curves (right; the lightest curves are the least deep) according to (4.1) with d(χ, ξ) = ∥χ − ξ∥_2, the L_2 distance.

Figure 5.3: The 32 lip movement trajectories.

Figure 5.5: The handwriting data: twenty printings of 'fda'.
Figure 5.4: Top and middle rows: five deepest curves (left; the darkest curves are the deepest) and five least deep curves (right; the lightest curves are the least deep) according to (4.1) with (i) d(χ, ξ) = ∥χ − ξ∥_2, the L_2 distance between the curves (top row), and (ii) d(χ, ξ) = ∥χ′′ − ξ′′∥_2, the L_2 distance between the second derivatives of the curves (middle row). Bottom row: five deepest acceleration curves (left; the darkest curves are the deepest) and five least deep acceleration curves (right; the lightest curves are the least deep).

Figure 5.7: Five deepest and five least deep instances of 'fda' according to (4.1).
Figure 5.6: One instance of the handwriting data, and its x- and y-components.
Figure 5.8: Age distribution in European countries.

Figure A.1: Top row: Example A.1; central row: Example A.2; bottom row: Example A.3. From left to right: density function (first column), sample lens depth constructed with n = 5,000 sample draws from P (second column), corresponding heat-map (third column) and its section along the line x_2 = 0 (top-right panel), x_1 = x_2 (central-right panel) and x_1 = 0 (bottom-right panel).

Figure 5.8 displays the sample of age distributions. Here we will use the suggested metric depth µD to analyse which countries are most representative of the 'European' age distribution, and which countries can be regarded as 'outliers' in that respect.

Table 5.2: Age distribution in European countries - metric depth for the age distributions of the 44 European countries, based on the Wasserstein distance.