Density kernel depth for outlier detection in functional data

In this paper, we propose a novel approach to the problem of functional outlier detection. Our method leverages a low-dimensional and stable representation of functions in a Reproducing Kernel Hilbert Space (RKHS). We define a depth measure based on density kernels that satisfies desirable theoretical properties, and we address the challenges associated with its estimation. Through a Monte Carlo simulation we assess the performance of our functional depth measure in the outlier detection task under different scenarios. To illustrate the effectiveness of our method, we apply it to the study of outliers in mortality rate curves.


Introduction
Advances in technology are providing data scientists with an unprecedented amount of high-dimensional data. Electrocardiogram signals, fMRI images or mortality curves are relevant examples of what is nowadays called functional data [1]. In Functional Data Analysis (FDA), each observation is a function that constitutes an infinite-dimensional object. Analysing functional outliers is critical in several contexts, including functional regression [2], robust functional principal component analysis [3], functional outlier visualisation [4], and robust functional data clustering [5]; and also in several applied contexts where functional data are involved [6].
Functional outliers are commonly classified into two categories: magnitude outliers and shape outliers [7,8]. The contribution of this paper is to propose a novel depth measure for functional data that handles both types of functional outliers simultaneously. To achieve this goal, we introduce a density kernel depth that relies on a finite-dimensional and stable representation of functions. The kernel depth measure satisfies desirable properties, as we formally discuss in Sect. 3.1, and induces a centre-outward ordering on functions. We also propose suitable estimation methods based on the One-Class Neighbour Machine, a nonparametric estimator of density level sets. Furthermore, we discuss a statistically sound bootstrap approach to infer functional outliers in data.
The remainder of the paper is organised as follows: Sect. 2 proposes a suitable representation model for functional data. In Sect. 3 we introduce the density kernel depth measure, and discuss its properties and suitable estimation methods. In Sect. 4 we provide an algorithm for functional outlier detection based on a bootstrap procedure. In Sect. 5 we present a Monte Carlo simulation study and a real data application to mortality curves. Finally, Sect. 6 concludes the paper.

An RKHS framework for functional data
In what follows we consider a square integrable stochastic process X(t) ∈ H in a separable Hilbert space of functions H ⊂ L²(T), where T ⊂ R is a compact and convex set. As usual in practice, we also assume that curves are sampled over a discrete grid of points t = {t_1, ..., t_p}, with p ≫ 0, in a signal-plus-noise fashion as follows:

x_j = x(t_j) + e_j,  j = 1, ..., p,   (1)

where x = (x(t_1) + e_1, ..., x(t_p) + e_p) is the vector with the observed data and e = (e_1, ..., e_p) is an independent and zero-mean residual term.
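As a concrete illustration, the sampling scheme in Eq. (1) can be simulated as follows; the grid, the underlying curve and the noise level are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 100                           # number of grid points in t = {t_1, ..., t_p}
t = np.linspace(0.0, 1.0, p)      # discrete grid on a compact T = [0, 1]
x_true = np.sin(2 * np.pi * t)    # hypothetical underlying curve x(t)
e = rng.normal(0.0, 0.1, size=p)  # independent, zero-mean residual term
x_obs = x_true + e                # observed vector x, as in Eq. (1)
```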
Most functional data analysis approaches for preprocessing raw data as in Eq. (1) suggest to proceed as follows: choose an orthogonal basis of functions B = {φ_i}_{i≥1}, where each φ_i ∈ H, and then represent each functional datum by means of a linear combination in Span(B) [9,10]. A usual choice is to consider H as a Reproducing Kernel Hilbert Space (RKHS) of functions [11]. In this case, the elements in the spanning set B are the eigenfunctions associated to the positive-definite and symmetric kernel K : T × T → R that spans H. In our setting, the functional representation problem can be framed as follows: we observe each curve on p sample points and the corresponding functional data estimator is obtained by solving the following regularization problem:

x̂ = arg min_{h ∈ H} (1/p) Σ_{j=1}^p L(x_j, h(t_j)) + γ Γ(h),   (2)

where L is a strictly convex functional with respect to the second argument, γ > 0 is a regularization parameter (chosen by cross-validation), and Γ(h) is a regularization term. By the representer theorem, the solution of (2) exists, is unique, and admits a representation of the form:

x̂(t) = Σ_{k=1}^p α_k K(t, t_k).   (3)

In the particular case of a squared loss function L(w, z) = (w − z)² and considering Γ(h) = ∫_T h²(t) dt, the coefficients of the linear combination in Eq. (3) are obtained by solving the following linear system:

(K + γ I) α = x,   (4)

where α = (α_1, ..., α_p)^T, I is the identity matrix of order p, and K is the p × p Gram matrix with the kernel evaluations, [K]_{k,l} = K(t_k, t_l), for k = 1, ..., p and l = 1, ..., p.
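A minimal numerical sketch of Eqs. (2)-(4) with a Gaussian kernel; the kernel choice and the values of σ and γ below are assumptions for illustration (in the paper γ is chosen by cross-validation).

```python
import numpy as np

def gaussian_gram(t, sigma=50.0):
    """Gram matrix [K]_{k,l} = exp(-sigma * (t_k - t_l)^2)."""
    d = t[:, None] - t[None, :]
    return np.exp(-sigma * d ** 2)

def fit_curve(t, x_obs, gamma=1e-3, sigma=50.0):
    """Representer coefficients alpha solving (K + gamma I) alpha = x."""
    K = gaussian_gram(t, sigma)
    alpha = np.linalg.solve(K + gamma * np.eye(len(t)), x_obs)
    return alpha, K

def evaluate_fit(s, t, alpha, sigma=50.0):
    """Fitted curve x_hat(s) = sum_k alpha_k K(s, t_k), as in Eq. (3)."""
    d = s[:, None] - t[None, :]
    return np.exp(-sigma * d ** 2) @ alpha
```

Fitting a noisy sine on a grid and evaluating the fit back on the same grid recovers the underlying signal up to the noise level.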
A main drawback of the estimation entailed in Eq. (3) is the instability of α, which changes substantially under small perturbations in the data. To avoid this problem, we resort to Mercer's theorem [11] and consider an alternative functional estimator based on the projection of x̂(t) onto the functional space generated by the first d ≪ p eigenfunctions of K:

x̃(t) = Σ_{j=1}^d λ_j φ_j(t) = λ^T Φ(t),   (5)

where λ = (λ_1, ..., λ_d) are the projection coefficients onto the functional space generated by the first d eigenfunctions of K (i.e. λ_j ≡ l_j (α^T v_j)/√p, where (l_j, v_j) is the j-th eigenpair of the Gram matrix K), Φ(t) = (φ_1(t), ..., φ_d(t)) is a vector function with the first d eigenfunctions associated to K, and d ≪ p is a resolution parameter such that for a small ε_d it holds that sup_{t∈T} |x̂(t) − x̃(t)| ≤ ε_d; see [14] for further details. The proposed depth measure for functional data relies on the computation of λ, as we discuss in the next section.
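The projection coefficients λ can be computed directly from the eigendecomposition of the Gram matrix. A sketch, assuming α and K have already been obtained from the regularization problem above:

```python
import numpy as np

def projection_coefficients(K, alpha, d):
    """lambda_j = l_j * (alpha^T v_j) / sqrt(p) for the d largest
    eigenpairs (l_j, v_j) of the p x p Gram matrix K."""
    p = K.shape[0]
    l, V = np.linalg.eigh(K)                # eigenvalues in ascending order
    l, V = l[::-1][:d], V[:, ::-1][:, :d]   # keep the d largest eigenpairs
    return l * (alpha @ V) / np.sqrt(p)
```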

Depth measures for functional data
There are several notions of depth measures in statistics; all of them involve the computation of a quantity that represents the centrality of a given point z ∈ R^d with respect to a probability distribution f. In this way, depth measures induce an order in the data, and are a natural tool to identify outliers.
Some remarkable examples of depth measures for functional data are in order. In Fraiman and Muniz [15], the authors propose the Integrated Depth, which resorts to a trimmed functional mean estimator to rank the functions. The Random Tukey Depth [16] and the Random Projection Depth [17] rely on random projections of the functional data and the computation of the deepest function using univariate statistics. Another well-known example is the h-Mode Depth [17], which considers the expected kernel distances for each curve using the L² norm:

HMD(x, f) = E[K(‖x − X‖/h)],

where K is a kernel function and h is the bandwidth parameter.
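For concreteness, a sketch of the h-mode depth on discretised curves, using a Gaussian kernel and the grid approximation of the L² norm; the kernel and bandwidth are illustrative choices, not prescribed by the references.

```python
import numpy as np

def h_mode_depth(curves, h=1.0):
    """HMD(x_i) = (1/n) sum_j K(||x_i - x_j|| / h) for an n x p matrix of
    discretised curves, with a Gaussian kernel K and the L2 norm
    approximated on the sampling grid."""
    diff = curves[:, None, :] - curves[None, :, :]
    dist = np.sqrt(np.mean(diff ** 2, axis=2))   # pairwise L2 distances
    return np.mean(np.exp(-(dist / h) ** 2), axis=1)
```

A magnitude outlier sits far from every other curve, so its expected kernel distance, and hence its depth, is small.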
In FDA, graphical analysis is always a complementary approach in terms of visualisation and interpretation of outliers. In this sense, the Band Depth and the Modified Band Depth introduced in López-Pintado and Romo [18] are suitable methods. An interesting review on topological functional depth measures can be found in [19] and [20]. We introduce next our depth measure, which relies on density kernels.

Depth induced by density kernels
Let Z ∈ R^d be a random vector with density function f. A function g : R^d → R is f-monotone if it satisfies the following condition: for any pair of points z, z' ∈ R^d, f(z) ≤ f(z') implies g(z) ≤ g(z'). As an example, consider the parametric model Z ∼ N(μ, σ²); then g(x, (μ, σ)) = exp(−(x − μ)²/(2σ²)) is f-monotone. We define a density kernel K_f as the product of two f-monotone functions as follows:

K_f(z, z') = g(z, f) g(z', f).

Notice that K_f depends on the density function f, which is unknown or intractable in practice. We address the estimation of K_f using asymptotically f-monotone functions, as discussed in the next subsection. The density kernel depth measure is then obtained by combining a density kernel with a deepest (central) curve. To define the latter, consider a statistical model f (a bounded density function) for the projection coefficients λ in Eq. (5); the deepest curve corresponds to the mode m = arg max_z f(z), and the density kernel depth of a curve x(t) with coefficients λ is defined as DKD(x(t), f, K_f) ≡ K_f(λ, m).
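The Gaussian example above can be checked numerically. The following sketch verifies the f-monotonicity of g and evaluates the resulting density kernel; μ, σ and the grid of test points are arbitrary choices.

```python
import numpy as np

mu, sigma = 0.0, 1.0

def f(z):
    """Density of Z ~ N(mu, sigma^2)."""
    return np.exp(-(z - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def g(z):
    """f-monotone function: an increasing transform of the density."""
    return np.exp(-(z - mu) ** 2 / (2 * sigma ** 2))

def K_f(z, zp):
    """Density kernel: product of two f-monotone evaluations."""
    return g(z) * g(zp)

# f(z) <= f(z') implies g(z) <= g(z') on a grid of pairs
zs = np.linspace(-3.0, 3.0, 25)
for z in zs:
    for zp in zs:
        if f(z) <= f(zp):
            assert g(z) <= g(zp)
```

The depth of a point z relative to the mode μ is then K_f(z, μ), which is maximal at z = μ.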

Proposition 1
The DKD satisfies the following desirable properties [21] for depth measures:
P1. Maximality at centre: the DKD takes its largest value at the deepest curve, i.e. DKD(x_m(t), f, K_f) = sup_{x ∈ H} DKD(x(t), f, K_f), where x_m(t) denotes the deepest curve.
P2. Monotonicity relative to the deepest function: the DKD decreases monotonically along any ray moving away from the deepest curve.
P3. Vanishes at infinity: DKD(x(t), f, K_f) → 0 as ‖x‖ → ∞.

P4. Invariance under affine transformations: let T be the class of affine transformations in H and let τ ∈ T be an affine map; then the DKD of τ(x)(t), computed with respect to the distribution of the transformed process, equals DKD(x(t), f, K_f).
In addition to properties P1-P4, the order induced by the DKD is also invariant under changes in the basis of functions B = {φ_1(t), ..., φ_d(t)} that spans the linear subspace of H where we project the curves. We give more details and a formal proof in the supplementary material. In the following subsection we address the details involved in the estimation of the DKD from data.

Estimating the density kernel depth
Notice that g(·, f) depends on the density function f, which is unknown or intractable in practice; this leads to the following related concept: given a random sample S_n = {z_1, ..., z_n} drawn from f, a function g(·, S_n) is asymptotically f-monotone if it is f-monotone with probability tending to one as n → ∞. As an example of an asymptotically f-monotone function, consider g(z, S_n) = f̂_n(z), where f̂_n is a consistent nonparametric estimator of f. Therefore, a suitable estimator for K_f is given by the product of two asymptotically f-monotone functions. The estimation of the DKD is based on an asymptotically f-monotone function g(·, S_n) which is proportional to a nonparametric consistent estimator of f, the density function that corresponds to the projection coefficients. Our density estimation method is based on the One-Class Neighbour Machine (OCNM), a well-known nonparametric density level set estimator [22,23]. Some comments about the density estimation method are in order. The OCNM is a consistent estimator of ν-level sets under mild conditions on f. In addition, the OCNM relies on a linear-convex optimization problem, entailing important computational advantages in comparison to other standard nonparametric density estimation approaches such as kernel methods. For more details about the OCNM we refer to [23].
The estimation of the DKD via the OCNM is straightforward. Given a sample of n discretised curves as in Eq. (1), D_n = {x_1, ..., x_n}, and its corresponding projection coefficients {λ_1, ..., λ_n} according to Eq. (5), we use the OCNM to estimate density contour clusters V̂_ν around the estimated mode m̂ for an increasing sequence ν ≡ {ν_1, ..., ν_m}, such that 0 ≤ ν_1 < · · · < ν_m ≤ 1. Notice that m̂ ∈ {λ_1, ..., λ_n} corresponds to the sample curve which belongs to the highest ν-density level set. We consider the following asymptotically f-monotone function:

g(λ, D_n) = Σ_{i=1}^m I(λ ∈ V̂_{ν_i}),

where I(λ ∈ V̂_{ν_i}) is the indicator function that takes value 1 if λ ∈ V̂_{ν_i} and 0 otherwise. In this sense, the estimated DKD(x(t), D_n, K_f) ≡ g(λ, D_n) g(m̂, D_n) orders the sampled curves around the estimated centre x_mo(t) = m̂^T Φ(t); i.e. high values of the DKD correspond to estimated central curves and vice versa.
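The nested level-set construction can be sketched as follows. Since the OCNM implementation is not reproduced here, this sketch substitutes a plain Gaussian kernel density estimator for the OCNM when building the contour clusters (an assumption: any consistent level-set estimator can play this role, at some computational cost).

```python
import numpy as np

def dkd_estimate(coeffs, nus):
    """Estimated DKD(x_i) = g(lambda_i, D_n) * g(m_hat, D_n), where g counts
    how many nested level sets V_nu contain a coefficient vector. Level sets
    are derived from a fixed-bandwidth Gaussian KDE, used here as a stand-in
    for the OCNM of the paper."""
    n, d = coeffs.shape
    h = n ** (-1.0 / (d + 4))                       # Scott-style bandwidth
    dist2 = ((coeffs[:, None, :] - coeffs[None, :, :]) ** 2).sum(axis=2)
    dens = np.exp(-dist2 / (2 * h ** 2)).mean(axis=1)
    # V_nu keeps the sample points whose estimated density exceeds the
    # nu-quantile of the estimated densities
    thresholds = np.quantile(dens, nus)
    g = (dens[:, None] >= thresholds[None, :]).sum(axis=1)
    mode_idx = int(np.argmax(dens))                 # estimated mode m_hat
    return g * g[mode_idx], mode_idx
```

High values correspond to central curves; an outlying coefficient vector falls in few level sets and receives a low depth.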

Functional outlier detection
Depth measures induce a centre-outward ordering, and constitute a natural tool to assess the presence of outlying curves in the sample. From a statistical perspective, the analysis of atypical curves relies on the distribution of DKD(X(t), f, K_f). Nevertheless, neither the exact nor the asymptotic distribution of the DKD is known. For this reason, we employ bootstrap methods to approximate this distribution and determine which curves in the sample are more likely to be outliers. In Algorithm 1 we present our bootstrap method, which resembles the procedure given in [24]. For the analysis of outlying functional data, a suitable threshold q ∈ (0, 1) (q is intended to be the type I error of the identification method) needs to be specified in advance. In the case of the Australian mortality curves discussed in Sect. 5 we choose q = 0.01; nevertheless, in practice we recommend conducting a sensitivity analysis regarding the value given to this sensitive parameter.

For each D_n^(b), produce a small perturbation of the raw bootstrapped sample data as follows: x̃^(b) = x^(b) + z, where z is sampled from a multivariate normal distribution with mean zero and covariance matrix S = n⁻¹ Σ_{i=1}^n (x_i − x̄)(x_i − x̄)^T, the sample covariance matrix of the data.

Remove the curves identified as outliers in Step 4 and return to Step 1 until no more outliers are found in the remaining data.
Some comments on Algorithm 1 are in order. Steps 1 and 2 entail standard statistical procedures. Step 3 involves a parametric perturbation of the data in order to produce a robust estimation of the true DKD quantile.
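The perturbation step can be sketched as follows; the smoothing constant c and the use of the raw sample covariance are assumptions for illustration, and the paper's Algorithm 1 specifies the exact perturbation.

```python
import numpy as np

def smoothed_bootstrap(curves, B=100, c=0.05, seed=0):
    """Draw B bootstrap samples of the discretised curves and perturb each
    with z ~ N(0, c * S), S being the sample covariance matrix (a tiny
    ridge is added for numerical stability)."""
    rng = np.random.default_rng(seed)
    n, p = curves.shape
    S = np.cov(curves, rowvar=False) + 1e-10 * np.eye(p)
    samples = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        z = rng.multivariate_normal(np.zeros(p), c * S, size=n)
        samples.append(curves[idx] + z)
    return samples
```

The empirical q-quantile of the depth values computed over the perturbed bootstrap samples then serves as the outlier cut-off.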

Experiments
In this section we develop numerical experiments to assess the performance of the proposed depth measure in the task of functional outlier detection. The implementation of the DKD is available in the 'bigdatadist' R-package, and the R code to reproduce the experiments is provided as supplementary material.
Hereafter, for the functional data representation, we consider a Gaussian kernel function K(t, s) = exp(−σ|t − s|²). The bandwidth parameter σ and the dimension d of the basis function system given in Eq. (5) were cross-validated through grid search. To compare our method we consider several functional depth methods: the Modified Band Depth (MBD) [18], already implemented in the R-package 'depthTools' [25]; the Random Tukey Depth (RTD) and the h-Mode Depth (HMD), see [16,17], implemented in the R-package 'fda.usc' [26]; and the Functional Spatial Depth (FSD), see [27].
To illustrate the generating process, in Fig. 1 we show one instance of the simulated curves in scenarios (a) to (c) for q = 10%.

Results
To evaluate the outlier detection performance of the DKD we develop a Monte Carlo simulation study. To this end, for each scenario and contamination level, we generate M = 1000 data replications and report the following average metrics: the True Positive Rate, TPR = TP/(q × n) (sensitivity); the True Negative Rate, TNR = TN/((1 − q) × n) (specificity); and the area under the ROC curve, aROC.
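These metrics can be computed as follows. This is a sketch: depth scores are assumed as input, with low depth indicating an outlier, and the aROC is obtained through the rank-sum (Mann-Whitney) identity.

```python
import numpy as np

def detection_metrics(depths, is_outlier, q):
    """TPR = TP/(q*n), TNR = TN/((1-q)*n) and aROC for a vector of depth
    values; the q*n curves with the lowest depth are flagged as outliers."""
    n = len(depths)
    flagged = np.zeros(n, dtype=bool)
    flagged[np.argsort(depths)[: int(round(q * n))]] = True
    tp = np.sum(flagged & is_outlier)
    tn = np.sum(~flagged & ~is_outlier)
    n_out = int(is_outlier.sum())
    tpr = tp / (q * n)
    tnr = tn / ((1 - q) * n)
    # aROC = P(depth of an outlier < depth of an inlier), via ranks
    ranks = np.empty(n)
    ranks[np.argsort(depths)] = np.arange(1, n + 1)
    auc = 1.0 - (ranks[is_outlier].sum() - n_out * (n_out + 1) / 2) / (n_out * (n - n_out))
    return tpr, tnr, auc
```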
The Monte Carlo simulation results are presented in Table 1. The DKD presents a remarkable performance among the other functional depth measures in the three scenarios considered, for different levels of contamination q ∈ {1%, 5%, 10%}. However, in the case q = 1% the difference with respect to other depth measures is not statistically significant. Among the three scenarios, the performance of the DKD increases with respect to the rest of the competitors in scenarios (b) and (c), where we move away from the Gaussian setting.

Detecting outlying curves in the Australian mortality database
For this experiment we consider age-specific log-mortality rates of Australian males. The data is publicly available in the R-package 'fds' [28], sourced primarily from the Australian Demographic Data Bank. The data set consists of 103 curves that are registered over a range of 0-100 age cohorts, with each curve corresponding to one year between 1901 and 2003. As shown in Fig. 2, for low-age cohorts (until approximately 12 years), the mortality rates present a decreasing trend and then start to grow until late ages, where all cohorts achieve a 100% mortality rate.

Results

Outlier detection is an unsupervised learning problem; therefore, we do not know a priori whether there are any outlying curves (years) in this data set. Following the procedure detailed in Algorithm 1, we estimate the cut-off point q such that the mislabelling of correct observations as outliers is about 1%. The results are presented in Table 2 and Fig. 2. The outliers detected by the DKD are the years 1919 and 1943-1945. The year 1919 corresponds to the influenza pandemic episode that caused around 15,000 casualties as the virus spread through Australia. Given the pattern in the curve corresponding to year 1919, this curve can be visually identified and labelled as a magnitude outlier. In a related study that relies on the same data [29], the authors also identified 1919 as an anomalous observation.
The years 1943-1945, in contrast, exhibit a distinctive shape compared to the other curves, lacking any prominent points (age-cohorts) that would indicate their dissimilarity; as a result, visually detecting them is exceedingly difficult. Specifically, for the age-cohorts between 15 and 40, one can observe a discrepancy in the curve patterns, so they can be considered shape outliers. It is particularly relevant to identify outliers in this dataset to achieve reliable predictions of mortality curves. With the onset of the COVID-19 pandemic, improved predictions can aid governments in making better-informed decisions.
The DKD handles both types of functional outliers, as follows from the simulations in the previous section. In Table 2 we compare our findings against the other standard benchmarks introduced in the Monte Carlo simulation study, as well as the Outliergram [7] (particularly devised to capture shape outliers) and the Functional Boxplot [8] (see Supplementary Material). Both devices are already implemented in the R-packages 'roahd' and 'fda', respectively. The Outliergram identifies the year 1914 as an outlier, while the Functional Boxplot does not pinpoint any anomalous observation. These tools are based on the Modified Band Depth measure, an approach that is very efficient at identifying magnitude outliers, or shape outliers that present a sharp difference with respect to the rest of the sample. In this case the shape outliers are hidden within the rest of the data, which could explain the poor performance of these methods. Finally, it is also interesting to notice that the mortality curve corresponding to year 2003 is identified as an outlier by all the competitors.
Since the year 2003 curve is located on the outward region of the data with respect to the deepest curve, we tend to see this finding as a type-II error in the analysis.

Conclusions
In this work, we propose a density kernel depth (DKD) measure as a tool for detecting outliers in functional data. The DKD method relies on a stable low-dimensional representation of curves in a linear subspace of a Reproducing Kernel Hilbert Space. We discuss interesting properties associated with the DKD and address the subtleties involved in its estimation. We showcase the performance of our method through a Monte Carlo simulation study using three different scenarios: symmetric, asymmetric, and bi-modal models for functional data. The DKD is able to identify the presence of shape and magnitude outliers not only in simulated data but also in the analysis of Australian male mortality rate curves.
Although the DKD has a remarkable performance in identifying outliers that are hidden within the rest of the curves and are extremely difficult to spot visually, we suggest combining the DKD analysis with the Outliergram [7] and Functional Boxplot [8] methods to achieve more robust results.
Author contributions Authors contributed equally to this work.

Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Data availability Data is available in the 'fds' R-package.

Declarations
Conflicts of interest None of the authors has a conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.