## Abstract

This paper proposes methods for detecting outliers in functional data sets. The task of identifying atypical curves is carried out using the recently proposed kernelized functional spatial depth (KFSD). KFSD is a local depth that can be used to order the curves of a sample from the most to the least central. Since outliers are usually among the least central curves, we present a probabilistic result that allows us to select a threshold value for KFSD such that curves with depth values lower than the threshold are detected as outliers. Based on this result, we propose three new outlier detection procedures. The results of a simulation study show that our proposals generally outperform a battery of competitors. We apply our procedures to a real data set consisting of daily curves of emission levels of nitrogen oxides (NO\(_{x}\)), since identifying abnormal NO\(_{x}\) levels is of interest for taking the necessary environmental policy actions.

## Notes

In the presence of a tie, the method with the lower false outlier detection percentage (f) is preferred.

## References

Barnett V, Lewis T (1994) Outliers in statistical data, vol 3. Wiley, New York

Chakraborty A, Chaudhuri P (2014) On data depth in infinite dimensional spaces. Ann Inst Stat Math 66:303–324

Chen Y, Dang X, Peng H, Bart HL (2009) Outlier detection with the kernelized spatial depth function. IEEE Trans Pattern Anal Mach Intell 31:288–305

Cuesta-Albertos JA, Nieto-Reyes A (2008) The random Tukey depth. Comput Stat Data Anal 52:4979–4988

Cuevas A (2014) A partial overview of the theory of statistics with functional data. J Stat Plan Inference 147:1–23

Cuevas A, Fraiman R (2009) On depth measures and dual statistics. A methodology for dealing with general data. J Multivar Anal 100:753–766

Cuevas A, Febrero M, Fraiman R (2006) On the use of the bootstrap for estimating functions with functional data. Comput Stat Data Anal 51:1063–1074

Febrero M, Oviedo de la Fuente M (2012) Statistical computing in functional data analysis: the R package fda.usc. J Stat Softw 51:1–28

Febrero M, Galeano P, González-Manteiga W (2007) A functional analysis of NOx levels: location and scale estimation and outlier detection. Comput Stat 22:411–427

Febrero M, Galeano P, González-Manteiga W (2008) Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics 19:331–345

Ferraty F, Vieu P (2006) Nonparametric functional data analysis: theory and practice. Springer, New York

Fraiman R, Muniz G (2001) Trimmed means for functional data. Test 10:419–440

Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York

Hyndman RJ, Shang HL (2010) Rainbow plots, bagplots, and boxplots for functional data. J Comput Graph Stat 19:29–45

Ignaccolo R, Franco-Villoria M, Fassò A (2015) Modelling collocation uncertainty of 3D atmospheric profiles. Stoch Environ Res Risk Assess 29:419–429

López-Pintado S, Romo J (2009) On the concept of depth for functional data. J Am Stat Assoc 104:718–734

McDiarmid C (1989) On the method of bounded differences. In: Surveys in combinatorics. Cambridge University Press, Cambridge, pp 148–188

Menafoglio A, Guadagnini A, Secchi P (2014) A kriging approach based on Aitchison geometry for the characterization of particle-size curves in heterogeneous aquifers. Stoch Environ Res Risk Assess 28:1835–1851

Ramsay JO, Silverman BW (2005) Functional data analysis. Springer, New York

Ruiz-Medina MD, Espejo RM (2012) Spatial autoregressive functional plug-in prediction of ocean surface temperature. Stoch Environ Res Risk Assess 26:335–344

Sguera C, Galeano P, Lillo R (2014) Spatial depth-based classification for functional data. Test 23:725–750

Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London

Sun Y, Genton MG (2011) Functional boxplots. J Comput Graph Stat 20:316–334

Tukey JW (1975) Mathematics and the picturing of data. Proc Int Congr Math 2:523–531

## Acknowledgments

The authors would like to thank the editor in chief, the associate editor and an anonymous referee for their helpful comments. This research was partially supported by Spanish Ministry of Science and Innovation grant ECO2011-25706 and by Spanish Ministry of Economy and Competition grant ECO2012-38442.

## Appendix

### 1.1 From \(FSD(x, Y_{n})\) to \(KFSD(x, Y_{n})\)

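To show how to pass from \(FSD(x, Y_{n})\) in (1) to \(KFSD(x, Y_{n})\) in (4), we first show that \(FSD(x, Y_{n})\) can be expressed in terms of inner products. We present this result for \(n=2\). The norm in (1) can be written as

\[
\left\| \frac{x-y_{1}}{\Vert x-y_{1}\Vert } + \frac{x-y_{2}}{\Vert x-y_{2}\Vert } \right\| = \sqrt{\left\langle \frac{x-y_{1}}{\Vert x-y_{1}\Vert } + \frac{x-y_{2}}{\Vert x-y_{2}\Vert },\ \frac{x-y_{1}}{\Vert x-y_{1}\Vert } + \frac{x-y_{2}}{\Vert x-y_{2}\Vert } \right\rangle }.
\]

Let \(\delta _{1}=\sqrt{\langle x,x\rangle +\langle y_{1},y_{1}\rangle -2\langle x,y_{1}\rangle }\) and \(\delta _{2}=\sqrt{\langle x,x\rangle +\langle y_{2},y_{2}\rangle -2\langle x,y_{2}\rangle }\), so that \(\delta _{i}=\Vert x-y_{i}\Vert \). Then,

\[
\left\| \frac{x-y_{1}}{\Vert x-y_{1}\Vert } + \frac{x-y_{2}}{\Vert x-y_{2}\Vert } \right\| ^{2} = \sum _{i=1}^{2}\sum _{j=1}^{2} \frac{\langle x,x\rangle +\langle y_{i},y_{j}\rangle -\langle x,y_{i}\rangle -\langle x,y_{j}\rangle }{\delta _{i}\delta _{j}},
\]

and we apply the embedding map \(\phi \) to all the observations in the last expression. According to (2), this is equivalent to substituting the inner product with a positive definite and stationary kernel function \(\kappa \), which explains the definition of \(KFSD(x, Y_{n})\) in (4) for \(n=2\). The generalization of this result to \(n>2\) is straightforward.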
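The passage from inner products to kernel evaluations can be checked numerically. The following is a minimal sketch rather than the authors' implementation. It assumes curves observed on a common equispaced grid, a Riemann-sum approximation of the \(L^{2}\) inner product, and a Gaussian kernel of the form \(\exp (-\Vert a-b\Vert ^{2}/\sigma ^{2})\). All function names (`inner_l2`, `gaussian_kernel`, `depth_from_kernel`, `fsd_direct`) and the simulated curves are hypothetical.

```python
import numpy as np


def inner_l2(a, b, dt):
    """Riemann-sum approximation of the L2 inner product of two discretized curves."""
    return float(np.sum(a * b) * dt)


def gaussian_kernel(a, b, dt, sigma=1.0):
    """Assumed Gaussian kernel exp(-||a - b||^2 / sigma^2) (illustrative choice)."""
    diff = a - b
    return float(np.exp(-inner_l2(diff, diff, dt) / sigma**2))


def depth_from_kernel(x, sample, kernel):
    """Spatial depth of x written only through kernel(., .) evaluations."""
    n = len(sample)
    k_xx = kernel(x, x)
    k_xy = np.array([kernel(x, y) for y in sample])
    k_yy = np.array([[kernel(yi, yj) for yj in sample] for yi in sample])
    delta_sq = k_xx + np.diag(k_yy) - 2.0 * k_xy
    keep = delta_sq > 1e-12  # drop sample curves that coincide with x
    delta = np.sqrt(delta_sq[keep])
    numer = (k_xx + k_yy[np.ix_(keep, keep)]
             - k_xy[keep][:, None] - k_xy[keep][None, :])
    total = float(np.sum(numer / np.outer(delta, delta)))
    return 1.0 - np.sqrt(max(total, 0.0)) / n


def fsd_direct(x, sample, dt):
    """FSD from the norm of the summed unit directions, used as a cross-check."""
    total = np.zeros_like(x)
    for y in sample:
        diff = x - y
        norm_sq = inner_l2(diff, diff, dt)
        if norm_sq > 1e-12:
            total = total + diff / np.sqrt(norm_sq)
    return 1.0 - np.sqrt(inner_l2(total, total, dt)) / len(sample)


# Simulated curves, for illustration only.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 51)
dt = grid[1] - grid[0]
sample = [np.sin(2 * np.pi * grid) + 0.3 * rng.standard_normal(grid.size)
          for _ in range(10)]
x = np.sin(2 * np.pi * grid)

# With the plain inner product as "kernel", the two FSD forms should agree.
fsd_kernel_form = depth_from_kernel(x, sample, lambda a, b: inner_l2(a, b, dt))
fsd_norm_form = fsd_direct(x, sample, dt)
# Swapping in the Gaussian kernel gives a KFSD-type value.
kfsd = depth_from_kernel(x, sample, lambda a, b: gaussian_kernel(a, b, dt, 1.0))
print(fsd_kernel_form, fsd_norm_form, kfsd)
```

The only change between the FSD and the KFSD-type value is the function passed as `kernel`, which mirrors the substitution described above.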

### 1.2 Proof of Theorem 1

As explained in Sect. 3, Theorem 1 is a functional extension of a result derived by Chen et al. (2009) for KSD. Since the two results are closely related, we only sketch the proof of Theorem 1. The proof for KSD relies mainly on McDiarmid's inequality (McDiarmid 1989), which holds for general probability spaces and therefore also for functional Hilbert spaces. We state this inequality in the next lemma:

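**Lemma 1**

*(McDiarmid 1989) Let* \(\Omega _{1}, \ldots , \Omega _{n}\) *be probability spaces. Let* \({\mathbf {\Omega }} = \prod _{j=1}^{n} \Omega _{j}\) *and let* \(X: {\mathbf {\Omega }} \rightarrow {\mathbb {R}}\) *be a random variable. For any* \(j \in \left\{ 1, \ldots , n\right\} \), *let* \((\omega _{1}, \ldots , \omega _{j}, \ldots , \omega _{n})\) *and* \(\left( \omega _{1}, \ldots , \hat{\omega }_{j}, \ldots , \omega _{n}\right) \) *be two elements of* \({\mathbf {\Omega }}\) *that differ only in their jth coordinates. Assume that X is uniformly difference-bounded by* \(\{c_j\}\), *that is, for any* \(j \in \left\{ 1, \ldots , n\right\} \),

\[
\sup _{\omega _{1}, \ldots , \omega _{n}, \hat{\omega }_{j}} \left| X\left( \omega _{1}, \ldots , \omega _{j}, \ldots , \omega _{n}\right) - X\left( \omega _{1}, \ldots , \hat{\omega }_{j}, \ldots , \omega _{n}\right) \right| \le c_{j}. \qquad (12)
\]

*Then, if* \({\mathbb {E}}[X]\) *exists, for any* \(\tau > 0\),

\[
{\mathbb {P}}\left( X - {\mathbb {E}}[X] \ge \tau \right) \le \exp \left( \frac{-2\tau ^{2}}{\sum _{j=1}^{n} c_{j}^{2}}\right) \quad \text{and} \quad {\mathbb {P}}\left( {\mathbb {E}}[X] - X \ge \tau \right) \le \exp \left( \frac{-2\tau ^{2}}{\sum _{j=1}^{n} c_{j}^{2}}\right) .
\]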

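In order to apply Lemma 1 to our problem, define

\[
X\left( z_{1}, \ldots , z_{n_{Z}}\right) = \frac{1}{n_{Z}} \sum _{i=1}^{n_{Z}} g\left( z_{i},Y_{n_{Y}}\right) ,
\]

whose expected value is given by

\[
{\mathbb {E}}[X] = {\mathbb {E}}_{z_{1}|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] .
\]

Now, for any \(j \in \left\{ 1, \ldots , n_{Z}\right\} \) and \(\hat{z}_{j} \in {\mathbb {H}}\), the following inequality holds

\[
\left| X\left( z_{1}, \ldots , z_{j}, \ldots , z_{n_{Z}}\right) - X\left( z_{1}, \ldots , \hat{z}_{j}, \ldots , z_{n_{Z}}\right) \right| = \frac{1}{n_{Z}} \left| g\left( z_{j},Y_{n_{Y}}\right) - g\left( \hat{z}_{j},Y_{n_{Y}}\right) \right| \le \frac{1}{n_{Z}},
\]

and it provides assumption (12) of Lemma 1 with \(c_{j} = 1/n_{Z}\). Therefore, for any \(\tau > 0\)

\[
{\mathbb {P}}\left( {\mathbb {E}}_{z_{1}|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] - \frac{1}{n_{Z}} \sum _{i=1}^{n_{Z}} g\left( z_{i},Y_{n_{Y}}\right) \ge \tau \ \Big |\ Y_{n_{Y}}\right) \le \exp \left( -2n_{Z}\tau ^{2}\right) ,
\]

and by the law of total probability

\[
{\mathbb {P}}\left( {\mathbb {E}}_{z_{1}|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] - \frac{1}{n_{Z}} \sum _{i=1}^{n_{Z}} g\left( z_{i},Y_{n_{Y}}\right) \ge \tau \right) \le \exp \left( -2n_{Z}\tau ^{2}\right) .
\]

Next, setting \(\delta = \exp \left( -2n_{Z}\tau ^{2}\right) \), and solving for \(\tau \), the following result is obtained:

\[
\tau = \sqrt{\frac{\ln \left( 1/\delta \right) }{2n_{Z}}}.
\]

Therefore, with probability at least \(1-\delta \),

\[
{\mathbb {E}}_{z_{1}|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] \le \frac{1}{n_{Z}} \sum _{i=1}^{n_{Z}} g\left( z_{i},Y_{n_{Y}}\right) + \sqrt{\frac{\ln \left( 1/\delta \right) }{2n_{Z}}}. \qquad (15)
\]

However, Theorem 1 provides a probabilistic upper bound for \({\mathbb {E}}_{x|Y_{n_{Y}}}\left[ g(x,Y_{n_{Y}})\right] \). First, recall that \(z_{1} \sim Y_{mix}\) and note that

\[
{\mathbb {E}}_{z_{1}|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] = (1-\alpha )\, {\mathbb {E}}_{(z_{1} \sim Y_{nor})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] + \alpha \, {\mathbb {E}}_{(z_{1} \sim Y_{out})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] .
\]

Then, since \({\mathbb {E}}_{(z_{1} \sim Y_{nor})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] = {\mathbb {E}}_{x|Y_{n_{Y}}}\left[ g\left( x,Y_{n_{Y}}\right) \right] \), for \(\alpha >0\),

\[
{\mathbb {E}}_{x|Y_{n_{Y}}}\left[ g\left( x,Y_{n_{Y}}\right) \right] \le \frac{1}{1-\alpha }\, {\mathbb {E}}_{z_{1}|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] . \qquad (16)
\]

Consequently, combining (15) and (16), and for \(r \ge \alpha \), we obtain, with probability at least \(1-\delta \),

\[
{\mathbb {E}}_{x|Y_{n_{Y}}}\left[ g\left( x,Y_{n_{Y}}\right) \right] \le \frac{1}{1-r} \left[ \frac{1}{n_{Z}} \sum _{i=1}^{n_{Z}} g\left( z_{i},Y_{n_{Y}}\right) + \sqrt{\frac{\ln \left( 1/\delta \right) }{2n_{Z}}}\right] ,
\]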

which completes the proof. \(\square \)
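The substitution \(\delta = \exp \left( -2n_{Z}\tau ^{2}\right) \) used in the proof can be illustrated with a small simulation. This is a hedged sketch, not part of the original argument. It assumes the bounded-difference constants are \(c_{j} = 1/n_{Z}\), and it replaces the values of \(g\) with independent Uniform(0, 1) draws, which is purely a stand-in. The function names and the values of `delta`, `n_z` and `trials` are hypothetical.

```python
import math
import random


def tau_for_confidence(delta, n_z):
    """Solve delta = exp(-2 * n_z * tau**2) for tau."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n_z))


def exceedance_frequency(n_z, tau, trials, seed=0):
    """Fraction of trials in which E[X] - X >= tau, where X is the mean of
    n_z independent Uniform(0, 1) draws (a stand-in for values in [0, 1])."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n_z)) / n_z
        if 0.5 - mean >= tau:  # E[X] = 0.5 for Uniform(0, 1) draws
            hits += 1
    return hits / trials


delta, n_z = 0.05, 50
tau = tau_for_confidence(delta, n_z)
freq = exceedance_frequency(n_z, tau, trials=20000)
print(f"tau = {tau:.4f}, empirical frequency = {freq:.4f}, bound = {delta}")
```

The empirical frequency should not exceed `delta`. In this uniform example the bound is expected to be quite conservative, because McDiarmid's inequality makes no use of the variance of the summands.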

## About this article

### Cite this article

Sguera, C., Galeano, P. & Lillo, R.E. Functional outlier detection by a local depth with application to NO\(_{x}\) levels. *Stoch Environ Res Risk Assess* **30**, 1115–1130 (2016). https://doi.org/10.1007/s00477-015-1096-3

DOI: https://doi.org/10.1007/s00477-015-1096-3