Functional outlier detection by a local depth with application to NO$_{x}$ levels

Sguera, Carlo; Galeano, Pedro; Lillo, Rosa E.

doi:10.1007/s00477-015-1096-3

Original Paper
Published: 13 June 2015

Volume 30, pages 1115–1130, (2016)
Stochastic Environmental Research and Risk Assessment

Carlo Sguera¹,
Pedro Galeano¹ &
Rosa E. Lillo¹

Abstract

This paper proposes methods to detect outliers in functional data sets. The task of identifying atypical curves is carried out using the recently proposed kernelized functional spatial depth (KFSD). KFSD is a local depth that can be used to order the curves of a sample from the most to the least central. Since outliers are usually among the least central curves, we present a probabilistic result that allows us to select a threshold value for KFSD such that curves with depth values lower than the threshold are detected as outliers. Based on this result, we propose three new outlier detection procedures. The results of a simulation study show that our proposals generally outperform a battery of competitors. We apply our procedures to a real data set consisting of daily curves of emission levels of nitrogen oxides (NO$_{x}$), since identifying abnormal NO$_{x}$ levels is of interest for taking the necessary environmental policy actions.

Notes

In the presence of a tie, the method with the lower false outlier detection percentage (f) is preferred.

References

Barnett V, Lewis T (1994) Outliers in statistical data, vol 3. Wiley, New York
Chakraborty A, Chaudhuri P (2014) On data depth in infinite dimensional spaces. Ann Inst Stat Math 66:303–324
Chen Y, Dang X, Peng H, Bart HL (2009) Outlier detection with the kernelized spatial depth function. IEEE Trans Pattern Anal Mach Intell 31:288–305
Cuesta-Albertos JA, Nieto-Reyes A (2008) The random Tukey depth. Comput Stat Data Anal 52:4979–4988
Cuevas A (2014) A partial overview of the theory of statistics with functional data. J Stat Plan Inference 147:1–23
Cuevas A, Fraiman R (2009) On depth measures and dual statistics. A methodology for dealing with general data. J Multivar Anal 100:753–766
Cuevas A, Febrero M, Fraiman R (2006) On the use of the bootstrap for estimating functions with functional data. Comput Stat Data Anal 51:1063–1074
Febrero M, Oviedo de la Fuente M (2012) Statistical computing in functional data analysis: the R package fda.usc. J Stat Softw 51:1–28
Febrero M, Galeano P, González-Manteiga W (2007) A functional analysis of NOx levels: location and scale estimation and outlier detection. Comput Stat 22:411–427
Febrero M, Galeano P, González-Manteiga W (2008) Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics 19:331–345
Ferraty F, Vieu P (2006) Nonparametric functional data analysis: theory and practice. Springer, New York
Fraiman R, Muniz G (2001) Trimmed means for functional data. Test 10:419–440
Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York
Hyndman RJ, Shang HL (2010) Rainbow plots, bagplots, and boxplots for functional data. J Comput Graph Stat 19:29–45
Ignaccolo R, Franco-Villoria M, Fassò A (2015) Modelling collocation uncertainty of 3D atmospheric profiles. Stoch Environ Res Risk Assess 29:419–429
López-Pintado S, Romo J (2009) On the concept of depth for functional data. J Am Stat Assoc 104:718–734
McDiarmid C (1989) On the method of bounded differences. Survey in combinatorics. Cambridge University Press, Cambridge, pp 148–188
Menafoglio A, Guadagnini A, Secchi P (2014) A kriging approach based on Aitchison geometry for the characterization of particle-size curves in heterogeneous aquifers. Stoch Environ Res Risk Assess 28:1835–1851
Ramsay JO, Silverman BW (2005) Functional data analysis. Springer, New York
Ruiz-Medina MD, Espejo RM (2012) Spatial autoregressive functional plug-in prediction of ocean surface temperature. Stoch Environ Res Risk Assess 26:335–344
Sguera C, Galeano P, Lillo R (2014) Spatial depth-based classification for functional data. Test 23:725–750
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
Sun Y, Genton MG (2011) Functional boxplots. J Comput Graph Stat 20:316–334
Tukey JW (1975) Mathematics and the picturing of data. Proc Int Congr Math 2:523–531

Acknowledgments

The authors would like to thank the editor in chief, the associate editor and an anonymous referee for their helpful comments. This research was partially supported by Spanish Ministry of Science and Innovation grant ECO2011-25706 and by Spanish Ministry of Economy and Competition grant ECO2012-38442.

Author information

Authors and Affiliations

Department of Statistics, Universidad Carlos III de Madrid, 28903, Getafe, Madrid, Spain
Carlo Sguera, Pedro Galeano & Rosa E. Lillo

Corresponding author

Correspondence to Carlo Sguera.

Appendix

1.1 From $FSD(x, Y_{n})$ to $KFSD(x, Y_{n})$

To show how to pass from $FSD(x, Y_{n})$ in (1) to $KFSD(x, Y_{n})$ in (4), we first show that $FSD(x, Y_{n})$ can be expressed in terms of inner products. We present this result for $n=2$. The norm in (1) can be written as

$$\begin{aligned} \left\| \sum _{i=1}^{2}\frac{x-y_{i}}{\Vert x-y_{i}\Vert }\right\| ^{2} &= \left\| \frac{x-y_{1}}{\Vert x-y_{1}\Vert }+\frac{x-y_{2}}{\Vert x-y_{2}\Vert }\right\| ^{2}\\ &= \left\| \frac{x-y_{1}}{\sqrt{\langle x,x\rangle +\langle y_{1},y_{1}\rangle -2\langle x,y_{1}\rangle }}+\frac{x-y_{2}}{\sqrt{\langle x,x\rangle +\langle y_{2},y_{2}\rangle -2\langle x,y_{2}\rangle }}\right\| ^{2} \end{aligned}$$

Let $\delta _{1}=\sqrt{\langle x,x\rangle +\langle y_{1},y_{1}\rangle -2\langle x,y_{1}\rangle }$ and $\delta _{2}=\sqrt{\langle x,x\rangle +\langle y_{2},y_{2}\rangle -2\langle x,y_{2}\rangle }$. Then,

$$\begin{aligned} \left\| \sum _{i=1}^{2}\frac{x-y_{i}}{\Vert x-y_{i}\Vert }\right\| ^{2} &= \left\| \frac{x-y_{1}}{\delta _{1}}+\frac{x-y_{2}}{\delta _{2}}\right\| ^{2} \\ &= \left\| \frac{x-y_{1}}{\delta _{1}}\right\| ^{2}+\left\| \frac{x-y_{2}}{\delta _{2}}\right\| ^{2}+ \frac{2}{\delta _{1}\delta _{2}}\langle x-y_{1},x-y_{2}\rangle \\ &= 2 + \frac{2}{\delta _{1}\delta _{2}}(\langle x,x\rangle +\langle y_{1},y_{2}\rangle -\langle x,y_{1}\rangle -\langle x,y_{2}\rangle )\\ &= \sum _{i,j=1}^{2}\frac{\langle x,x\rangle +\langle y_{i},y_{j}\rangle -\langle x,y_{i}\rangle -\langle x,y_{j}\rangle }{\delta _{i}\delta _{j}},\end{aligned}$$

after which we apply the embedding map $\phi $ to all the observations in the last expression. According to (2), this is equivalent to substituting the inner product function with a positive definite and stationary kernel function $\kappa $, which explains the definition of $KFSD(x, Y_{n})$ in (4) for $n=2$. The generalization of this result to $n>2$ is straightforward.
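The inner-product identity above can be checked numerically. The following is a minimal sketch, assuming curves discretized on a uniform grid and a Riemann-sum approximation of the $L^{2}$ inner product; the function names, the test curves, and the Gaussian kernel with bandwidth `sigma` are illustrative assumptions and are not taken from the paper. It evaluates the squared norm once directly from the curves and once through the double sum, and then swaps the inner product for a kernel in the same double sum.

```python
import numpy as np

def inner(f, g, dt):
    """Riemann-sum approximation of the L2 inner product on a uniform grid."""
    return float(np.sum(f * g) * dt)

def sq_norm_direct(x, ys, dt):
    """|| sum_i (x - y_i) / ||x - y_i|| ||^2, computed from the curves themselves."""
    total = np.zeros_like(x)
    for y in ys:
        d = x - y
        total += d / np.sqrt(inner(d, d, dt))
    return inner(total, total, dt)

def sq_norm_double_sum(x, ys, kappa):
    """The same quantity written only through a two-argument function kappa.

    With kappa equal to the inner product this is the last line of the
    derivation; with a kernel it is the kernelized counterpart.
    """
    kxx = kappa(x, x)
    kxy = np.array([kappa(x, y) for y in ys])
    kyy = np.array([[kappa(yi, yj) for yj in ys] for yi in ys])
    delta = np.sqrt(kxx + np.diag(kyy) - 2.0 * kxy)      # delta_i
    num = kxx + kyy - kxy[:, None] - kxy[None, :]        # numerator for each (i, j)
    return float(np.sum(num / np.outer(delta, delta)))

# Hypothetical curves on [0, 1], chosen only for illustration.
t = np.linspace(0.0, 1.0, 201)
dt = t[1] - t[0]
x = np.sin(2.0 * np.pi * t)
ys = [np.cos(2.0 * np.pi * t), t**2]

direct = sq_norm_direct(x, ys, dt)
via_inner = sq_norm_double_sum(x, ys, lambda f, g: inner(f, g, dt))

# Assumed Gaussian kernel; sigma is an arbitrary illustrative bandwidth.
sigma = 1.0
gauss = lambda f, g: np.exp(-inner(f - g, f - g, dt) / sigma**2)
via_gauss = sq_norm_double_sum(x, ys, gauss)
```

Here `direct` and `via_inner` agree up to floating-point error, while `via_gauss` is the squared norm of a sum of two unit vectors in the feature space and therefore lies between 0 and 4.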

1.2 Proof of Theorem 1

As explained in Sect. 3, Theorem 1 is a functional extension of a result derived by Chen et al. (2009) for KSD. Since the two results are closely related, we report only a sketch of the proof of Theorem 1. The proof for KSD is mostly based on an inequality known as McDiarmid's inequality (McDiarmid 1989), which also applies to general probability spaces, and therefore to functional Hilbert spaces. We report this inequality in the next lemma:

Lemma 1

(McDiarmid [1.2]) Let $\Omega _{1}, \ldots , \Omega _{n}$ be probability spaces. Let ${\mathbf {\Omega }} = \prod _{j=1}^{n} \Omega _{j}$ and let $X: {\mathbf {\Omega }} \rightarrow {\mathbb {R}}$ be a random variable. For any $j \in \left\{ 1, \ldots , n\right\} $, let $\left( \omega _{1}, \ldots , \omega _{j}, \ldots , \omega _{n}\right) $ and $\left( \omega _{1}, \ldots , \hat{\omega }_{j}, \ldots , \omega _{n}\right) $ be two elements of ${\mathbf {\Omega }}$ that differ only in their $j$th coordinates. Assume that $X$ is uniformly difference-bounded by $\{c_{j}\}$, that is, for any $j \in \left\{ 1, \ldots , n\right\} $,

$$\left| X\left( \omega _{1}, \ldots , \omega _{j}, \ldots , \omega _{n}\right) -X\left( \omega _{1}, \ldots , \hat{\omega }_{j}, \ldots , \omega _{n}\right) \right| \le c_{j}. $$

(12)

Then, if ${\mathbb {E}}[X]$ exists, for any $\tau > 0$

$${\mathrm {Pr}}\left( X-{\mathbb {E}}[X] \ge \tau \right) \le \exp \left( \frac{-2\tau ^2}{\sum _{j=1}^{n} c_{j}^{2}}\right) .$$
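As an informal illustration of Lemma 1, the sketch below compares the bound with a Monte Carlo estimate of the tail probability for the negative mean of $n$ independent variables on $[0,1]$, which has the same shape as the statistic used in the rest of the proof. The uniform distribution, the values of `n` and `tau`, and all function names are assumptions made only for this example.

```python
import numpy as np

def mcdiarmid_bound(tau, c):
    """Right-hand side of the lemma: exp(-2 tau^2 / sum_j c_j^2)."""
    c = np.asarray(c, dtype=float)
    return float(np.exp(-2.0 * tau**2 / np.sum(c**2)))

def empirical_tail(n, tau, n_rep=20000, seed=0):
    """Monte Carlo estimate of Pr(X - E[X] >= tau) for X = -mean(U_1, ..., U_n)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=(n_rep, n))  # every coordinate lies in [0, 1]
    x = -u.mean(axis=1)                         # changing one coordinate moves X by at most 1/n
    expected = -0.5                             # E[X] for uniform draws on [0, 1]
    return float(np.mean(x - expected >= tau))

n, tau = 50, 0.1
bound = mcdiarmid_bound(tau, np.full(n, 1.0 / n))  # c_j = 1/n gives exp(-2 n tau^2)
freq = empirical_tail(n, tau)
```

With $c_{j} = 1/n$ the bound reduces to $\exp (-2n\tau ^{2})$, and the simulated frequency `freq` should not exceed `bound`; the bound is distribution-free, so it is typically far from tight for any particular distribution.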

In order to apply Lemma 1 to our problem, define

$$X(z_{1},\ldots ,z_{n_{Z}}) = - \frac{1}{n_{Z}}\sum _{i=1}^{n_{Z}}g(z_{i},Y_{n_{Y}}|Y_{n_{Y}}),$$

(13)

whose expected value is given by

$${\mathbb {E}}[X] = {\mathbb {E}}_{z_{i}|Y_{n_{Y}}}\left[ - \frac{1}{n_{Z}}\sum _{i=1}^{n_{Z}}g(z_{i},Y_{n_{Y}}|Y_{n_{Y}})\right] = - {\mathbb {E}}_{z_{1}|Y_{n_{Y}}}\left[ g(z_{1},Y_{n_{Y}}|Y_{n_{Y}})\right] . $$

(14)

Now, for any $j \in \left\{ 1, \ldots , n_{Z}\right\} $ and $\hat{z}_{j} \in {\mathbb {H}}$, the following inequality holds

$$\left| X(z_{1},\ldots ,z_{j},\ldots ,z_{n_{Z}}) - X(z_{1},\ldots ,\hat{z}_{j},\ldots ,z_{n_{Z}})\right| \le \frac{1}{n_{Z}}, $$

which verifies assumption (12) of Lemma 1 with $c_{j} = 1/n_{Z}$ for every $j$, so that $\sum _{j=1}^{n_{Z}} c_{j}^{2} = 1/n_{Z}$. Therefore, for any $\tau > 0$

$${\mathrm {Pr}}\left( {\mathbb {E}}_{z_{1}|Y_{n_{Y}}}\left[ g(z_{1},Y_{n_{Y}}|Y_{n_{Y}})\right] - \frac{1}{n_{Z}}\sum _{i=1}^{n_{Z}}g(z_{i},Y_{n_{Y}}|Y_{n_{Y}}) \ge \tau \right) \le \exp \left( -2n_{Z}\tau ^{2}\right) , $$

and by the law of total probability

$$\begin{aligned} \begin{array}{l} {\mathbb {E}}\left[ {\mathrm {Pr}}\left( {\mathbb {E}}_{z_{1}|Y_{n_{Y}}}\left[ g(z_{1},Y_{n_{Y}}|Y_{n_{Y}})\right] - \frac{1}{n_{Z}}\sum _{i=1}^{n_{Z}}g(z_{i},Y_{n_{Y}}|Y_{n_{Y}}) \ge \tau \right) \right] \\ \quad = {\mathrm {Pr}}\left( {\mathbb {E}}_{z_{1}|Y_{n_{Y}}}\left[ g(z_{1},Y_{n_{Y}})\right] - \frac{1}{n_{Z}}\sum _{i=1}^{n_{Z}}g(z_{i},Y_{n_{Y}}) \ge \tau \right) \le \exp \left( -2n_{Z}\tau ^{2}\right) \\ \end{array} \end{aligned}$$

Next, setting $\delta = \exp \left( -2n_{Z}\tau ^{2}\right) $, and solving for $\tau $, the following result is obtained:

$$\tau = \sqrt{\frac{\ln 1/\delta }{2n_{Z}}}.$$

Therefore,

$${\mathrm {Pr}}\left( {\mathbb {E}}_{z_{1}|Y_{n_{Y}}}\left[ g(z_{1},Y_{n_{Y}})\right] \le \frac{1}{n_{Z}}\sum _{i=1}^{n_{Z}}g(z_{i},Y_{n_{Y}}) + \sqrt{\frac{\ln 1/\delta }{2n_{Z}}}\right) \ge 1-\delta . $$

(15)

However, Theorem 1 provides a probabilistic upper bound for ${\mathbb {E}}_{x|Y_{n_{Y}}}\left[ g(x,Y_{n_{Y}})\right] $. First, recall that $z_{1} \sim Y_{mix}$ and note that

$${\mathbb {E}}_{(z_{1} \sim Y_{mix})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] = (1-\alpha ){\mathbb {E}}_{(z_{1} \sim Y_{nor})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] + \alpha {\mathbb {E}}_{(z_{1} \sim Y_{out})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] . $$

Then, since ${\mathbb {E}}_{(z_{1} \sim Y_{nor})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] = {\mathbb {E}}_{x|Y_{n_{Y}}}\left[ g\left( x,Y_{n_{Y}}\right) \right] $, for $\alpha >0$,

$${\mathbb {E}}_{x|Y_{n_{Y}}}\left[ g\left( x,Y_{n_{Y}}\right) \right] \le \frac{1}{1-\alpha }{\mathbb {E}}_{(z_{1} \sim Y_{mix})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] . $$

(16)
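The step from the mixture decomposition to (16) can be spelled out as follows. This is a sketch under two assumptions that this appendix does not restate: $g$ is non-negative and $0 \le \alpha < 1$.

$$\begin{aligned} {\mathbb {E}}_{(z_{1} \sim Y_{mix})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] &= (1-\alpha ){\mathbb {E}}_{(z_{1} \sim Y_{nor})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] + \alpha {\mathbb {E}}_{(z_{1} \sim Y_{out})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] \\ &\ge (1-\alpha ){\mathbb {E}}_{(z_{1} \sim Y_{nor})|Y_{n_{Y}}}\left[ g\left( z_{1},Y_{n_{Y}}\right) \right] \\ &= (1-\alpha ){\mathbb {E}}_{x|Y_{n_{Y}}}\left[ g\left( x,Y_{n_{Y}}\right) \right] . \end{aligned}$$

Dividing both sides by $1-\alpha $ gives (16). Under the same assumptions, any $r$ with $\alpha \le r < 1$ satisfies $1/(1-\alpha ) \le 1/(1-r)$, so $\alpha $ can be replaced by $r$ on the right-hand side without breaking the inequality.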

Consequently, combining (15) and (16), and for $r \ge \alpha $, we obtain

$${\mathrm {Pr}}\left( {\mathbb {E}}_{x|Y_{n_{Y}}}\left[ g(x,Y_{n_{Y}})\right] \le \frac{1}{1-r}\left[ \frac{1}{n_{Z}}\sum _{i=1}^{n_{Z}}g(z_{i},Y_{n_{Y}}) + \sqrt{\frac{\ln 1/\delta }{2n_{Z}}}\right] \right) \ge 1-\delta ,$$

which completes the proof. $\square $
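The right-hand side of the final inequality is straightforward to evaluate once the values $g(z_{i},Y_{n_{Y}})$ are available. The sketch below is illustrative only: the function names are hypothetical, the definition of $g$ is given in the main text and is not reproduced here, and the example values of `g_values`, `delta`, and `r` are made up for demonstration.

```python
import math

def deviation_term(delta, n_z):
    """tau = sqrt(ln(1/delta) / (2 n_Z)), from solving delta = exp(-2 n_Z tau^2)."""
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie strictly between 0 and 1")
    if n_z <= 0:
        raise ValueError("n_z must be positive")
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n_z))

def upper_bound(g_values, delta, r):
    """(1 / (1 - r)) * [mean of g(z_i, Y) + tau], the bound stated at the end of the proof."""
    if not (0.0 <= r < 1.0):
        raise ValueError("r must lie in [0, 1)")
    n_z = len(g_values)
    mean_g = sum(g_values) / n_z
    return (mean_g + deviation_term(delta, n_z)) / (1.0 - r)

# Hypothetical inputs: 100 evaluations of g, five of them equal to 1.
g_values = [0.0] * 95 + [1.0] * 5
delta, r = 0.05, 0.05
tau = deviation_term(delta, len(g_values))
bound = upper_bound(g_values, delta, r)
```

The deviation term shrinks at rate $1/\sqrt{n_{Z}}$, so with a small sample it can dominate the empirical mean of $g$, as it does for these made-up inputs.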

About this article

Cite this article

Sguera, C., Galeano, P. & Lillo, R.E. Functional outlier detection by a local depth with application to NO_x levels. Stoch Environ Res Risk Assess 30, 1115–1130 (2016). https://doi.org/10.1007/s00477-015-1096-3

Published: 13 June 2015
Issue Date: April 2016
DOI: https://doi.org/10.1007/s00477-015-1096-3
