Multivariate functional outlier detection

Abstract

Functional data arise increasingly often in practice, and various statistical techniques have been developed to analyze them. In this paper we consider multivariate functional data, where for each curve and each time point a \(p\)-dimensional vector of measurements is observed. The study of outlier detection for functional data started only recently and has mostly been limited to univariate curves \((p=1)\). We set up a taxonomy of functional outliers and construct new numerical and graphical techniques for detecting outliers in multivariate functional data, with univariate curves included as a special case. Our tools include statistical depth functions and distance measures derived from them. The methods we study are affine invariant in \(p\)-dimensional space and do not assume elliptical or any other symmetry.
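
As a rough illustration of how a depth function can rank multivariate curves, the following sketch approximates halfspace (Tukey) depth at each time point and averages it over time. It is not the procedure of the paper: the function names, the random-direction approximation, the toy data, and the choice of a plain time average are all assumptions made for this example. Because only finitely many directions are searched, the result is an upper bound on the exact depth and is only approximately affine invariant.

```python
import math
import random


def unit_directions(k, p, rng):
    """Draw k random unit vectors in R^p (hypothetical helper)."""
    dirs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(c * c for c in v))
        dirs.append([c / norm for c in v])
    return dirs


def halfspace_depths(points, dirs):
    """Approximate halfspace depth of every point within its own cloud.

    Only the supplied directions are searched, so each value is an
    upper bound on the exact depth.
    """
    n = len(points)
    depths = [1.0] * n
    for u in dirs:
        proj = [sum(a * b for a, b in zip(u, x)) for x in points]
        for i, pi in enumerate(proj):
            below = sum(q <= pi for q in proj) / n  # mass of halfspace u.x <= u.x_i
            above = sum(q >= pi for q in proj) / n  # mass of the opposite halfspace
            depths[i] = min(depths[i], below, above)
    return depths


def integrated_depth(curves, n_dir=50, seed=0):
    """curves[i][t] is the p-vector of curve i at time index t.

    Returns one value per curve: the pointwise depth averaged over time.
    """
    rng = random.Random(seed)
    n, n_time, p = len(curves), len(curves[0]), len(curves[0][0])
    dirs = unit_directions(n_dir, p, rng)
    total = [0.0] * n
    for t in range(n_time):
        cross_section = [curves[i][t] for i in range(n)]
        for i, d in enumerate(halfspace_depths(cross_section, dirs)):
            total[i] += d
    return [s / n_time for s in total]


# Toy data (assumed): 20 bivariate curves on 30 time points; curve 7 is shifted.
rng = random.Random(1)
curves = []
for i in range(20):
    shift = 10.0 if i == 7 else 0.0
    curves.append([
        [math.sin(2 * math.pi * t / 29) + rng.gauss(0, 0.1) + shift,
         math.cos(2 * math.pi * t / 29) + rng.gauss(0, 0.1) + shift]
        for t in range(30)
    ])

depth = integrated_depth(curves)
print(min(range(20), key=depth.__getitem__))  # index of the lowest-depth curve
```

In this toy setting the shifted curve lies on the edge of the point cloud at every time point, so it should receive the lowest averaged depth. A persistent shift is only one kind of functional outlier; a curve that deviates briefly or only in shape may not stand out under a plain time average.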

Author information

Corresponding author

Correspondence to Mia Hubert.

About this article

Cite this article

Hubert, M., Rousseeuw, P.J. & Segaert, P. Multivariate functional outlier detection. Stat Methods Appl 24, 177–202 (2015). https://doi.org/10.1007/s10260-015-0297-8

Keywords

  • Depth
  • Diagnostic plot
  • Functional data
  • Graphical display
  • Heatmap
  • Robustness