Statistics and Computing, 19:341

Controlling the size of multivariate outlier tests with the MCD estimator of scatter

  • Andrea Cerioli
  • Marco Riani
  • Anthony C. Atkinson

Abstract

Multivariate outlier detection requires computation of robust distances to be compared with appropriate cut-off points. In this paper we propose a new calibration method for obtaining reliable cut-off points of distances derived from the MCD estimator of scatter. These cut-off points are based on a more accurate estimate of the extreme tail of the distribution of robust distances. We show that our procedure gives reliable tests of outlyingness in almost all situations of practical interest, provided that the sample size is not much smaller than 50. Therefore, it is a considerable improvement over all the available MCD procedures, which are unable to provide good control over the size of multiple outlier tests for the data structures considered in this paper.
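
The idea of controlling the size of a simultaneous outlier test can be illustrated with a minimal sketch. It is not the calibration method proposed in the paper, which the abstract does not spell out. The sketch assumes, purely for illustration, that squared distances are compared with a chi-squared reference distribution in two dimensions (where the quantile has a closed form) and that the n tests are treated as independent, so that a Šidák-type adjustment gives the per-observation level. The function names are hypothetical.

```python
import math


def sidak_level(alpha: float, n: int) -> float:
    """Per-observation level such that n independent tests have overall size alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n)


def chi2_quantile_2df(prob: float) -> float:
    """Quantile of the chi-squared distribution with 2 degrees of freedom.

    With 2 degrees of freedom the CDF is 1 - exp(-x / 2), so it inverts exactly.
    """
    return -2.0 * math.log(1.0 - prob)


def simultaneous_cutoff_2d(alpha: float, n: int) -> float:
    """Cut-off for squared distances in 2 dimensions at simultaneous size alpha.

    Assumption: chi-squared reference distribution and independent tests.
    """
    return chi2_quantile_2df(1.0 - sidak_level(alpha, n))


if __name__ == "__main__":
    n = 100
    # Individual cut-off: each observation tested on its own at the 2.5% level.
    print("individual cut-off:", chi2_quantile_2df(0.975))
    # Simultaneous cut-off: 1% chance of at least one false detection among n.
    print("simultaneous cut-off:", simultaneous_cutoff_2d(0.01, n))
```

The simultaneous cut-off is much larger than the individual one, which shows why the accuracy of the extreme tail of the distance distribution matters for the size of multiple outlier tests.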

Keywords

Minimum covariance determinant estimator; Robust distances; Multiple outliers; Simultaneous testing; Calibration factor; Simulation

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Andrea Cerioli (1, corresponding author)
  • Marco Riani (1)
  • Anthony C. Atkinson (2)

  1. Dipartimento di Economia, Università di Parma, Parma, Italy
  2. Department of Statistics, The London School of Economics, London, UK
