Multivariate Outlier Identification Based on Robust Estimators of Location and Scatter

  • Claudia Becker
  • Steffen Liebscher
  • Thomas Kirschstein

Abstract

Real-life data often contain observations that are inconsistent with the bulk of the data. Since classical statistical procedures are often sensitive to such outliers, outlier identification methods based on robust statistical estimators are recommended. One class of robust estimators is built on the principle of subset selection: an outlier-free subset of the data is identified first, and this subset is then used to discard or downweight deviating observations so that the parameters of interest can be estimated robustly. Such approaches also yield outlier identification methods. The general approach is presented, and three methods are discussed that were developed especially for cases in which the structure of the main bulk of the observations is not subject to any special restrictions.
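
The subset-selection principle can be illustrated with a minimal Python sketch. It is not one of the three methods discussed in the chapter. It assumes bivariate data and picks the initial subset simply as the half of the observations closest (in Euclidean distance) to the coordinate-wise median, which is a crude and not affine equivariant choice made purely for illustration. The function names, the `subset_fraction` and `alpha` parameters, and the median-based rescaling of the distances are all assumptions of this sketch.

```python
import math
import statistics


def mean_and_cov(points):
    """Sample mean and covariance entries (sxx, sxy, syy) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_mahalanobis(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def identify_outliers(points, subset_fraction=0.5, alpha=0.025):
    """Return one boolean per point; True marks a suspected outlier.

    Hypothetical illustration of the subset-selection idea, not the
    chapter's estimators.
    """
    # Step 1 (assumed heuristic): take the points closest to the
    # coordinate-wise median as the presumably outlier-free subset.
    med = (
        statistics.median(x for x, _ in points),
        statistics.median(y for _, y in points),
    )
    by_distance = sorted(points, key=lambda p: math.dist(p, med))
    h = max(3, int(subset_fraction * len(points)))
    subset = by_distance[:h]

    # Step 2: estimate location and scatter from the subset only.
    center, cov = mean_and_cov(subset)

    # Step 3: distances of all points relative to the subset estimates.
    d2 = [squared_mahalanobis(p, center, cov) for p in points]

    # The subset covariance underestimates the scatter, so rescale the
    # distances so that their median matches the chi-square(2) median.
    scale = statistics.median(d2) / (2.0 * math.log(2.0))

    # Chi-square(2) quantile at level 1 - alpha has the closed form -2 ln(alpha).
    cutoff = -2.0 * math.log(alpha)
    return [d / scale > cutoff for d in d2]
```

Calling `identify_outliers(points)` on a list of `(x, y)` tuples returns the flags in the input order. Restricting the sketch to two dimensions keeps it dependency-free, because both the matrix inverse and the chi-square quantile then have closed forms; a general-dimension version would need a linear algebra library and a chi-square quantile function.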

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Claudia Becker (1)
  • Steffen Liebscher (1)
  • Thomas Kirschstein (1)
  1. Martin-Luther-University Halle-Wittenberg, Halle, Germany
