Statistics and Computing, Volume 22, Issue 1, pp 325–336

The flood algorithm—a multivariate, self-organizing-map-based, robust location and covariance estimator

  • Steffen Liebscher
  • Thomas Kirschstein
  • Claudia Becker
Article

Abstract

Self-organizing maps (SOMs), introduced by Kohonen (Biol. Cybern. 43(1):59–69, 1982), are well known in the field of artificial neural networks. The way SOMs operate is intuitive, which has led to great popularity and numerous applications (in statistics, e.g. classification and clustering). The unsupervised learning process performed by a SOM yields a non-linear, low-dimensional projection of the high-dimensional input data that preserves certain features of the underlying data, e.g. its topology and probability distribution (Lee and Verleysen in Nonlinear Dimensionality Reduction, Springer, 2007; Kohonen in Self-organizing Maps, 3rd edn., Springer, 2001).
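The unsupervised learning step described above can be illustrated with a minimal sketch of the classic online SOM update on a rectangular grid. This is not the authors' implementation; the function name `train_som`, the linear decay schedules and all default values are assumptions chosen for illustration only.

```python
import numpy as np

def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small rectangular SOM with the online Kohonen update.

    Returns a codebook of shape (rows, cols, n_features).
    Parameter names, defaults and decay schedules are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    # Initialise the codebook vectors with randomly drawn observations.
    start = rng.integers(0, n, size=rows * cols)
    codebook = data[start].reshape(rows, cols, p).copy()
    # Grid coordinates of every neuron, used by the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
        x = data[rng.integers(0, n)]         # one randomly chosen observation
        # Best-matching unit: neuron whose codebook vector is closest to x.
        dist = np.linalg.norm(codebook - x, axis=2)
        bmu = np.unravel_index(np.argmin(dist), dist.shape)
        # Gaussian neighbourhood weights measured on the grid, centred at the BMU.
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=2)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Pull every codebook vector towards x, weighted by its neighbourhood value.
        codebook += lr * h[..., None] * (x - codebook)
    return codebook
```

Because each update is a convex combination of a codebook vector and an observation, the trained codebook stays inside the bounding box of the data, and neighbouring neurons end up with similar codebook vectors, which is what gives the map its topology-preserving character.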

With the U-matrix, Ultsch (Information and Classification: Concepts, Methods and Applications, pp. 307–313, Springer, 1993) introduced a powerful visual representation of SOM results. We propose an approach that uses the U-matrix to identify outlying data points. The revised subsample (i.e. the initial sample minus the outlying points) is then used to compute robust estimates of location and scatter.
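A rough sketch of the general idea follows: compute a U-matrix from a trained SOM codebook, drop the observations mapped to neurons with large U-heights, and estimate location and scatter from the remaining points. The abstract does not state how the flood algorithm selects the subsample, so the quantile cut-off used here is a placeholder assumption and not the authors' rule. The function names and the 4-neighbourhood are likewise assumptions.

```python
import numpy as np

def u_matrix(codebook):
    """Mean distance from each neuron's codebook vector to its grid neighbours.

    Uses a 4-neighbourhood on a rectangular grid (an assumption).
    """
    rows, cols, _ = codebook.shape
    u = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            d = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                a, b = i + di, j + dj
                if 0 <= a < rows and 0 <= b < cols:
                    d.append(np.linalg.norm(codebook[i, j] - codebook[a, b]))
            u[i, j] = np.mean(d) if d else 0.0
    return u

def trimmed_estimates(data, codebook, quantile=0.75):
    """Estimate mean and covariance after dropping points on high U-matrix neurons.

    The quantile threshold is a hypothetical stand-in for the selection step,
    NOT the rule used by the flood algorithm.
    """
    data = np.asarray(data, dtype=float)
    rows, cols, p = codebook.shape
    u = u_matrix(codebook).ravel()
    flat = codebook.reshape(rows * cols, p)
    # Index of the best-matching unit for every observation.
    d2 = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)
    # Keep observations whose BMU has a U-height at or below the threshold.
    keep = u[bmu] <= np.quantile(u, quantile)
    subset = data[keep]
    return subset.mean(axis=0), np.cov(subset, rowvar=False), keep
```

A small usage example with a hand-made 1 x 4 codebook, where the last neuron lies far from the others:

```python
cb = np.array([[[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]]])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.1, 0.0], [0.9, 0.0], [10.0, 0.0], [2.0, 0.0]])
mean, cov, keep = trimmed_estimates(pts, cb, quantile=0.5)
```

Here the point at (10, 0) maps to the isolated neuron and is excluded from the estimates, as is the point at (2, 0), whose neuron borders the large gap.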

Keywords

Outlier detection; Computational complexity; Minimum covariance determinant

References

  1. Atkinson, A.C.: Fast very robust methods for the detection of multiple outliers. J. Am. Stat. Assoc. 89(428), 1329–1339 (1994)
  2. Barnett, V., Lewis, T.: Outliers in Statistical Data, 3rd edn. Wiley, New York (2000)
  3. Bartkowiak, A., Szustalewicz, A.: The grand tour as a method for detecting multivariate outliers. Mach. Graph. Vis. 6, 487–505 (1997)
  4. Becker, C., Gather, U.: The masking breakdown point of multivariate outlier identification rules. J. Am. Stat. Assoc. 94, 947–955 (1999)
  5. Becker, C., Gather, U.: The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier identification rules. Comput. Stat. Data Anal. 36(1), 119–127 (2001)
  6. Becker, C., Paris Scholz, S.: MVE, MCD, and MZE: A simulation study comparing convex body minimizers. Allg. Stat. Arch. 88(2), 155–162 (2004)
  7. Becker, C., Paris Scholz, S.: Deepest points and least deep points: robustness and outliers with MZE. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds.) From Data and Information Analysis to Knowledge Engineering, pp. 254–261. Springer, Berlin (2006)
  8. Cottrell, M., Fort, J.C., Pagès, G.: Theoretical aspects of the SOM algorithm. Neurocomputing 21(1–3), 119–138 (1998)
  9. Croux, C., Haesbroeck, G.: Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 87(3), 603 (2000)
  10. Davies, P.: Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices. Ann. Stat. 15(3), 1269–1292 (1987)
  11. Davies, P.: The asymptotics of Rousseeuw’s minimum volume ellipsoid estimator. Ann. Stat. 20(4), 1828–1843 (1992)
  12. Davies, P., Gather, U.: Breakdown and groups. Ann. Stat. 33(3), 977–988 (2005)
  13. Davies, P., Gather, U.: Addendum to the discussion of “breakdown and groups”. Ann. Stat. 34(3), 1577–1579 (2006)
  14. Erwin, E., Obermayer, K., Schulten, K.: Convergence properties of self-organizing maps. Artif. Neural Netw. 1, 409–414 (1991)
  15. Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: ordering, convergence properties and energy functions. Biol. Cybern. 67(1), 47–55 (1992a)
  16. Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: stationary states, metastability and convergence rate. Biol. Cybern. 67(1), 35–45 (1992b)
  17. Fort, J.: SOM’s mathematics. Neural Netw. 19(6–7), 812–816 (2006). Advances in Self Organising Maps—WSOM’05
  18. Fung, W.-K.: Unmasking outliers and leverage points: a confirmation. J. Am. Stat. Assoc. 88(422), 515–519 (1993)
  19. Gather, U., Becker, C.: Outlier identification and robust methods. In: Maddala, G., Rao, C. (eds.) Robust Inference. Handbook of Statistics, vol. 15, pp. 123–143 (1997)
  20. Hampel, F., Ronchetti, E., Rousseeuw, P., Stahel, W.: Robust Statistics: The Approach based on Influence Functions. Wiley, New York (2005)
  21. Hardin, J., Rocke, D.: The distribution of robust distances. J. Comput. Graph. Stat. 14(4), 928–946 (2005)
  22. Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Prentice Hall, New York (2009)
  23. Hertz, J., Palmer, R.G., Krogh, A.S.: Introduction to the Theory of Neural Computation. Perseus Publishing, New York (1991)
  24. Hubert, M., Rousseeuw, P., van Aelst, S.: High-breakdown robust multivariate methods. Stat. Sci. 23(1), 92–119 (2008)
  25. Kaski, S., Kangas, J., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 1981–1997. Neural Comput. Surv. 1(3&4), 1–176 (1998)
  26. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)
  27. Kohonen, T.: Self-organizing Maps, 3rd edn. Springer, Berlin (2001)
  28. Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J.: Som-pak: the self-organizing map program package. Report A31, Helsinki University of Technology, Laboratory of Computer and Information Science (1996)
  29. Koshevoy, G., Mosler, K.: Zonoid trimming for multivariate distributions. Ann. Stat. 25(5), 1998–2017 (1997)
  30. Koshevoy, G., Mosler, K.: Lift zonoids, random convex hulls, and the variability of random vectors. Bernoulli 4(3), 377–399 (1998)
  31. Koshevoy, G., Möttönen, J., Oja, H.: A scatter matrix estimate based on the zonotope. Ann. Stat. 31(5), 1439–1459 (2003)
  32. Lee, J., Verleysen, M.: Nonlinear Dimensionality Reduction. Springer, Berlin (2007)
  33. Lopuhaä, H.: Asymptotics of reweighted estimators of multivariate location and scatter. Ann. Stat. 27(5), 1638–1665 (1999)
  34. Lopuhaä, H., Rousseeuw, P.: Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Ann. Stat. 19(1), 229–248 (1991)
  35. Nag, A., Mitra, A., Mitra, S.: Multiple outlier detection in multivariate data using self-organizing maps. Comput. Stat. 20(2), 245–264 (2005)
  36. Oja, M., Kaski, S., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 1998–2001 addendum. Neural Comput. Surv. 1, 1–176 (2002)
  37. Paris Scholz, S.: Robustness concepts and investigations for estimators of convex bodies. PhD thesis (2002)
  38. Pison, G., van Aelst, S., Willems, G.: Small sample corrections for LTS and MCD. Metrika 55, 111–123 (2002)
  39. Pöllä, M., Honkela, T., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 2002–2005 addendum. Technical Report TKK-ICS-R23, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland (2009)
  40. Riani, M., Atkinson, A., Cerioli, A.: Finding an unknown number of multivariate outliers. J. R. Stat. Soc., Ser. B, Stat. Methodol. 71(2), 447–466 (2009)
  41. Rocke, D.: Robustness properties of S-estimators of multivariate location and shape in high dimension. Ann. Stat. 24(3), 1327–1345 (1996)
  42. Rocke, D.M., Woodruff, D.L.: Identification of outliers in multivariate data. J. Am. Stat. Assoc. 91(435), 1047–1061 (1996)
  43. Rojas, R.: Theorie der neuronalen Netze: eine systematische Einführung. Springer, Berlin (1993)
  44. Rousseeuw, P.: Multivariate estimation with high breakdown point. Math. Stat. Appl. 8, 283–297 (1985)
  45. Rousseeuw, P., Leroy, A.: Robust Regression and Outlier Detection. Wiley, New York (1987)
  46. Rousseeuw, P., van Driessen, K.: A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223 (1999)
  47. Rousseeuw, P., van Zomeren, B.: Unmasking multivariate outliers and leverage points. J. Am. Stat. Assoc. 85(411), 633–639 (1990)
  48. Ultsch, A.: Self-organizing neural networks for visualization and classification. In: Opitz, O., Lausen, B., Klar, R. (eds.) Information and Classification: Concepts, Methods and Applications, pp. 307–313. Springer, Berlin (1993)

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Steffen Liebscher (1)
  • Thomas Kirschstein (1)
  • Claudia Becker (1)
  1. Martin-Luther-University, Halle/Saale, Germany
