Statistics and Computing, Volume 22, Issue 1, pp 325–336

The flood algorithm—a multivariate, self-organizing-map-based, robust location and covariance estimator

  • Steffen Liebscher
  • Thomas Kirschstein
  • Claudia Becker


Self-organizing maps (SOMs), introduced by Kohonen (Biol. Cybern. 43(1):59–69, 1982), are well known in the field of artificial neural networks. The way SOMs work is highly intuitive, which has led to great popularity and numerous applications (in statistics: classification and clustering). The unsupervised learning process performed by a SOM yields a non-linear, low-dimensional projection of the high-dimensional input data that preserves certain features of the underlying data, e.g. the topology and probability distribution (Lee and Verleysen in Nonlinear Dimensionality Reduction, Springer, 2007; Kohonen in Self-organizing Maps, 3rd edn., Springer, 2001).
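
For readers less familiar with SOMs, the standard online learning step can be written as below. This is the textbook form of Kohonen's rule, not a formula quoted from this paper; the notation is chosen here for illustration: x(t) is the input presented at step t, w_j the codebook vector of neuron j, r_j its position on the map grid, α(t) a decreasing learning rate, and σ(t) a decreasing neighbourhood width. The Gaussian neighbourhood is one common choice among several.

```latex
c(t) = \arg\min_{j} \lVert x(t) - w_j(t) \rVert,
\qquad
w_j(t+1) = w_j(t) + \alpha(t)\, h_{c(t),j}(t)\, \bigl( x(t) - w_j(t) \bigr),
\qquad
h_{c,j}(t) = \exp\!\left( -\frac{\lVert r_c - r_j \rVert^{2}}{2\,\sigma(t)^{2}} \right)
```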

With the U-matrix, Ultsch (Information and Classification: Concepts, Methods and Applications, pp. 307–313, Springer, 1993) introduced a powerful visual representation of SOM results. We propose an approach that uses the U-matrix to identify outlying data points. The revised subsample (i.e. the initial sample minus the outlying points) is then used to obtain robust estimates of location and scatter.
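
The pipeline outlined above (train a SOM, compute U-matrix heights, set aside points that sit on high ground, estimate location and scatter from the rest) might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the flood procedure itself, so a plain quantile cut-off on the U-heights stands in for it here. The function names, the 6×6 grid, the learning schedule, the four-neighbour U-height, and the 0.75 quantile are all hypothetical choices, not the authors' settings.

```python
import numpy as np

def train_som(X, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM with the classic online update rule."""
    rng = np.random.default_rng(seed)
    # Grid position of every neuron, shape (rows*cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise codebook vectors with randomly drawn data points.
    W = X[rng.integers(0, len(X), rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5      # neighbourhood width shrinks to 0.5
        x = X[rng.integers(0, len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))      # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)       # squared grid distances
        h = np.exp(-d2 / (2.0 * sigma ** 2))             # Gaussian neighbourhood
        W += lr * h[:, None] * (x - W)
    return W.reshape(rows, cols, -1)

def u_matrix(W):
    """U-height per neuron: mean distance to its (up to four) grid neighbours."""
    rows, cols, _ = W.shape
    U = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = [np.linalg.norm(W[r, c] - W[rr, cc])
                     for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= rr < rows and 0 <= cc < cols]
            U[r, c] = np.mean(dists)
    return U

def robust_estimates(X, W, U, quantile=0.75):
    """Drop points whose best-matching unit has a large U-height (assumed rule),
    then return mean and covariance of the remaining points."""
    flat_W = W.reshape(-1, W.shape[-1])
    d2 = ((X[:, None, :] - flat_W[None, :, :]) ** 2).sum(axis=2)
    heights = U.ravel()[d2.argmin(axis=1)]       # U-height at each point's BMU
    keep = heights <= np.quantile(heights, quantile)
    Xk = X[keep]
    return Xk.mean(axis=0), np.cov(Xk, rowvar=False), keep

# Small demonstration on synthetic data: a standard normal bulk plus a shifted group.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(size=(200, 2)),
               rng.normal(loc=8.0, scale=0.3, size=(20, 2))])
W = train_som(X)
U = u_matrix(W)
loc, cov, keep = robust_estimates(X, W, U)
print("points kept:", int(keep.sum()), "of", len(X))
print("location estimate:", loc)
```

A fixed cut-off like this can miss tightly clustered outliers, because neurons deep inside such a cluster have low U-heights even though the cluster is far from the bulk. That limitation is one reason a more careful use of the U-matrix landscape is of interest.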


Keywords: Outlier detection; Computational complexity; Minimum covariance determinant




  1. Atkinson, A.C.: Fast very robust methods for the detection of multiple outliers. J. Am. Stat. Assoc. 89(428), 1329–1339 (1994)
  2. Barnett, V., Lewis, T.: Outliers in Statistical Data, 3rd edn. Wiley, New York (2000)
  3. Bartkowiak, A., Szustalewicz, A.: The grand tour as a method for detecting multivariate outliers. Mach. Graph. Vis. 6, 487–505 (1997)
  4. Becker, C., Gather, U.: The masking breakdown point of multivariate outlier identification rules. J. Am. Stat. Assoc. 94, 947–955 (1999)
  5. Becker, C., Gather, U.: The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier identification rules. Comput. Stat. Data Anal. 36(1), 119–127 (2001)
  6. Becker, C., Paris Scholz, S.: MVE, MCD, and MZE: A simulation study comparing convex body minimizers. Allg. Stat. Arch. 88(2), 155–162 (2004)
  7. Becker, C., Paris Scholz, S.: Deepest points and least deep points: robustness and outliers with MZE. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds.) From Data and Information Analysis to Knowledge Engineering, pp. 254–261. Springer, Berlin (2006)
  8. Cottrell, M., Fort, J.C., Pagès, G.: Theoretical aspects of the SOM algorithm. Neurocomputing 21(1–3), 119–138 (1998)
  9. Croux, C., Haesbroeck, G.: Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 87(3), 603 (2000)
  10. Davies, P.: Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices. Ann. Stat. 15(3), 1269–1292 (1987)
  11. Davies, P.: The asymptotics of Rousseeuw’s minimum volume ellipsoid estimator. Ann. Stat. 20(4), 1828–1843 (1992)
  12. Davies, P., Gather, U.: Breakdown and groups. Ann. Stat. 33(3), 977–988 (2005)
  13. Davies, P., Gather, U.: Addendum to the discussion of “breakdown and groups”. Ann. Stat. 34(3), 1577–1579 (2006)
  14. Erwin, E., Obermayer, K., Schulten, K.: Convergence properties of self-organizing maps. Artif. Neural Netw. 1, 409–414 (1991)
  15. Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: ordering, convergence properties and energy functions. Biol. Cybern. 67(1), 47–55 (1992a)
  16. Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: stationary states, metastability and convergence rate. Biol. Cybern. 67(1), 35–45 (1992b)
  17. Fort, J.: SOM’s mathematics. Neural Netw. 19(6–7), 812–816 (2006). Advances in Self Organising Maps—WSOM’05
  18. Fung, W.-K.: Unmasking outliers and leverage points: a confirmation. J. Am. Stat. Assoc. 88(422), 515–519 (1993)
  19. Gather, U., Becker, C.: Outlier identification and robust methods. In: Maddala, G., Rao, C. (eds.) Robust Inference. Handbook of Statistics, vol. 15, pp. 123–143 (1997)
  20. Hampel, F., Ronchetti, E., Rousseeuw, P., Stahel, W.: Robust Statistics: The Approach based on Influence Functions. Wiley, New York (2005)
  21. Hardin, J., Rocke, D.: The distribution of robust distances. J. Comput. Graph. Stat. 14(4), 928–946 (2005)
  22. Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Prentice Hall, New York (2009)
  23. Hertz, J., Palmer, R.G., Krogh, A.S.: Introduction to the Theory of Neural Computation. Perseus Publishing, New York (1991)
  24. Hubert, M., Rousseeuw, P., van Aelst, S.: High-breakdown robust multivariate methods. Stat. Sci. 23(1), 92–119 (2008)
  25. Kaski, S., Kangas, J., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 1981–1997. Neural Comput. Surv. 1(3&4), 1–176 (1998)
  26. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)
  27. Kohonen, T.: Self-organizing Maps, 3rd edn. Springer, Berlin (2001)
  28. Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J.: Som-pak: the self-organizing map program package. Report A31, Helsinki University of Technology, Laboratory of Computer and Information Science (1996)
  29. Koshevoy, G., Mosler, K.: Zonoid trimming for multivariate distributions. Ann. Stat. 25(5), 1998–2017 (1997)
  30. Koshevoy, G., Mosler, K.: Lift zonoids, random convex hulls, and the variability of random vectors. Bernoulli 4(3), 377–399 (1998)
  31. Koshevoy, G., Möttönen, J., Oja, H.: A scatter matrix estimate based on the zonotope. Ann. Stat. 31(5), 1439–1459 (2003)
  32. Lee, J., Verleysen, M.: Nonlinear Dimensionality Reduction. Springer, Berlin (2007)
  33. Lopuhaä, H.: Asymptotics of reweighted estimators of multivariate location and scatter. Ann. Stat. 27(5), 1638–1665 (1999)
  34. Lopuhaä, H., Rousseeuw, P.: Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Ann. Stat. 19(1), 229–248 (1991)
  35. Nag, A., Mitra, A., Mitra, S.: Multiple outlier detection in multivariate data using self-organizing maps. Comput. Stat. 20(2), 245–264 (2005)
  36. Oja, M., Kaski, S., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 1998–2001 addendum. Neural Comput. Surv. 1, 1–176 (2002)
  37. Paris Scholz, S.: Robustness concepts and investigations for estimators of convex bodies. PhD thesis (2002)
  38. Pison, G., van Aelst, S., Willems, G.: Small sample corrections for LTS and MCD. Metrika 55, 111–123 (2002)
  39. Pöllä, M., Honkela, T., Kohonen, T.: Bibliography of self-organizing map (SOM) papers: 2002–2005 addendum. Technical Report TKK-ICS-R23, Helsinki University of Technology, Department of Information and Computer Science, Espoo, Finland (2009)
  40. Riani, M., Atkinson, A., Cerioli, A.: Finding an unknown number of multivariate outliers. J. R. Stat. Soc., Ser. B, Stat. Methodol. 71(2), 447–466 (2009)
  41. Rocke, D.: Robustness properties of S-estimators of multivariate location and shape in high dimension. Ann. Stat. 24(3), 1327–1345 (1996)
  42. Rocke, D.M., Woodruff, D.L.: Identification of outliers in multivariate data. J. Am. Stat. Assoc. 91(435), 1047–1061 (1996)
  43. Rojas, R.: Theorie der neuronalen Netze: eine systematische Einführung. Springer, Berlin (1993)
  44. Rousseeuw, P.: Multivariate estimation with high breakdown point. Math. Stat. Appl. 8, 283–297 (1985)
  45. Rousseeuw, P., Leroy, A.: Robust Regression and Outlier Detection. Wiley, New York (1987)
  46. Rousseeuw, P., van Driessen, K.: A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223 (1999)
  47. Rousseeuw, P., van Zomeren, B.: Unmasking multivariate outliers and leverage points. J. Am. Stat. Assoc. 85(411), 633–639 (1990)
  48. Ultsch, A.: Self-organizing neural networks for visualization and classification. In: Opitz, O., Lausen, B., Klar, R. (eds.) Information and Classification: Concepts, Methods and Applications, pp. 307–313. Springer, Berlin (1993)

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Steffen Liebscher (1, corresponding author)
  • Thomas Kirschstein (1)
  • Claudia Becker (1)
  1. Martin-Luther-University, Halle/Saale, Germany
