Problems of Fuzzy c-Means Clustering and Similar Algorithms with High Dimensional Data Sets

  • Roland Winkler
  • Frank Klawonn
  • Rudolf Kruse
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

Fuzzy c-means clustering and its derivatives are very successful on many clustering problems. However, fuzzy c-means clustering and similar algorithms have problems with high dimensional data sets and a large number of prototypes. In particular, we discuss hard c-means, noise clustering, fuzzy c-means with a polynomial fuzzifier function and its noise variant. A special test data set that is optimal for clustering is used to show weaknesses of said clustering algorithms in high dimensions. We also show that a high number of prototypes influences the clustering procedure in a similar way as a high number of dimensions. Finally, we show that the negative effects of high dimensional data sets can be reduced by adjusting the parameter of the algorithms, i.e. the fuzzifier, depending on the number of dimensions.
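The effect described in the abstract can be illustrated with the standard fuzzy c-means membership update. The sketch below is not the authors' test data set or experimental setup: it assumes uniformly random points and prototypes in the unit hypercube, and the helper names `fcm_memberships` and `mean_max_membership` are hypothetical. It reports the average of the largest membership degree per point, which should drift toward 1/c (here 0.25) as the number of dimensions grows, and it suggests that a fuzzifier closer to 1 keeps memberships crisper.

```python
import random


def fcm_memberships(point, prototypes, m=2.0):
    """Standard FCM membership degrees of one point to each prototype.

    u_i = 1 / sum_k (d_i^2 / d_k^2)^(1/(m-1)), with fuzzifier m > 1.
    """
    d2 = [sum((x - p) ** 2 for x, p in zip(point, proto)) for proto in prototypes]
    d2_min = min(d2)
    if d2_min == 0.0:
        # The point coincides with at least one prototype:
        # split full membership among the coinciding prototypes.
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    exponent = 1.0 / (m - 1.0)
    # Normalising by the smallest distance keeps every weight in (0, 1],
    # which avoids overflow for fuzzifiers close to 1.
    weights = [(d2_min / d) ** exponent for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


def mean_max_membership(dim, m=2.0, n_prototypes=4, n_points=200, seed=0):
    """Average largest membership over random points (illustrative setup only)."""
    rng = random.Random(seed)
    prototypes = [[rng.random() for _ in range(dim)] for _ in range(n_prototypes)]
    total = 0.0
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        total += max(fcm_memberships(point, prototypes, m))
    return total / n_points


if __name__ == "__main__":
    for dim in (2, 20, 200):
        standard = mean_max_membership(dim, m=2.0)
        crisper = mean_max_membership(dim, m=1.1)
        print(f"dim={dim:4d}  m=2.0: {standard:.3f}  m=1.1: {crisper:.3f}")
```

The choice m = 1.1 is arbitrary and only shows the direction of the effect; how the fuzzifier should depend on the number of dimensions is the subject of the paper itself and is not reproduced here.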

Keywords

Cluster Algorithm; Data Object; High Dimensional Data; Gradient Descent Algorithm; Noise Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. German Aerospace Center Braunschweig, Braunschweig, Germany
  2. Ostfalia, University of Applied Sciences, Wolfenbüttel, Germany
  3. Otto-von-Guericke University Magdeburg, Magdeburg, Germany
