Problems of Fuzzy c-Means Clustering and Similar Algorithms with High Dimensional Data Sets
Fuzzy c-means clustering and its derivatives are very successful on many clustering problems. However, fuzzy c-means clustering and similar algorithms have problems with high dimensional data sets and with a large number of prototypes. In particular, we discuss hard c-means, noise clustering, fuzzy c-means with a polynomial fuzzifier function, and its noise variant. A specially constructed test data set that is ideal for clustering is used to expose the weaknesses of these clustering algorithms in high dimensions. We also show that a large number of prototypes influences the clustering procedure in a similar way to a large number of dimensions. Finally, we show that the negative effects of high dimensional data sets can be reduced by adjusting a parameter of the algorithms, namely the fuzzifier, according to the number of dimensions.
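One way to see why distance-based memberships degrade in high dimensions is to evaluate the standard fuzzy c-means membership formula, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), on random data of increasing dimension. The sketch below is only an illustration under stated assumptions: it uses uniformly distributed points and prototypes rather than the paper's special test data set, it evaluates memberships for fixed prototypes instead of running full fuzzy c-means iterations, and the helper names `fcm_memberships` and `mean_max_membership` are hypothetical. As distances concentrate, the ratios d_i / d_k approach 1, so each point's largest membership drifts toward 1/c.

```python
import math
import random


def fcm_memberships(point, prototypes, m=2.0):
    """Standard fuzzy c-means membership of one data object to each prototype.

    u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)) with Euclidean distances d_i.
    Assumes the point does not coincide with any prototype (all d_i > 0).
    """
    dists = [math.dist(point, p) for p in prototypes]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


def mean_max_membership(dim, n_points=200, n_protos=5, m=2.0, seed=0):
    """Average of each point's largest membership for uniform random data.

    Hypothetical helper: data and prototypes are drawn uniformly from
    [0, 1]^dim, which is an assumption, not the paper's test data set.
    """
    rng = random.Random(seed)
    protos = [[rng.random() for _ in range(dim)] for _ in range(n_protos)]
    total = 0.0
    for _ in range(n_points):
        x = [rng.random() for _ in range(dim)]
        total += max(fcm_memberships(x, protos, m))
    return total / n_points


if __name__ == "__main__":
    # The lower bound for the largest membership is 1/c = 0.2 with 5 prototypes.
    for dim in (2, 10, 100, 1000):
        print(dim, round(mean_max_membership(dim), 3))
```

Calling `mean_max_membership(1000, m=1.1)` instead of the default `m=2.0` yields noticeably sharper memberships on the same data, which is consistent in direction with the idea of choosing the fuzzifier depending on the number of dimensions; the specific adjustment rule proposed in the paper is not reproduced here.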
Keywords: Cluster Algorithm · Data Object · High Dimensional Data · Gradient Descent Algorithm · Noise Cluster