Comparing Fuzzy-C Means and K-Means Clustering Techniques: A Comprehensive Study

  • Sandeep Panda
  • Sanat Sahu
  • Pradeep Jena
  • Subhagata Chattopadhyay
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 166)

Abstract

Clustering techniques are unsupervised learning methods that group similar data points together and separate them from dissimilar ones. They are therefore popular for a variety of data mining and pattern recognition tasks. However, their performance is data dependent, so choosing the right clustering technique for a given dataset remains a research challenge. In this paper, we test the performance of a soft clustering technique, Fuzzy C-means (FCM), and a hard clustering technique, K-means (KM), on the Iris (150 x 4), Wine (178 x 13) and Lens (24 x 4) datasets. The distance measure, which quantifies the similarity between any two data points, is at the heart of any clustering algorithm. Two distance measures, Manhattan (MH) and Euclidean (ED), are used to observe how they influence overall clustering performance. Performance is compared on seven parameters: (i) sensitivity, (ii) specificity, (iii) precision, (iv) accuracy, (v) run time, (vi) average intra-cluster distance (i.e., compactness of the clusters) and (vii) inter-cluster distance (i.e., distinctiveness of the clusters). Based on the experimental results, the paper concludes that both KM and FCM perform well. However, KM outperforms FCM in terms of speed. The FCM-MH combination produces the most compact clusters, while KM-ED yields the most distinct clusters.
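
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas used, so the sketch assumes the standard textbook definitions (compactness as the mean point-to-centroid distance, distinctiveness as the mean pairwise distance between centroids, and the usual confusion-matrix ratios). All function names and the toy data are hypothetical and chosen only for illustration.

```python
import math

def manhattan(a, b):
    """Manhattan (MH) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean (ED) distance: square root of summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def avg_intra_cluster_distance(clusters, dist):
    """Assumed compactness measure: mean distance of each point to its own centroid."""
    total, count = 0.0, 0
    for pts in clusters:
        c = centroid(pts)
        total += sum(dist(p, c) for p in pts)
        count += len(pts)
    return total / count

def avg_inter_cluster_distance(clusters, dist):
    """Assumed distinctiveness measure: mean pairwise distance between centroids."""
    cents = [centroid(pts) for pts in clusters]
    pairs = [(i, j) for i in range(len(cents)) for j in range(i + 1, len(cents))]
    return sum(dist(cents[i], cents[j]) for i, j in pairs) / len(pairs)

def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix ratios for one class treated as 'positive'."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Toy data (invented for illustration): two well-separated 2-D clusters.
toy_clusters = [[(0, 0), (0, 2)], [(4, 0), (4, 2)]]
intra_ed = avg_intra_cluster_distance(toy_clusters, euclidean)
inter_mh = avg_inter_cluster_distance(toy_clusters, manhattan)
toy_metrics = binary_metrics(tp=8, fp=2, tn=9, fn=1)
```

Under these assumed definitions, a lower average intra-cluster distance indicates more compact clusters and a higher inter-cluster distance indicates more distinct ones. Passing `manhattan` or `euclidean` as the `dist` argument mirrors the MH and ED variants compared in the paper.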

Keywords

Clustering; FCM; KM; Distance measures; Performance test

Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  • Sandeep Panda (1)
  • Sanat Sahu (1)
  • Pradeep Jena (1)
  • Subhagata Chattopadhyay (1)

  1. Dept. of Computer Science and Engineering, National Institute of Science and Technology, Berhampur, India
