Cluster Analysis on Different Data Sets Using K-Modes and K-Prototype Algorithms

  • R. Madhuri
  • M. Ramakrishna Murty
  • J. V. R. Murthy
  • P. V. G. D. Prasad Reddy
  • Suresh C. Satapathy
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 249)

Abstract

The k-means algorithm is well known for its efficiency in clustering large data sets, but it is restricted to numerical data. Real-world data, however, usually mixes objects of various data types. In this paper we implement algorithms that extend k-means to categorical domains, using the Modified k-modes algorithm, and to domains with mixed categorical and numerical values, using the k-prototypes algorithm. The Modified k-modes algorithm adapts k-means in three ways: it uses a simple matching dissimilarity measure for categorical data, it replaces the means of clusters with modes, and it uses a frequency-based method to find the modes. The k-prototypes algorithm is implemented by integrating the Incremental k-means and the Modified k-modes partition clustering algorithms. All of these algorithms reduce the value of the cost function.
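The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the mixed dissimilarity assumes the common k-prototypes formulation (squared Euclidean distance on the numerical part plus a weight `gamma` times the matching dissimilarity on the categorical part), which the abstract itself does not spell out.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Frequency-based mode: the most frequent value of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Assumed k-prototypes style measure for mixed records.

    Squared Euclidean distance on the numerical attributes plus
    gamma times the simple matching dissimilarity on the categorical ones.
    gamma is an illustrative weight, not a value taken from the paper.
    """
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return numeric + gamma * matching_dissimilarity(cat_a, cat_b)


def total_cost(records, modes, labels):
    """Cost function: summed dissimilarity of each record to its cluster mode."""
    return sum(matching_dissimilarity(r, modes[k]) for r, k in zip(records, labels))


if __name__ == "__main__":
    cluster = [("red", "small", "round"),
               ("red", "large", "round"),
               ("blue", "small", "round")]
    mode = cluster_mode(cluster)
    print(mode)
    print(total_cost(cluster, [mode], [0, 0, 0]))
```

In a full clustering loop, records would be reassigned to the nearest mode (or prototype) and the modes recomputed until the cost stops decreasing; the sketch shows only the measures involved.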

Keywords

Cluster, K-means, K-modes, K-prototypes, mixed data

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • R. Madhuri (1)
  • M. Ramakrishna Murty (1)
  • J. V. R. Murthy (2)
  • P. V. G. D. Prasad Reddy (3)
  • Suresh C. Satapathy (4)
  1. Dept. of CSE, GMR Institute of Technology, Rajam, India
  2. Dept. of CSE, JNTUK, Kakinada, India
  3. Dept. of CS&SE, Andhra University, Visakhapatnam, India
  4. Dept. of CSE, ANITS, Visakhapatnam, India
