Cluster Analysis on Different Data Sets Using K-Modes and K-Prototype Algorithms
The k-means algorithm is well known for its efficiency in clustering large data sets, but it is restricted to numerical data. Real-world data, however, usually mix objects of several data types. In this paper we implement algorithms that extend k-means to categorical domains, using the Modified k-modes algorithm, and to domains with mixed categorical and numerical values, using the k-prototypes algorithm. The Modified k-modes algorithm makes three changes to k-means: it uses a simple matching dissimilarity measure for categorical data, it replaces the means of clusters with modes, and it uses a frequency-based method to find the modes. The k-prototypes algorithm is implemented by integrating the Incremental k-means and Modified k-modes partition clustering algorithms. All of these algorithms reduce the value of the cost function.
Keywords: Cluster, K-means, K-modes, K-prototypes, mixed data
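As an illustration of the ideas named in the abstract, the following is a minimal sketch of the basic k-modes loop: simple matching dissimilarity, modes in place of means, and a frequency-based mode update. It is not the paper's Modified k-modes or its Incremental k-means integration, whose details are not given here. All function names, the toy records, and the `gamma` weight that balances numeric and categorical terms in the mixed-data measure are assumptions made for this example.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Frequency-based mode: the most frequent category of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """k-prototypes-style measure (assumed form): squared Euclidean distance
    on numeric attributes plus gamma times simple matching on categorical ones."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return numeric + gamma * matching_dissimilarity(cat_a, cat_b)


def k_modes(records, initial_modes, max_iter=10):
    """Basic k-modes: assign each record to its nearest mode, then update modes."""
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = cluster_mode(members)
    # Cost function: total dissimilarity between records and their cluster modes.
    cost = sum(matching_dissimilarity(r, modes[lab]) for r, lab in zip(records, labels))
    return labels, modes, cost


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "small", "square"),
]
labels, modes, cost = k_modes(records, initial_modes=[records[0], records[3]])
print(labels, modes, cost)
```

In this sketch the cost is the sum of simple matching dissimilarities between each record and the mode of its cluster; for mixed data, `mixed_dissimilarity` would take the place of `matching_dissimilarity`, with prototypes holding means for numeric attributes and modes for categorical ones.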