Abstract
The k-means algorithm is well known for its efficiency in clustering large data sets, but it is restricted to numerical data. Real-world data sets, however, often contain objects of mixed data types. In this paper we implement algorithms that extend k-means to categorical domains, using the Modified k-modes algorithm, and to domains with mixed categorical and numerical values, using the k-prototypes algorithm. The Modified k-modes algorithm makes three changes to k-means: it uses a simple matching dissimilarity measure for categorical data, it replaces the means of clusters with modes, and it uses a frequency-based method to find the modes within the same iterative procedure that k-means uses. The k-prototypes algorithm is implemented by integrating the Incremental k-means and Modified k-modes partitional clustering algorithms. All of these algorithms reduce the value of the cost function.
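The three ingredients named in the abstract can be illustrated with a minimal Python sketch. It follows the textbook k-modes and k-prototypes formulations and is not the authors' Modified k-modes or Incremental k-means implementation; the function names, the weight `gamma`, and the toy records are assumptions made only for illustration.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Frequency-based mode: the most frequent value of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def assign(records, modes):
    """Label each record with the index of its nearest mode."""
    return [
        min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
        for r in records
    ]

def k_modes(records, initial_modes, max_iter=10):
    """k-means-style loop with modes in place of means (illustrative only)."""
    modes = list(initial_modes)
    for _ in range(max_iter):
        labels = assign(records, modes)
        new_modes = []
        for j, old in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            # Keep the previous mode if a cluster ends up empty.
            new_modes.append(cluster_mode(members) if members else old)
        if new_modes == modes:
            break
        modes = new_modes
    return assign(records, modes), modes

def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """k-prototypes-style distance: squared Euclidean on the numeric part
    plus gamma (an assumed weight) times simple matching on the categorical part."""
    numeric = sum((p - q) ** 2 for p, q in zip(x_num, proto_num))
    return numeric + gamma * matching_dissimilarity(x_cat, proto_cat)

if __name__ == "__main__":
    # Hypothetical toy data: three categorical attributes per record.
    data = [
        ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
        ("b", "y", "q"), ("b", "y", "p"), ("b", "x", "q"),
    ]
    labels, modes = k_modes(data, [("a", "x", "p"), ("b", "y", "q")])
    print(labels, modes)
```

Here the cost being reduced is the total dissimilarity between each record and the mode (or prototype) of its cluster. In the mixed case, `gamma` balances the numerical and categorical contributions and would need tuning for a real data set.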
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Madhuri, R., Murty, M.R., Murthy, J.V.R., Reddy, P.V.G.D.P., Satapathy, S.C. (2014). Cluster Analysis on Different Data Sets Using K-Modes and K-Prototype Algorithms. In: Satapathy, S., Avadhani, P., Udgata, S., Lakshminarayana, S. (eds) ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India - Vol II. Advances in Intelligent Systems and Computing, vol 249. Springer, Cham. https://doi.org/10.1007/978-3-319-03095-1_15
DOI: https://doi.org/10.1007/978-3-319-03095-1_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03094-4
Online ISBN: 978-3-319-03095-1
eBook Packages: Engineering, Engineering (R0)