Cluster Analysis on Different Data Sets Using K-Modes and K-Prototype Algorithms

Madhuri, R.; Murty, M. Ramakrishna; Murthy, J. V. R.; Reddy, P. V. G. D. Prasad; Satapathy, Suresh C.

doi:10.1007/978-3-319-03095-1_15

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 249)

Abstract

The k-means algorithm is well known for its efficiency in clustering large data sets, but it is restricted to numerical data. Real-world data, however, usually mix objects of various data types. In this paper we implement algorithms that extend k-means to categorical domains, using the Modified k-modes algorithm, and to domains with mixed categorical and numerical values, using the k-prototypes algorithm. The Modified k-modes algorithm makes three changes to k-means: it uses a simple matching dissimilarity measure for categorical data, it replaces the means of clusters with modes, and it uses a frequency-based method to find the modes. The k-prototypes algorithm is implemented by integrating the Incremental k-means and the Modified k-modes partition clustering algorithms. All of these algorithms reduce the value of the cost function.
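The three ingredients named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, and the mixed-data distance, with its weight `gamma`, follows the commonly cited k-prototypes formulation (squared Euclidean distance on numeric attributes plus a weighted mismatch count on categorical ones), which is assumed here rather than taken from the chapter.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Frequency-based mode: the most frequent value of each attribute in a cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def mixed_dissimilarity(num_a, num_b, cat_a, cat_b, gamma=1.0):
    """Assumed k-prototypes style distance for records with numeric and categorical parts.

    gamma (hypothetical default 1.0) weights the categorical mismatches
    against the squared Euclidean distance on the numeric attributes.
    """
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return numeric + gamma * matching_dissimilarity(cat_a, cat_b)


def assign_to_modes(records, modes):
    """Assign each categorical record to the index of its nearest mode."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(r, modes[k]))
        for r in records
    ]


# Illustrative data only; not taken from the paper's experiments.
cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
mode = cluster_mode(cluster)
labels = assign_to_modes(cluster, [mode, ("blue", "large")])
```

Alternating the assignment step (`assign_to_modes`) with recomputing each cluster's mode (`cluster_mode`) until the assignments stop changing gives the basic k-modes loop; replacing the dissimilarity with `mixed_dissimilarity` and keeping a mean for the numeric attributes gives the k-prototypes variant.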

Author information

Authors and Affiliations

Dept. of CSE, GMR Institute of Technology, Rajam, Srikakulam(Dist.), A.P., India
R. Madhuri & M. Ramakrishna Murty
Dept. of CSE, JNTUK, Kakinada, A.P., India
J. V. R. Murthy
Dept. of CS&SE, Andhra University, Visakhapatnam, A.P., India
P. V. G. D. Prasad Reddy
Dept. of CSE, ANITS, Visakhapatnam, A.P., India
Suresh C. Satapathy

Corresponding author

Correspondence to R. Madhuri.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences, Vishakapatnam, India
Suresh Chandra Satapathy
College of Engineering(A), Andhra University, Vishakapatnam, India
P. S. Avadhani
University of Hyderabad, Hyderabad, India
Siba K. Udgata
CSIR-National Institute of Oceanography, Visakhapatnam, India
Sadasivuni Lakshminarayana

About this paper

Cite this paper

Madhuri, R., Murty, M.R., Murthy, J.V.R., Reddy, P.V.G.D.P., Satapathy, S.C. (2014). Cluster Analysis on Different Data Sets Using K-Modes and K-Prototype Algorithms. In: Satapathy, S., Avadhani, P., Udgata, S., Lakshminarayana, S. (eds) ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India- Vol II. Advances in Intelligent Systems and Computing, vol 249. Springer, Cham. https://doi.org/10.1007/978-3-319-03095-1_15

DOI: https://doi.org/10.1007/978-3-319-03095-1_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03094-4
Online ISBN: 978-3-319-03095-1
eBook Packages: Engineering; Engineering (R0)
