An Empirical Evaluation of K-Means Clustering Algorithm Using Different Distance/Similarity Metrics

Gupta, Manoj Kumar; Chandra, Pravin

doi:10.1007/978-3-030-30577-2_79

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 605)

Abstract

k-means is an effective and efficient clustering algorithm. It uses a distance/similarity metric to measure the distance/similarity among data objects. Objects that are close/similar to each other are assigned to the same cluster, whereas distant/dissimilar objects are assigned to different clusters. Most implementations of k-means are based on the Euclidean or squared Euclidean distance metric. To find out whether other distance/similarity metrics can be used with the k-means algorithm, an empirical evaluation has been performed. In this paper, the accuracy, performance and reliability of 13 different distance/similarity measures over 6 different variations of data using the k-means algorithm are compared, based on an empirical evaluation on the well-known benchmark IRIS data set. Accuracy is measured in terms of the similarity of cluster assignment between ground truth and machine clustering. Performance is measured in terms of the number of iterations needed for the final cluster assignment to converge. Reliability is measured on the basis of the correctness of the cluster assignment.
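The kind of experiment the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`kmeans`, `clustering_accuracy`), the `init` parameter, the three metrics shown, and the toy data are all assumptions made for the example, and the abstract does not state the exact accuracy formula used in the paper. Note also that the mean-based centroid update is only guaranteed to converge for squared Euclidean distance, so an iteration cap is kept for other metrics.

```python
import math
import random
from itertools import permutations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def kmeans(points, k, distance, init=None, max_iter=100, seed=0):
    """Lloyd-style k-means with a pluggable distance function.

    Returns (labels, centroids, iterations). The iteration count includes
    the final pass that confirms the assignment no longer changes.
    """
    if init is None:
        # Assumption: random initial centroids drawn from the data.
        init = random.Random(seed).sample(points, k)
    centroids = [list(c) for c in init]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            return labels, centroids, iteration
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids, max_iter


def clustering_accuracy(truth, labels, k):
    """Fraction of points that agree with the ground truth under the best
    one-to-one relabelling of clusters (brute force, fine for small k)."""
    best = 0
    for perm in permutations(range(k)):
        hits = sum(1 for t, lab in zip(truth, labels) if perm[lab] == t)
        best = max(best, hits)
    return best / len(truth)


# Toy data standing in for a real benchmark such as IRIS.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
TRUTH = [0, 0, 0, 0, 1, 1, 1, 1]

for metric in (euclidean, manhattan, chebyshev):
    labels, _, iterations = kmeans(POINTS, 2, metric, init=[(0, 0), (10, 10)])
    acc = clustering_accuracy(TRUTH, labels, 2)
    print(f"{metric.__name__}: accuracy={acc:.2f}, iterations={iterations}")
```

Passing the metric as a plain function keeps the clustering loop unchanged across measures, so accuracy and iteration counts can be compared per metric on the same data and the same initial centroids.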

Author information

Authors and Affiliations

USIC&T and Faculty, Rukmini Devi Institute of Advanced Studies, Guru Gobind Singh Indraprastha University, Delhi, India
Manoj Kumar Gupta
USIC&T, Guru Gobind Singh Indraprastha University, Delhi, India
Pravin Chandra

Corresponding author

Correspondence to Manoj Kumar Gupta.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Jaypee University of Information Technology, Kandaghat, India
Pradeep Kumar Singh
Department of Electrical Engineering, IIT Delhi, Delhi, Delhi, India
Bijaya Ketan Panigrahi
School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Telangana, India
Nagender Kumar Suryadevara
Institute of Information Technology and Management, Delhi, Delhi, India
Sudhir Kumar Sharma
USICT, GGSIPU, Delhi, Delhi, India
Amit Prakash Singh

About this paper

Cite this paper

Gupta, M.K., Chandra, P. (2020). An Empirical Evaluation of K-Means Clustering Algorithm Using Different Distance/Similarity Metrics. In: Singh, P., Panigrahi, B., Suryadevara, N., Sharma, S., Singh, A. (eds) Proceedings of ICETIT 2019. Lecture Notes in Electrical Engineering, vol 605. Springer, Cham. https://doi.org/10.1007/978-3-030-30577-2_79

DOI: https://doi.org/10.1007/978-3-030-30577-2_79
Published: 24 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30576-5
Online ISBN: 978-3-030-30577-2
eBook Packages: Intelligent Technologies and Robotics (R0)
