Abstract
k-means is an effective and efficient clustering algorithm. It uses a distance or similarity metric to measure how close data objects are to one another: objects that are close (similar) are assigned to the same cluster, whereas distant (dissimilar) objects are assigned to different clusters. Most implementations of k-means are based on the Euclidean or squared Euclidean distance. To assess whether other distance/similarity metrics can be used with the k-means algorithm, an empirical evaluation was performed. This paper compares the accuracy, performance and reliability of 13 distance/similarity measures over 6 variations of the data, using k-means on the well-known benchmark IRIS data set. Accuracy is measured as the similarity of cluster assignment between the ground truth and the machine clustering. Performance is measured as the number of iterations needed for the final cluster assignment to converge. Reliability is measured on the basis of the correctness of the cluster assignment.
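The kind of comparison described above can be illustrated with a minimal sketch in Python. It is not the authors' implementation. The function names (kmeans, accuracy), the toy points, the fixed initial centroids and the choice of Euclidean and Manhattan distance are all assumptions made for illustration. The sketch keeps the usual mean-based centroid update for every metric, which is a simplification, and it scores accuracy by the best one-to-one mapping between cluster ids and ground-truth labels, which may differ from the measure used in the paper.

```python
import itertools
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (city-block) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points, init_centroids, distance, max_iter=100):
    """k-means with a pluggable distance function (illustrative sketch).

    Returns (labels, centroids, iterations). The iteration count includes
    the final pass that confirms the assignment no longer changes.
    """
    centroids = [list(c) for c in init_centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            return labels, centroids, iteration
        labels = new_labels
        # Update step: centroid = mean of its members (assumed for all metrics).
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids, max_iter


def accuracy(truth, labels, k):
    """Fraction of points matched under the best cluster-to-class mapping."""
    best = 0
    for perm in itertools.permutations(range(k)):
        hits = sum(1 for t, lab in zip(truth, labels) if perm[lab] == t)
        best = max(best, hits)
    return best / len(truth)


# Hypothetical toy data (not the IRIS data set): two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
truth = [0, 0, 0, 1, 1, 1]
init = [points[0], points[3]]

for name, metric in [("euclidean", euclidean), ("manhattan", manhattan)]:
    labels, _, iterations = kmeans(points, init, metric)
    print(name, "accuracy:", accuracy(truth, labels, 2), "iterations:", iterations)
```

Passing the distance as a function keeps the clustering loop unchanged while the metric varies, so any difference in accuracy or iteration count can be attributed to the metric alone. The brute-force label matching in accuracy is only practical for small k, such as the three classes of IRIS.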
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Gupta, M.K., Chandra, P. (2020). An Empirical Evaluation of K-Means Clustering Algorithm Using Different Distance/Similarity Metrics. In: Singh, P., Panigrahi, B., Suryadevara, N., Sharma, S., Singh, A. (eds) Proceedings of ICETIT 2019. Lecture Notes in Electrical Engineering, vol 605. Springer, Cham. https://doi.org/10.1007/978-3-030-30577-2_79
DOI: https://doi.org/10.1007/978-3-030-30577-2_79
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30576-5
Online ISBN: 978-3-030-30577-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)