An Empirical Evaluation of K-Means Clustering Algorithm Using Different Distance/Similarity Metrics

  • Conference paper
Proceedings of ICETIT 2019

Abstract

k-means is an effective and efficient clustering algorithm. It uses a distance/similarity metric to measure how close or similar data objects are to one another. Objects that are close or similar to each other are assigned to the same cluster, whereas distant or dissimilar objects are assigned to different clusters. Most implementations of k-means are based on the Euclidean or squared Euclidean distance metric. To assess whether other distance/similarity metrics can be used with the k-means algorithm, an empirical evaluation has been performed. In this paper, the accuracy, performance, and reliability of 13 different distance/similarity measures are compared over 6 different variations of the data, using k-means on the well-known benchmark IRIS data set. Accuracy is measured in terms of the similarity of cluster assignment between the ground truth and the machine clustering. Performance is measured in terms of the number of iterations needed for the final cluster assignment to converge. Reliability is measured on the basis of the correctness of the cluster assignment.
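The kind of comparison described in the abstract can be illustrated with a minimal sketch of k-means in which the distance function is a parameter. Everything in it is an assumption made for illustration rather than the authors' implementation: the abstract does not list the 13 measures, so three common ones (Euclidean, Manhattan, Chebyshev) stand in; the centroid update keeps the arithmetic mean for every metric; accuracy is read as the fraction of objects that agree with the ground truth under the best relabelling of clusters; and the data are six made-up 2-D points, not the IRIS data set. The names `kmeans` and `best_match_accuracy` are hypothetical.

```python
import math
from itertools import permutations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def kmeans(points, k, distance, init_centroids, max_iter=100):
    """Return (labels, iterations); stops when assignments no longer change.

    Assumption: the centroid is the arithmetic mean for every metric,
    which is only the exact minimiser for squared Euclidean distance.
    """
    centroids = [list(c) for c in init_centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            return labels, iteration
        labels = new_labels
        # Update step: recompute each non-empty cluster's mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, max_iter


def best_match_accuracy(truth, labels, k):
    """Fraction of agreeing points under the best cluster-to-class mapping.

    Assumption: one plausible reading of "similarity of cluster assignment";
    brute force over permutations, so only suitable for small k.
    """
    best = 0
    for perm in permutations(range(k)):
        hits = sum(1 for t, lab in zip(truth, labels) if perm[lab] == t)
        best = max(best, hits)
    return best / len(truth)


# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
truth = [0, 0, 0, 1, 1, 1]
start = [points[0], points[3]]  # fixed initial centroids for repeatability

for metric in (euclidean, manhattan, chebyshev):
    labels, iterations = kmeans(points, 2, metric, start)
    print(metric.__name__, iterations, best_match_accuracy(truth, labels, 2))
```

Passing the initial centroids explicitly keeps the run deterministic, so that differences in iteration count or accuracy between metrics are not confounded with random initialization.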

Author information

Corresponding author

Correspondence to Manoj Kumar Gupta.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Gupta, M.K., Chandra, P. (2020). An Empirical Evaluation of K-Means Clustering Algorithm Using Different Distance/Similarity Metrics. In: Singh, P., Panigrahi, B., Suryadevara, N., Sharma, S., Singh, A. (eds) Proceedings of ICETIT 2019. Lecture Notes in Electrical Engineering, vol 605. Springer, Cham. https://doi.org/10.1007/978-3-030-30577-2_79
