Impact of Distance Measures on the Performance of Clustering Algorithms

  • Vijay Kumar
  • Jitender Kumar Chhabra
  • Dinesh Kumar
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 243)

Abstract

The distance measure plays a vital role in clustering algorithms, and selecting the right distance measure for a given dataset is a challenging problem. In this paper, the effect of six distance measures on three clustering algorithms (K-means, single linkage, and average linkage) is investigated. The distance measures are Euclidean, squared Euclidean, Manhattan, Mahalanobis, cosine similarity, and Pearson correlation. We describe all the distance measures, pointing out their strengths and weaknesses. The performance of the clustering algorithms under each distance measure is evaluated on two artificial and four real-life datasets. Experimental results show how the choice of distance measure affects clustering performance on different datasets.
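For orientation, the six measures named in the abstract can be sketched in plain Python using their standard textbook definitions. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the conversion of cosine similarity and Pearson correlation into distances as 1 minus the similarity is an assumed (though common) convention, and the Mahalanobis function assumes the caller supplies an already inverted covariance matrix.

```python
import math

def euclidean_squared(x, y):
    # Sum of squared coordinate differences.
    return sum((a - b) ** 2 for a, b in zip(x, y))

def euclidean(x, y):
    # Square root of the squared Euclidean distance.
    return math.sqrt(euclidean_squared(x, y))

def manhattan(x, y):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def mahalanobis(x, y, inv_cov):
    # sqrt(d^T * inv_cov * d), where d = x - y.
    # Assumption: inv_cov is the inverse covariance matrix of the data,
    # given as a nested list and computed elsewhere.
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)

def cosine_distance(x, y):
    # Assumption: distance is taken as 1 - cosine similarity.
    # Undefined (division by zero) if either vector is all zeros.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def pearson_distance(x, y):
    # Assumption: distance is taken as 1 - Pearson correlation, which equals
    # the cosine distance of the mean-centered vectors.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_distance([a - mean_x for a in x], [b - mean_y for b in y])
```

With the identity matrix as `inv_cov`, `mahalanobis` reduces to `euclidean`, which is a convenient sanity check. Any of these functions could be passed as the dissimilarity in a linkage-based clustering routine; K-means with a non-Euclidean measure additionally requires a compatible definition of the cluster centre, which this sketch does not address.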

Keywords

Clustering, Distance measure, Clustering algorithms

Copyright information

© Springer India 2014

Authors and Affiliations

  • Vijay Kumar (1)
  • Jitender Kumar Chhabra (2)
  • Dinesh Kumar (3)
  1. CSE Department, JCDM College of Engineering, Sirsa, India
  2. Computer Engineering Department, NIT, Kurukshetra, India
  3. CSE Department, GJUS&T, Hisar, India
