Impact of Distance Measures on the Performance of Clustering Algorithms
The choice of distance measure plays a vital role in clustering algorithms, and selecting the right measure for a given dataset is a challenging problem. In this paper, the effect of six distance measures on three clustering algorithms (K-means, single linkage, and average linkage) is investigated. The distance measures are Euclidean, squared Euclidean, Manhattan, Mahalanobis, cosine similarity, and Pearson correlation. We describe each distance measure, pointing out its strengths and weaknesses. The performance of the clustering algorithms under each distance measure is evaluated on two artificial and four real-life datasets. Experimental results show how the choice of distance measure affects clustering performance across the datasets.
Keywords: Clustering, Distance measure, Clustering algorithms
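As an illustration of the six measures named in the abstract, the following is a minimal sketch using their standard textbook definitions. It is not taken from the paper. The function names are hypothetical, the inverse covariance matrix `inv_cov` is assumed to be supplied by the caller, and converting cosine similarity and Pearson correlation into distances as one minus the similarity is an assumption that the paper may handle differently.

```python
import math

def euclidean_squared(x, y):
    # Sum of squared coordinate differences.
    return sum((a - b) ** 2 for a, b in zip(x, y))

def euclidean(x, y):
    # Square root of the squared Euclidean distance.
    return math.sqrt(euclidean_squared(x, y))

def manhattan(x, y):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def mahalanobis(x, y, inv_cov):
    # sqrt(d^T * inv_cov * d), where d = x - y.
    # inv_cov is assumed to be the inverse covariance matrix of the data,
    # given as a list of rows; estimating it is left to the caller.
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)

def cosine_distance(x, y):
    # Assumption: distance = 1 - cosine similarity.
    # Raises ZeroDivisionError if either vector has zero norm.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)

def pearson_distance(x, y):
    # Assumption: distance = 1 - Pearson correlation, computed as the
    # cosine distance between the mean-centered vectors.
    # Raises ZeroDivisionError if either vector is constant.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_distance([a - mean_x for a in x], [b - mean_y for b in y])
```

With an identity matrix passed as `inv_cov`, `mahalanobis` reduces to `euclidean`, which gives a quick sanity check. Note that the cosine and Pearson variants ignore vector magnitude, so two points that are far apart in Euclidean terms can still be at distance zero under them.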