Comparative Study of Distance Functions for Nearest Neighbors

Conference paper

Abstract

Many learning algorithms rely on distance metrics over their input data, and research has shown that the choice of metric can significantly affect an algorithm's performance. The Euclidean distance has long been the most popular choice. In this paper, we investigate several metrics proposed by different communities, including the Mahalanobis, Euclidean, Kullback-Leibler, and Hamming distances. Overall, the best-performing metric is the Mahalanobis distance.
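As a rough illustration of the distance functions compared in the paper, the following Python sketch (not taken from the paper itself) gives plain implementations of several of them. The function names and the inverse-covariance argument `inv_cov` are our own conventions; the KL divergence variant assumes discrete distributions with non-zero support in `q` wherever `p` is non-zero.

```python
import math

def euclidean(x, y):
    # Straight-line (L2) distance between two vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # City-block (L1) distance.
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    # Generalization: p=1 is Manhattan, p=2 is Euclidean.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def hamming(x, y):
    # Number of positions at which the two sequences differ.
    return sum(a != b for a, b in zip(x, y))

def mahalanobis(x, y, inv_cov):
    # inv_cov is the inverse covariance matrix, given as a list of rows.
    # With the identity matrix this reduces to the Euclidean distance.
    d = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(d[i] * inv_cov[i][j] * d[j]
                         for i in range(len(d))
                         for j in range(len(d))))

def kl_divergence(p, q):
    # Kullback-Leibler divergence between discrete distributions.
    # Not symmetric, so strictly a divergence rather than a metric.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For example, `mahalanobis([0, 0], [3, 4], [[1, 0], [0, 1]])` equals `euclidean([0, 0], [3, 4])`, which is 5.0; a non-identity `inv_cov` rescales the axes according to the data's covariance, which is what distinguishes the Mahalanobis metric in the comparison above.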

Keywords

Kullback-Leibler distance · Euclidean distance · Mahalanobis distance · Manhattan distance · Hamming distance · Minkowski distance · Nearest Neighbor

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  1. School of Computing and Information Technology, University of Technology, Kingston 6, Jamaica, W.I.
  2. Department of Mathematics and Computing, Centre for Systems Biology, University of Southern Queensland, Toowoomba, Australia
