Comparative Study of Distance Functions for Nearest Neighbors

  • Conference paper
Advanced Techniques in Computing Sciences and Software Engineering

Abstract

Many learning algorithms rely on a distance metric to compare their input data. Research has shown that the choice of metric can improve the performance of these algorithms. Over the years, the Euclidean distance has been a popular choice. In this paper, we investigate a number of metrics proposed by different communities, including the Mahalanobis, Euclidean, Kullback-Leibler, and Hamming distances. Overall, the best-performing method is the Mahalanobis distance metric.
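As an illustration only, the four distance functions named in the abstract can be sketched in plain Python using their standard textbook definitions. This is not the authors' implementation: the function names, the `nearest_neighbor` helper, and the convention of passing a precomputed inverse covariance matrix `inv_cov` are assumptions made for this sketch. Note that the Kullback-Leibler divergence is asymmetric, so the argument order matters.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Assumes p and q are probability vectors and q[i] > 0 wherever p[i] > 0.
    Terms with p[i] == 0 contribute nothing by convention.
    """
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def mahalanobis(x, y, inv_cov):
    """sqrt((x - y)^T * inv_cov * (x - y)).

    inv_cov is assumed to be the precomputed inverse covariance matrix,
    given as a list of rows.
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    quad = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)

def nearest_neighbor(query, points, labels, dist):
    """Label of the stored point closest to query under dist (1-NN)."""
    best = min(range(len(points)), key=lambda i: dist(query, points[i]))
    return labels[best]

# Hypothetical toy data: two labelled points and one query.
points = [(0.0, 0.0), (3.0, 4.0)]
labels = ["a", "b"]
predicted = nearest_neighbor((1.0, 1.0), points, labels, euclidean)
```

Passing a different function as `dist` swaps the metric without changing the nearest-neighbor search itself. With the identity matrix as `inv_cov`, the Mahalanobis distance reduces to the Euclidean distance.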

Author information

Corresponding author

Correspondence to Janett Walters-Williams.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this paper

Cite this paper

Walters-Williams, J., Li, Y. (2010). Comparative Study of Distance Functions for Nearest Neighbors. In: Elleithy, K. (eds) Advanced Techniques in Computing Sciences and Software Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3660-5_14

  • DOI: https://doi.org/10.1007/978-90-481-3660-5_14

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-3659-9

  • Online ISBN: 978-90-481-3660-5

  • eBook Packages: Computer Science (R0)
