An Empirical Comparison of Exact Nearest Neighbour Algorithms

  • Ashraf M. Kibriya
  • Eibe Frank
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4702)

Abstract

Nearest neighbour search (NNS) is an old problem that is of practical importance in a number of fields. It involves finding, for a given point q, called the query, one or more points from a given set of points that are nearest to the query q. Since the initial inception of the problem a great number of algorithms and techniques have been proposed for its solution. However, it remains the case that many of the proposed algorithms have not been compared against each other on a wide variety of datasets. This research attempts to fill this gap to some extent by presenting a detailed empirical comparison of three prominent data structures for exact NNS: KD-Trees, Metric Trees, and Cover Trees. Our results suggest that there is generally little gain in using Metric Trees or Cover Trees instead of KD-Trees for the standard NNS problem.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Minsky, M., Papert, S.: Perceptrons, pp. 222–225. MIT Press, Cambridge (1969)MATHGoogle Scholar
  2. 2.
    Aurenhammer, F.: Voronoi diagrams–A survey of a fundamental geometric data structure. ACM Computing Surveys 23(3), 345–405 (1991)CrossRefGoogle Scholar
  3. 3.
    Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Heidelberg (2001)MATHGoogle Scholar
  4. 4.
    Liu, T., Moore, A.W., Gray, A.G.: Efficient exact k-NN and nonparametric classification in high dimensions. In: Proc. of NIPS 2003, MIT Press, Cambridge (2004)Google Scholar
  5. 5.
    Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proc. 13th Annual ACM symposium on Theory of Computing, pp. 604–613. ACM Press, New York (1998)Google Scholar
  6. 6.
    Nene, S.A., Nayar, S.K.: A simple algorithm for nearest neighbor search in high dimensions. IEEE Trans. Pattern Anal. Mach. Intell. 19(9), 989–1003 (1997)CrossRefGoogle Scholar
  7. 7.
    Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proc. 20th Annual Symposium on Computational Geometry, pp. 253–262. ACM Press, New York (2004)Google Scholar
  8. 8.
    Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: Proc. 23rd International Conference on Machine learning, pp. 97–104. ACM Press, New York (2006)CrossRefGoogle Scholar
  9. 9.
    Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)MATHCrossRefMathSciNetGoogle Scholar
  10. 10.
    Mount, D.M., Arya, S.: ANN: A library for approximate nearest neighbor searching. In: CGC 2nd Annual Fall Workshop on Computational Geometry (1997), Available from http://www.cs.umd.edu/~mount/ANN
  11. 11.
    Kibriya, A.M.: Fast algorithms for nearest neighbour search. Master’s thesis, Department of Computer Science, University of Waikato, New Zealand (2007)Google Scholar
  12. 12.
    Omohundro, S.M.: Five balltree construction algorithms. Technical Report TR-89-063, International Computer Science Institute (December 1989)Google Scholar
  13. 13.
    Uhlmann, J.K.: Satisfying general proximity / similarity queries with metric trees. Information Processing Letters 40(4), 175–179 (1991)MATHCrossRefGoogle Scholar
  14. 14.
    Moore, A.W.: The anchors hierarchy: Using the triangle inequality to survive high dimensional data. In: Proc. 16th Conference on Uncertainty in Artificial Intelligence, pp. 397–405. Morgan Kaufmann, San Francisco (2000)Google Scholar
  15. 15.
    Bei, C.D., Gray, R.M.: An improvement of the minimum distortion encoding algorithm for vector quantization. IEEE Transactions on Communications 33(10), 1132–1133 (1985)CrossRefGoogle Scholar
  16. 16.
    Nadeau, C., Bengio, Y.: Inference for the generalization error. Machine Learning 52(3), 239–281 (2003)MATHCrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ashraf M. Kibriya
    • 1
  • Eibe Frank
    • 1
  1. 1.Department of Computer Science, University of Waikato, HamiltonNew Zealand

Personalised recommendations