
Algorithmica, Volume 72, Issue 1, pp 237–263

Randomized Partition Trees for Nearest Neighbor Search

  • Sanjoy Dasgupta
  • Kaushik Sinha
Article

Abstract

The \(k\)-d tree was one of the first spatial data structures proposed for nearest neighbor search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization and overlapping cells, have proved to be successful in practice. We analyze three such schemes. We show that the probability that they fail to find the nearest neighbor, for any data set and any query point, is directly related to a simple potential function that captures the difficulty of the point configuration. We then bound this potential function in several situations of interest: when the data are drawn from a doubling measure; when the data and query distributions are identical and are supported on a set of bounded doubling dimension; and when the data are documents from a topic model.
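The setting described in the abstract can be made concrete with a small experiment. The sketch below assumes NumPy and builds a random-projection partition tree: each cell is split along a random unit direction at a randomly chosen quantile of the projections, a rule picked here for illustration. A query is answered by searching only the leaf it falls into, and the probability of missing the true nearest neighbor is estimated by rebuilding the tree many times. The `potential` function is a hypothetical stand-in for the potential function the abstract mentions. It averages the ratio of the nearest-neighbor distance to every other point's distance, so it approaches 1 when many points crowd around the query's nearest neighbor. All function names, the split rule, and the exact form of the potential are assumptions, not the paper's definitions.

```python
import numpy as np


def build_rp_tree(points, idx, leaf_size, rng):
    """Recursively partition the indices `idx` of `points` (an n x d array).

    Each internal node projects its points onto a random unit direction and
    splits at a random quantile in [1/4, 3/4] of the projections (an
    illustrative rule, assumed here, not necessarily the paper's).
    """
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    u = rng.normal(size=points.shape[1])
    u /= np.linalg.norm(u)
    proj = points[idx] @ u
    t = np.quantile(proj, rng.uniform(0.25, 0.75))  # randomized split point
    left, right = idx[proj <= t], idx[proj > t]
    if len(left) == 0 or len(right) == 0:  # degenerate split: stop here
        return {"leaf": idx}
    return {
        "u": u,
        "t": t,
        "left": build_rp_tree(points, left, leaf_size, rng),
        "right": build_rp_tree(points, right, leaf_size, rng),
    }


def query_tree(tree, points, q):
    """Route q down to a single leaf, then return the closest point in it."""
    node = tree
    while "leaf" not in node:
        node = node["left"] if q @ node["u"] <= node["t"] else node["right"]
    leaf = node["leaf"]
    dists = np.linalg.norm(points[leaf] - q, axis=1)
    return int(leaf[np.argmin(dists)])


def failure_rate(points, q, leaf_size, n_trees, seed=0):
    """Fraction of independently built trees whose leaf misses q's true NN."""
    rng = np.random.default_rng(seed)
    true_nn = int(np.argmin(np.linalg.norm(points - q, axis=1)))
    idx = np.arange(len(points))
    misses = sum(
        query_tree(build_rp_tree(points, idx, leaf_size, rng), points, q) != true_nn
        for _ in range(n_trees)
    )
    return misses / n_trees


def potential(points, q):
    """Hypothetical potential: mean of ||q - x_(1)|| / ||q - x_(i)|| for i >= 2.

    Points are sorted by distance to q (distances assumed distinct and
    nonzero). Values near 1 indicate a hard configuration; values near 0
    mean the nearest neighbor stands out clearly.
    """
    d = np.sort(np.linalg.norm(points - q, axis=1))
    return float(np.mean(d[0] / d[1:]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 20))
    q = rng.normal(size=20)
    print("potential:", potential(X, q))
    print("failure rate:", failure_rate(X, q, leaf_size=25, n_trees=50))
```

Comparing `potential` and `failure_rate` across data sets of different structure gives an informal feel for the kind of relationship the abstract describes; it is an illustration, not a reproduction of the paper's bounds.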

Keywords

Nearest neighbor · Intrinsic dimension · Spatial partition · k-d tree · Random projection

Notes

Acknowledgments

The authors are grateful to the National Science Foundation for support under grant IIS-1162581, and to the anonymous reviewers for their detailed feedback.


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. University of California, San Diego, USA
  2. Wichita State University, Wichita, USA
