# Randomized Partition Trees for Nearest Neighbor Search

## Abstract

The \(k\)-d tree was one of the first spatial data structures proposed for nearest neighbor search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization and overlapping cells, have proved to be successful in practice. We analyze three such schemes. We show that the probability that they fail to find the nearest neighbor, for any data set and any query point, is directly related to a simple potential function that captures the difficulty of the point configuration. We then bound this potential function in several situations of interest: when the data are drawn from a doubling measure; when the data and query distributions are identical and are supported on a set of bounded doubling dimension; and when the data are documents from a topic model.
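
The following is a rough illustration of the kind of scheme the abstract describes. It is a minimal Python sketch under stated assumptions, not the paper's construction. Each cell is split along a random unit direction at a random quantile of the projected points; the range [1/4, 3/4] is an assumption chosen for this sketch. Queries use defeatist search, which follows a single root-to-leaf path without backtracking. The sketch also computes a potential of the assumed form \(\Phi(q) = \frac{1}{n}\sum_{i=2}^{n} \|q - x_{(1)}\| / \|q - x_{(i)}\|\), where \(x_{(1)}, x_{(2)}, \ldots\) are the data points ordered by distance from \(q\). This form captures the intuition that a query is hard when many points are nearly as close as its nearest neighbor. All function names (`build_tree`, `defeatist_query`, `potential`, `failure_rate`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

Point = List[float]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Node:
    # Internal node: random direction plus a split threshold on the projection.
    # Leaf: indices of the data points that fall in the cell.
    direction: Optional[Point] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_ids: Optional[List[int]] = None


def build_tree(data: List[Point], ids: List[int], leaf_size: int, rng: random.Random) -> Node:
    """Recursively split cells along random directions (assumes leaf_size >= 1)."""
    if len(ids) <= leaf_size:
        return Node(leaf_ids=ids)
    u = [rng.gauss(0.0, 1.0) for _ in range(len(data[0]))]
    norm = math.sqrt(dot(u, u))
    u = [x / norm for x in u]
    proj = sorted((dot(u, data[i]), i) for i in ids)
    # Randomized split point (an assumption of this sketch): a uniformly random
    # quantile in [1/4, 3/4], clamped so that both children are non-empty.
    k = min(max(int(len(proj) * rng.uniform(0.25, 0.75)), 1), len(proj) - 1)
    return Node(
        direction=u,
        # Projections strictly below the k-th smallest value go left; this assumes
        # distinct projections, which holds almost surely for distinct points.
        threshold=proj[k][0],
        left=build_tree(data, [i for _, i in proj[:k]], leaf_size, rng),
        right=build_tree(data, [i for _, i in proj[k:]], leaf_size, rng),
    )


def defeatist_query(root: Node, data: List[Point], q: Point) -> int:
    """Follow one root-to-leaf path (no backtracking), then scan the leaf."""
    node = root
    while node.leaf_ids is None:
        node = node.left if dot(node.direction, q) < node.threshold else node.right
    return min(node.leaf_ids, key=lambda i: dist(data[i], q))


def potential(q: Point, data: List[Point]) -> float:
    """Assumed form: (1/n) * sum_{i=2..n} |q - x_(1)| / |q - x_(i)|."""
    d = sorted(dist(q, x) for x in data)
    # A zero denominator means several points coincide with q; count that ratio as 1.
    return sum(d[0] / di if di > 0 else 1.0 for di in d[1:]) / len(d)


def failure_rate(data: List[Point], q: Point, trials: int, leaf_size: int, seed: int = 0) -> float:
    """Fraction of independently built trees whose defeatist search misses the true NN."""
    rng = random.Random(seed)
    true_nn = min(range(len(data)), key=lambda i: dist(data[i], q))
    misses = 0
    for _ in range(trials):
        root = build_tree(data, list(range(len(data))), leaf_size, rng)
        if defeatist_query(root, data, q) != true_nn:
            misses += 1
    return misses / trials


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(500)]
    q = [rng.gauss(0, 1) for _ in range(10)]
    print("potential:", potential(q, data))
    print("failure rate:", failure_rate(data, q, trials=30, leaf_size=20))
```

Running the script prints the potential of one random query next to the empirical fraction of independently built trees that miss its nearest neighbor. Configurations with a larger potential would be expected to fail more often, but the sketch only illustrates this relationship and does not reproduce the paper's bounds. Variants with overlapping cells would additionally send points near a split boundary to both children; that is not implemented here.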

## Keywords

Nearest neighbor · Intrinsic dimension · Spatial partition · k-d tree · Random projection

## Notes

### Acknowledgments

The authors are grateful to the National Science Foundation for support under grant IIS-1162581, and to the anonymous reviewers for their detailed feedback.