When Is “Nearest Neighbor” Meaningful?

Beyer, Kevin; Goldstein, Jonathan; Ramakrishnan, Raghu; Shaft, Uri

doi:10.1007/3-540-49257-7_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1540)

Included in the following conference series:

International Conference on Database Theory

Abstract

We explore the effect of dimensionality on the “nearest neighbor” problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10–15 dimensions.
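The effect described above can be illustrated with a small simulation. This is only a sketch under assumptions not stated in the abstract: points are drawn i.i.d. uniform on the unit cube, distance is Euclidean, and the function name `distance_contrast`, the sample size, and the seed are hypothetical choices made for illustration. It reports the ratio of the farthest to the nearest distance for a single random query; a ratio close to 1 means the nearest neighbor is barely distinguishable from the farthest point.

```python
import math
import random


def distance_contrast(dim, n_points=1000, seed=0):
    """Return DMAX / DMIN for one random query against n_points uniform points.

    Assumption (illustrative only): i.i.d. uniform coordinates in [0, 1)
    and Euclidean distance. The paper's conditions are broader than this.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    # The ratio shrinks toward 1 as distances concentrate.
    return max(dists) / min(dists)


if __name__ == "__main__":
    for dim in (2, 10, 100):
        print(dim, round(distance_contrast(dim), 2))
```

Under these assumptions the ratio should fall sharply as `dim` grows. The exact values depend on the seed and sample size, so they are not quoted here.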

These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10–15) dimensionality!
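For reference, the linear scan baseline mentioned above can be as simple as the following sketch. The function name `linear_scan_nn` and the use of Euclidean distance are assumptions made for illustration; the abstract does not specify an implementation.

```python
import math


def linear_scan_nn(points, query):
    """Return (index, distance) of the point closest to query.

    Exhaustive scan: every point is compared against the query, so the
    cost is linear in the number of points regardless of dimensionality.
    Euclidean distance is an illustrative assumption.
    """
    best_index, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = math.dist(p, query)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist
```

A baseline like this gives an index structure a concrete cost to beat on the same workload, which is the comparison the abstract argues is usually missing.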

Author information

Authors and Affiliations

CS Dept., University of Wisconsin-Madison, 1210 W. Dayton St., Madison, WI 53706
Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan & Uri Shaft

Editor information

Editors and Affiliations

Institute of Computer Science, The Hebrew University, Givat-Ram, Jerusalem, 91940, Israel
Catriel Beeri
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, 19104-6389, USA
Peter Buneman

About this paper

Cite this paper

Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U. (1999). When Is “Nearest Neighbor” Meaningful? In: Beeri, C., Buneman, P. (eds) Database Theory — ICDT’99. ICDT 1999. Lecture Notes in Computer Science, vol 1540. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49257-7_15

DOI: https://doi.org/10.1007/3-540-49257-7_15
Published: 15 January 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65452-0
Online ISBN: 978-3-540-49257-3
eBook Packages: Springer Book Archive
