
When Is Nearest Neighbors Indexable?

  • Conference paper
Database Theory - ICDT 2005 (ICDT 2005)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3363)

Abstract

In this paper, we consider whether traditional index structures are effective in processing unstable nearest neighbors workloads. It is known that under broad conditions, nearest neighbors workloads become unstable: the distances from a query point to the data points become nearly indistinguishable from one another. We complement this earlier result by showing that if the workload for your application is unstable, you are unlikely to be able to index it efficiently using (almost all known) multidimensional index structures. For a broad class of data distributions, we prove that, as dimensionality increases, these index structures will do no better than a linear scan of the data.
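
The instability described above is easy to observe numerically. The sketch below is an illustration, not the paper's method: it assumes uniform i.i.d. data in [0, 1]^d and Euclidean distance, and measures the ratio DMAX/DMIN between the distances from a random query to its farthest and nearest data points. For distributions like this one, the earlier result (cf. reference 3) implies that for every ε > 0 the probability that DMAX ≤ (1 + ε)·DMIN tends to 1 as dimensionality grows, so the ratio should drift toward 1. The function names are hypothetical.

```python
import math
import random


def max_min_ratio(n_points: int, dim: int, rng: random.Random) -> float:
    """DMAX / DMIN for one query against n_points fresh uniform points in [0, 1]^dim."""
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])  # Euclidean distance
        for _ in range(n_points)
    ]
    return max(dists) / min(dists)


def mean_ratio(n_points: int, dim: int, n_queries: int = 10, seed: int = 0) -> float:
    """Average DMAX / DMIN over n_queries independent trials (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return sum(max_min_ratio(n_points, dim, rng) for _ in range(n_queries)) / n_queries


if __name__ == "__main__":
    # A ratio close to 1 means the nearest and farthest points are almost equally far away.
    for dim in (2, 10, 100, 500):
        print(f"d={dim:4d}  mean DMAX/DMIN = {mean_ratio(1000, dim):.3f}")
```

In low dimensions the ratio is far above 1; by a few hundred dimensions it is already close to 1, which is the regime in which the paper argues indexing cannot beat a linear scan.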

Our result has implications for how experiments should be designed on index structures such as R-Trees, X-Trees and SR-Trees: Simply put, experiments trying to establish that these index structures scale with dimensionality should be designed to establish cross-over points, rather than to show that the methods scale to an arbitrary number of dimensions. In other words, experiments should seek to establish the dimensionality of the dataset at which the proposed index structure deteriorates to linear scan, for each data distribution of interest; that linear scan will eventually dominate is a given.
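
The recommended experimental design, measuring where an index crosses over to linear scan rather than how far it "scales", can be sketched as follows. Everything here is an assumption for illustration: the "index" is a simple k-d-style partition of uniform data into pages with bounding boxes (not an R-tree, X-tree or SR-tree), cost is the fraction of pages read by a best-first 1-NN search, and the crossover is declared at a hypothetical threshold on that fraction that stands in for the point where the index's random page reads cost as much as one sequential scan. All names (`build_pages`, `crossover_dimension`, and so on) are hypothetical.

```python
import math
import random


def build_pages(points, page_size):
    """Hypothetical k-d-style bulk load: split at the median of the widest
    dimension until every page holds at most page_size points.
    Returns a list of (lower_corner, upper_corner, page_points)."""
    dim = len(points[0])
    lo = [min(p[i] for p in points) for i in range(dim)]
    hi = [max(p[i] for p in points) for i in range(dim)]
    if len(points) <= page_size:
        return [(lo, hi, points)]
    axis = max(range(dim), key=lambda i: hi[i] - lo[i])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return build_pages(ordered[:mid], page_size) + build_pages(ordered[mid:], page_size)


def mindist(query, lo, hi):
    """Smallest possible Euclidean distance from query to any point of the box [lo, hi]."""
    total = 0.0
    for q, l, h in zip(query, lo, hi):
        gap = max(l - q, 0.0, q - h)  # 0 when q lies inside [l, h] on this axis
        total += gap * gap
    return math.sqrt(total)


def fraction_of_pages_read(pages, query):
    """Best-first 1-NN search: read pages in increasing MINDIST order and stop as soon
    as the next page cannot hold anything closer. Returns pages read / total pages."""
    ranked = sorted(((mindist(query, lo, hi), pts) for lo, hi, pts in pages),
                    key=lambda t: t[0])
    best = math.inf
    read = 0
    for bound, pts in ranked:
        if bound >= best:
            break  # every remaining page is at least this far away
        read += 1
        best = min(best, min(math.dist(query, p) for p in pts))
    return read / len(pages)


def crossover_dimension(dims, threshold, n_points=2000, page_size=32, n_queries=10, seed=0):
    """Mean fraction of pages read for each dimensionality in dims, plus the first
    dimensionality at which it reaches threshold (None if it never does)."""
    rng = random.Random(seed)
    fractions = {}
    for dim in dims:
        data = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
        pages = build_pages(data, page_size)
        queries = [[rng.random() for _ in range(dim)] for _ in range(n_queries)]
        fractions[dim] = sum(fraction_of_pages_read(pages, q) for q in queries) / n_queries
    crossover = next((d for d in dims if fractions[d] >= threshold), None)
    return crossover, fractions


if __name__ == "__main__":
    # threshold=0.5 is a placeholder for "index I/O now costs as much as a sequential scan".
    dim_at_crossover, fractions = crossover_dimension([2, 4, 8, 16, 32, 64], threshold=0.5)
    for dim, frac in fractions.items():
        print(f"d={dim:3d}  pages read: {frac:.2f}")
    print("crossover dimensionality:", dim_at_crossover)
```

On uniform data the fraction of pages read climbs toward 1 as d grows: with only a few dozen pages, each page's bounding box is constrained in just a handful of dimensions, so its MINDIST bound rarely exceeds the nearest-neighbor distance. A real study would repeat this per data distribution and per index structure, with a threshold calibrated to the actual cost of random versus sequential I/O.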

An important problem is to analytically characterize the rate at which index structures degrade with increasing dimensionality, because the dimensionality of a real data set may well lie in the range that a particular method can handle. The results in this paper can be regarded as a step towards solving this problem. Although we do not characterize the rate at which a structure degrades, our techniques, in contrast to earlier work, allow us to reason directly about a broad class of index structures rather than about the geometry of the nearest neighbors problem.
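
One rate that can be computed in closed form is how fast distances concentrate. The derivation below is an illustration, not a result of this paper, and it says nothing directly about how fast any particular index degrades. It assumes data point X and query Q drawn independently and uniformly from [0, 1]^d, with squared Euclidean distance D^2, and shows that the relative variance of D^2 shrinks like 1/d:

```latex
\begin{aligned}
D^2 &= \sum_{i=1}^{d} W_i, \qquad W_i = (X_i - Q_i)^2, \qquad X_i, Q_i \overset{\text{iid}}{\sim} U[0,1] \\
\mathbb{E}[W_i] &= \operatorname{Var}(X_i - Q_i) = \tfrac{1}{12} + \tfrac{1}{12} = \tfrac{1}{6} \\
\mathbb{E}[W_i^2] &= \mathbb{E}\big[(X_i - Q_i)^4\big] = 2\int_0^1 t^4 (1 - t)\,dt = \tfrac{1}{15} \\
\operatorname{Var}(W_i) &= \tfrac{1}{15} - \tfrac{1}{36} = \tfrac{7}{180} \\
\operatorname{Var}\!\left(\frac{D^2}{\mathbb{E}[D^2]}\right) &= \frac{d \cdot \tfrac{7}{180}}{\left(d \cdot \tfrac{1}{6}\right)^2} = \frac{7}{5d} \;\longrightarrow\; 0 \qquad (d \to \infty)
\end{aligned}
```

The third line uses the density 1 - |t| of X_i - Q_i on [-1, 1]. Because the W_i are independent across coordinates, their variances add; by Chebyshev's inequality, D^2 / E[D^2] then concentrates around 1, which is why every data point ends up at nearly the same distance from the query.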

References

  1. Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. In: Proc. SIGMOD, pp. 322–331 (1992)

  2. Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: An Index Structure for High-Dimensional Data. In: Proc. VLDB, pp. 28–39 (1996)

  3. Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When Is Nearest Neighbors Meaningful? In: Proc. ICDT (1999)

  4. Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: Proc. VLDB, pp. 518–529 (1999)

  5. Goldstein, J.: Improved Query Processing and Data Representation Techniques. Ph.D. Thesis, Univ. of Wisconsin-Madison (1999)

  6. Guttman, A.: R-Trees: A Dynamic Index Structure for Spatial Searching. In: Proc. SIGMOD, pp. 47–57 (1984)

  7. Hellerstein, J.M., Koutsoupias, E., Papadimitriou, C.H.: On the analysis of indexing schemes. In: Proc. PODS, pp. 249–256 (1997)

  8. Katayama, N., Satoh, S.: The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries. In: Proc. SIGMOD, pp. 369–380 (1997)

  9. Lin, K.-I., Jagadish, H.V., Faloutsos, C.: The TV-Tree – An Index Structure for High-Dimensional Data. VLDB J.: Special Issue on Spatial Database Systems 3/4, 517–542 (1994)

  10. Robinson, J.T.: The K-D-B Tree: A Search Structure for Large Multi-dimensional Dynamic Indexes. In: Proc. SIGMOD, pp. 10–18 (1981)

  11. Sellis, T.K., Roussopoulos, N., Faloutsos, C.: The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. In: Proc. VLDB, pp. 507–518 (1987)

  12. Shaft, U.: Database Support for Queries by Image Content. Ph.D. Thesis, Univ. of Wisconsin-Madison (2002)

  13. Smith, J.R.: Query vector projection access method. In: Storage and Retrieval for Image and Video Databases, vol. VII, pp. 511–522 (1998)

  14. White, D.A., Jain, R.C.: Similarity Indexing with the SS-tree. In: Proc. ICDE, pp. 516–523 (1996)

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shaft, U., Ramakrishnan, R. (2004). When Is Nearest Neighbors Indexable? In: Eiter, T., Libkin, L. (eds) Database Theory - ICDT 2005. ICDT 2005. Lecture Notes in Computer Science, vol 3363. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30570-5_11

  • DOI: https://doi.org/10.1007/978-3-540-30570-5_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24288-8

  • Online ISBN: 978-3-540-30570-5

  • eBook Packages: Computer Science, Computer Science (R0)
